Chemellia / ChemistryFeaturization.jl

Interface package for featurizing atomic structures
https://chemistryfeaturization.chemellia.org/dev/
MIT License
41 stars 14 forks source link

Flesh out WeaveFeaturization #128

Open DhairyaLGandhi opened 2 years ago

DhairyaLGandhi commented 2 years ago

This is based on the weve featurization from the weave implementation in deepchem, but not exactly a direct port.

I'm not sure I love the way we map from species feature names to the MolecularGraph functions here, seems like that would make it very manual to "just" use some feature provided by MolecularGraph.

There are still a couple of TODOs, one of them being fleshing out both atom_features as well as pair_features, but before I implement them, I wanted to ask whether we would be fine with something like a FeaturizedMol (similar to FeaturizedAtom, it can hold featurized atom, bond and pair info), or is there a different method I should look into?

rkurchin commented 2 years ago

The result should still be a FeaturizedAtoms object, it would just have different type parameters and the content of the encoded_features field would be more complicated.

codecov[bot] commented 2 years ago

Codecov Report

Merging #128 (af601ea) into main (0c8f2b7) will decrease coverage by 3.31%. The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #128      +/-   ##
==========================================
- Coverage   80.73%   77.42%   -3.32%     
==========================================
  Files          20       20              
  Lines         571      598      +27     
==========================================
+ Hits          461      463       +2     
- Misses        110      135      +25     
Impacted Files Coverage Δ
src/ChemistryFeaturization.jl 80.00% <ø> (ø)
src/features/speciesfeature.jl 76.19% <0.00%> (ø)
src/featurizations/weavefeaturization.jl 0.00% <0.00%> (ø)
src/utils/orbitalfeature_utils.jl 97.43% <0.00%> (+0.02%) :arrow_up:
src/atoms/atomgraph.jl 76.25% <0.00%> (+0.30%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 0c8f2b7...af601ea. Read the comment docs.

DhairyaLGandhi commented 2 years ago

If we have missing (and in general Union eltypes) in the FeaturizedAtoms we might lose out on the BLAS calls and be slower if we fall back to generic methods, which I'd love to avoid. We might not be, but we might have to benchmark to know how that looks.

DhairyaLGandhi commented 2 years ago

Where are the missings used ultimately?

rkurchin commented 2 years ago

The missings just indicate that that bond feature isn't defined for that combination of atoms (because they're not bonded to each other).

rkurchin commented 2 years ago

JFYI, this should eventually be part of WeaveModel under the new organization, so if you want to make it into a PR over there now feel free, or if you'd rather wait until the reorg is actually registered, that's fine too.