datamol-io / molfeat

molfeat - the hub for all your molecular featurizers
https://molfeat.datamol.io
Apache License 2.0

Pr/dessygil/40 #47

Closed · maclandrol closed this 1 year ago

maclandrol commented 1 year ago

xref #40, pinging @dessygil

maclandrol commented 1 year ago

Fix #43

from molfeat.trans.pretrained.hf_transformers import PretrainedHFTransformer
import datamol as dm

smiles = dm.freesolv()["smiles"]
transformer = PretrainedHFTransformer("ChemBERTa-77M-MLM", notation="smiles", precompute_cache=True)
# batch_transform featurizes the molecules in parallel batches while filling the precompute cache
output = PretrainedHFTransformer.batch_transform(transformer, smiles, batch_size=128, concatenate=False)
len(transformer.precompute_cache)  # should be len(smiles)

Pretrained models should now work better with batch_transform, allowing efficient parallelization while retaining all cached features. PrecomputedMolTransformer should now be preferred when you already have an existing cache or are using static featurizers.
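
For reference, a minimal sketch of the PrecomputedMolTransformer path mentioned above. The import paths, the cache being callable to precompute features, and the cache=/featurizer= keywords are assumptions on my part, so double-check them against this branch:

# sketch only: the signatures below are assumed, not verified against this PR
from molfeat.trans import MoleculeTransformer
from molfeat.trans.base import PrecomputedMolTransformer  # assumed import path
from molfeat.utils.cache import DataCache  # assumed import path
import datamol as dm

smiles = dm.freesolv()["smiles"]
base = MoleculeTransformer(featurizer="ecfp", dtype=float)
cache = DataCache(name="ecfp_cache")
cache(smiles, base)  # assumed: fills the cache by featurizing smiles with `base`
# cache hits are returned directly; misses fall back to the wrapped featurizer
trans = PrecomputedMolTransformer(cache=cache, featurizer=base)
features = trans(smiles)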

maclandrol commented 1 year ago

I have also added documentation for the featurizers that were missing it:

from molfeat.store import ModelStore

store = ModelStore()
_, m = store.load("mordred")
print(m.usage())

# m.usage() prints a usage example like the following:
import datamol as dm
from molfeat.trans import MoleculeTransformer

smiles = dm.freesolv().iloc[:50].smiles
# sanitize and standardize your molecules if needed
transformer = MoleculeTransformer(featurizer='mordred', dtype=float)
features = transformer(smiles)
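
And to discover which other featurizers the store ships with, a small sketch; it assumes ModelStore exposes an available_models listing whose entries carry a name attribute (verify against the store API in this branch):

from molfeat.store import ModelStore

store = ModelStore()
# assumed attribute: the list of model-info entries registered in the store
for info in store.available_models:
    print(info.name)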

ping @cwognum