datamol-io / molfeat

molfeat - the hub for all your molecular featurizers
https://molfeat.datamol.io
Apache License 2.0

Pr/dessygil/40 #47

Closed · maclandrol closed this 1 year ago

maclandrol commented 1 year ago

xref #40, pinging @dessygil

maclandrol commented 1 year ago

Fix #43

from molfeat.trans.pretrained.hf_transformers import PretrainedHFTransformer
import datamol as dm

smiles = dm.freesolv()["smiles"]
transformer = PretrainedHFTransformer("ChemBERTa-77M-MLM", notation="smiles", precompute_cache=True)
# batch_transform featurizes the molecules in parallel batches while filling the precompute cache
output = PretrainedHFTransformer.batch_transform(transformer, smiles, batch_size=128, concatenate=False)
len(transformer.precompute_cache)  # should be len(smiles)

Pretrained models should now work better with batch_transform, allowing efficient parallelization while retaining all cached features. PrecomputedMolTransformer should now be preferred when you already have an existing cache or are using static featurizers.
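
For reference, a minimal sketch of the PrecomputedMolTransformer path mentioned above. The import paths, the cache being callable to precompute features, and the cache=/featurizer= keywords are assumptions on my part, so double-check them against this branch:

# sketch only: the signatures below are assumed, not verified against this PR
from molfeat.trans import MoleculeTransformer
from molfeat.trans.base import PrecomputedMolTransformer  # assumed import path
from molfeat.utils.cache import DataCache  # assumed import path
import datamol as dm

smiles = dm.freesolv()["smiles"]
base = MoleculeTransformer(featurizer="ecfp", dtype=float)
cache = DataCache(name="ecfp_cache")
cache(smiles, base)  # assumed: fills the cache by featurizing smiles with `base`
# cache hits are returned directly; misses fall back to the wrapped featurizer
trans = PrecomputedMolTransformer(cache=cache, featurizer=base)
features = trans(smiles)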

maclandrol commented 1 year ago

I have also added documentation for the featurizers that were missing it:

from molfeat.store import ModelStore

store = ModelStore()
_, m = store.load("mordred")
print(m.usage())

# m.usage() prints a usage example like the following:
import datamol as dm
from molfeat.trans import MoleculeTransformer

smiles = dm.freesolv().iloc[:50].smiles
# sanitize and standardize your molecules if needed
transformer = MoleculeTransformer(featurizer='mordred', dtype=float)
features = transformer(smiles)
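
And to discover which other featurizers the store ships with, a small sketch; it assumes ModelStore exposes an available_models listing whose entries carry a name attribute (verify against the store API in this branch):

from molfeat.store import ModelStore

store = ModelStore()
# assumed attribute: the list of model-info entries registered in the store
for info in store.available_models:
    print(info.name)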

ping @cwognum