Closed: maclandrol closed this pull request 1 year ago
Fix #43
```python
from molfeat.trans.pretrained.hf_transformers import PretrainedHFTransformer
import datamol as dm

smiles = dm.freesolv()["smiles"]
transformer = PretrainedHFTransformer("ChemBERTa-77M-MLM", notation="smiles", precompute_cache=True)
output = transformer.batch_transform(smiles, batch_size=128, concatenate=False)
len(transformer.precompute_cache)  # should equal len(smiles)
```
Pretrained models should now work better with `batch_transform`, allowing efficient parallelization while retaining all cached features. `PrecomputedMolTransformer` should now be preferred when you already have an existing cache or are using static featurizers.
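The cache-retaining batch behavior described above can be sketched in plain Python. This is a hypothetical toy illustration, not the molfeat implementation: a featurizer walks the inputs batch by batch, checks a dict-backed `precompute_cache`, and only computes features for molecules it has not seen before.

```python
from typing import Dict, List

class CachedFeaturizer:
    """Toy featurizer that processes inputs in batches while
    retaining every computed feature in a shared cache.
    (Hypothetical sketch, not the molfeat implementation.)"""

    def __init__(self):
        self.precompute_cache: Dict[str, List[float]] = {}

    def _featurize(self, smiles: str) -> List[float]:
        # Stand-in for a real model call: encode characters as floats.
        return [float(ord(c)) for c in smiles]

    def batch_transform(self, inputs: List[str], batch_size: int = 2) -> List[List[float]]:
        outputs = []
        for start in range(0, len(inputs), batch_size):
            batch = inputs[start:start + batch_size]
            for s in batch:
                # Compute only once per unique molecule; reuse afterwards.
                if s not in self.precompute_cache:
                    self.precompute_cache[s] = self._featurize(s)
                outputs.append(self.precompute_cache[s])
        return outputs

feats = CachedFeaturizer()
out = feats.batch_transform(["CCO", "c1ccccc1", "CCO"], batch_size=2)
print(len(feats.precompute_cache))  # 2 unique molecules cached
```

After the call, the cache holds one entry per unique input, so repeated or later batches reuse features instead of recomputing them.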
I have also documented previously missing featurizers:
```python
from molfeat.store import ModelStore

store = ModelStore()
_, m = store.load("mordred")
print(m.usage())
```
```python
import datamol as dm
from molfeat.trans import MoleculeTransformer

smiles = dm.freesolv().iloc[:50].smiles
# sanitize and standardize your molecules if needed
transformer = MoleculeTransformer(featurizer="mordred", dtype=float)
features = transformer(smiles)
```
ping @cwognum
Checklist:

- [ ] Added a news entry (copy `news/TEMPLATE.rst` to `news/my-feature-or-branch.rst`) and edit it.

xref #40, pinging @dessygil