IBM / molformer

Repository for MolFormer
Apache License 2.0

Fine-tuning the Hugging Face MoLFormer model on a custom dataset? #22

Open · Khrystofor19 opened this issue 1 month ago

Khrystofor19 commented 1 month ago

Hi! I have been using the MoLFormer model from Hugging Face to build a cancer drug response prediction model, but it seems to underperform compared with MegaMolBART and ChemBERTa. Is there a way to fine-tune the Hugging Face model on a custom SMILES dataset?

My hypothesis is that the relevant molecules (oncology drugs and related scaffolds) may have been underrepresented in the MoLFormer training data, which would explain the underperformance. Could you please help me figure out the workflow for fine-tuning the Hugging Face MoLFormer instance on a custom SMILES dataset?
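For reference, one common way to do supervised fine-tuning with a Hugging Face encoder is to put a small regression head on top of it and train end to end on labeled SMILES. The sketch below is an illustration under stated assumptions, not an official IBM workflow: it assumes the public checkpoint id `ibm/MoLFormer-XL-both-10pct` loaded with `trust_remote_code=True`, and that its output exposes `last_hidden_state` and its config exposes `hidden_size`. `SmilesRegressor`, `train_epoch`, the masked mean pooling, the MSE loss, and the learning rate are hypothetical choices. Continued masked-language-model pretraining on unlabeled oncology SMILES (to address the underrepresentation hypothesis directly) would be a separate, complementary step and is not shown here.

```python
# Hypothetical sketch: supervised fine-tuning of a pretrained SMILES encoder
# (assumed: MoLFormer from the Hugging Face Hub) with a regression head.
import torch
from torch import nn


class SmilesRegressor(nn.Module):
    """Pretrained encoder + small regression head (illustrative design, not IBM's)."""

    def __init__(self, encoder: nn.Module, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(hidden_size, 1))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state  # (batch, seq_len, hidden_size)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        # Masked mean pooling: average only over non-padding tokens.
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.head(pooled).squeeze(-1)  # (batch,)


def train_epoch(model, tokenizer, smiles, targets, optimizer, batch_size=16):
    """One pass over (SMILES, target) pairs; returns the mean training loss."""
    model.train()
    loss_fn = nn.MSELoss()
    total, n_batches = 0.0, 0
    for start in range(0, len(smiles), batch_size):
        batch = tokenizer(smiles[start:start + batch_size], padding=True, return_tensors="pt")
        y = torch.tensor(targets[start:start + batch_size], dtype=torch.float32)
        pred = model(batch["input_ids"], batch["attention_mask"])
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
        n_batches += 1
    return total / max(n_batches, 1)


def main():
    # Imported here so the helpers above can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    name = "ibm/MoLFormer-XL-both-10pct"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    encoder = AutoModel.from_pretrained(name, trust_remote_code=True)
    model = SmilesRegressor(encoder, encoder.config.hidden_size)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # placeholder lr

    # Dummy placeholder values, NOT real measurements: replace with your own
    # SMILES and drug response labels.
    smiles = ["CC(=O)Oc1ccccc1C(=O)O", "Cn1c(=O)c2c(ncn2C)n(C)c1=O"]
    targets = [0.0, 1.0]
    for epoch in range(3):
        print(epoch, train_epoch(model, tokenizer, smiles, targets, optimizer))
```

Calling `main()` downloads the checkpoint and runs a few epochs on the placeholder data. Masked mean pooling is used so that padding never changes a molecule's embedding. Using the model's own pooled output, or freezing the encoder and training only the head, are equally reasonable alternatives worth comparing on a held-out split.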