OSU-NLP-Group / LLM4Chem

Official code repo for the paper "LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset"
https://osu-nlp-group.github.io/LLM4Chem/
MIT License

Model usage for Property Prediction #6

Open kunaldahiya opened 1 month ago

kunaldahiya commented 1 month ago

Hi,

Thanks for releasing the code. I was trying out the trained model for Property Prediction - Clintox. I used the following snippet (as suggested).

from generation import LlaSMolGeneration

generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
generator.generate('Is <SMILES> COC[C@@H](NC(C)=O)C(=O)NCC1=CC=CC=C1 </SMILES> toxic?')

It runs; however, it always responds with "No" for all the molecules in "data/raw/test/property_prediction-clintox.jsonl". The same issue was observed when I tried some other molecules. Can you please look into this? Please let me know if I need to change something when running it.
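For reference, the check described above can be sketched roughly as follows. This is a minimal illustration, not the exact script that was run: `ask` is a hypothetical wrapper around the model call (the return format of `generator.generate` is not shown in this thread, so it is not assumed here), and `always_no` is a stub that stands in for the model so the snippet runs on its own. The short SMILES list is made up for illustration.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_answers(ask: Callable[[str], str], smiles_list: Iterable[str]) -> Counter:
    """Query `ask` once per molecule and count the normalized answers."""
    counts = Counter()
    for smiles in smiles_list:
        # Same prompt template as in the snippet above.
        prompt = f"Is <SMILES> {smiles} </SMILES> toxic?"
        answer = ask(prompt).strip().lower()
        counts[answer] += 1
    return counts


def always_no(prompt: str) -> str:
    """Stub standing in for the model (hypothetical, illustration only)."""
    return "No"


# Made-up example inputs; replace with molecules read from the test file.
example_smiles = ["COC[C@@H](NC(C)=O)C(=O)NCC1=CC=CC=C1", "CCO", "c1ccccc1"]
answer_counts = tally_answers(always_no, example_smiles)
print(answer_counts)
```

If every molecule lands in a single bucket, as it does with the stub, the model is producing a constant answer regardless of input.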

Thanks!

btyu commented 1 week ago

Thank you for bringing this to our attention. Since most compounds in the training data are not toxic, the model has likely learned this bias and tends to respond "No". Larger-scale and more balanced training data would alleviate this problem. Thanks again for your interest.
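To make the effect of the imbalance concrete, here is a small sketch of how one might inspect a label distribution and compare plain accuracy with balanced accuracy for a model that always answers "No". The labels below are made up for illustration and are not taken from the ClinTox data; the helper names are hypothetical and not part of this repo.

```python
from collections import Counter


def label_distribution(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def balanced_accuracy(gold, pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in set(gold):
        idx = [i for i, g in enumerate(gold) if g == label]
        recalls.append(sum(pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Made-up, imbalanced labels (assumption for illustration only).
gold = ["No"] * 9 + ["Yes"]
# A constant predictor that always answers "No".
pred = ["No"] * 10

dist = label_distribution(gold)
acc = accuracy(gold, pred)
bal_acc = balanced_accuracy(gold, pred)
print(dist, acc, bal_acc)
```

On skewed data like this, a constant "No" scores high on plain accuracy while balanced accuracy stays at chance level, so a class-sensitive metric is the more informative one to check.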