QizhiPei / BioT5

BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)
https://arxiv.org/abs/2310.07276
MIT License

About the metrics of FCD and Text2Mol #9

Closed Lyu6PosHao closed 4 months ago

Lyu6PosHao commented 4 months ago

Thanks for your great work!

I would like to know how the FCD and Text2Mol metrics are calculated. The related code does not seem to be provided in the repo.

I already have the FCD and Text2Mol repositories, but I don't know how to use them to reproduce the results in the BioT5 paper.

For example, when calculating FCD, do only valid molecules take part in the calculation?

I would be grateful if the code or some details could be provided!

QizhiPei commented 4 months ago

Thanks for your interest in our work. To reproduce the FCD and Text2Mol scores of BioT5, you can:

  1. Refer to the MolT5 evaluation scripts (https://github.com/blender-nlp/MolT5/tree/main/evaluation).
  2. Follow the Evaluation section (https://github.com/QizhiPei/BioT5?tab=readme-ov-file#evaluation) of the README to evaluate on the ChEBI-20 dataset. The output file is saved at https://github.com/QizhiPei/BioT5/blob/b83ee54453e006cfda6f684bdc16585588e71d3e/biot5/metrics/save_only_metrics/save_only_metrics.py#L67 and can serve as the input to the MolT5 evaluation code.

For step 2, you may need to download the ChEBI-20 dataset in instruction format in advance (https://huggingface.co/datasets/QizhiPei/BioT5_finetune_dataset).

Because BioT5 uses SELFIES to represent molecules, every generated SELFIES string corresponds to a valid molecule.
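For anyone wiring the two steps together, here is a minimal sketch of turning predictions into a file the MolT5 scripts could read. It rests on several assumptions that should be checked against the actual files: that the saved output still holds SELFIES strings rather than SMILES, and that the MolT5 scripts expect a tab-separated file with a header row of `description`, `ground truth` and `output`. The function name `write_molt5_style_tsv` and the `rows` layout are hypothetical, not part of either repo.

```python
from typing import Callable, Iterable, Tuple


def write_molt5_style_tsv(
    rows: Iterable[Tuple[str, str, str]],
    out_path: str,
    decode: Callable[[str], str],
) -> int:
    """Write (description, ground truth, prediction) rows as a TSV file.

    `decode` converts one molecule string to SMILES, e.g. a SELFIES decoder.
    The header names below are an assumption about the MolT5 input format.
    Returns the number of data rows written.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("description\tground truth\toutput\n")
        for description, ground_truth, prediction in rows:
            # Tabs and newlines inside the description would break the columns.
            clean = description.replace("\t", " ").replace("\n", " ")
            fields = [clean, decode(ground_truth), decode(prediction)]
            f.write("\t".join(fields) + "\n")
            written += 1
    return written
```

With the `selfies` package installed, `selfies.decoder` can be passed as `decode`, for example `write_molt5_style_tsv(rows, "predictions.txt", selfies.decoder)`. If the file already contains SMILES, pass `lambda s: s` instead.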

Lyu6PosHao commented 4 months ago

Thanks, I will give it a try.