What's the "massformer.ont_parser"? #7

Taylorful commented 2 months ago

Excuse me, when I ran scripts/compute_cfm_overlap.py, an error occurred: there is no massformer.ont_parser. I checked the massformer file directory and couldn't find it. Another question: I also couldn't find the train_all_both file directory or data/raw/metab_mol_list.txt, but they seem to be an important part of reproducing the results in your paper. Could you give me some information about these two files? Looking forward to your answer~

adamoyoung commented 2 months ago

Hi, thanks for your interest!

Good catch on ont_parser.code_to_name. I've modified the repo to include code_to_name, which is just a dictionary that maps ClassyFire codes to the corresponding English descriptions of their classes. This dictionary is used for plotting the performance of CFM and MassFormer, stratified by compound class.
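For readers following along, here is a minimal sketch of how a dictionary like code_to_name could be used to stratify scores by compound class. It is not the code from the repo: the codes, class names, scores, and the helper mean_score_by_class are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for code_to_name: the real dictionary maps
# ClassyFire codes to English class descriptions; these entries are made up.
code_to_name = {
    "CODE_001": "Example class A",
    "CODE_002": "Example class B",
}


def mean_score_by_class(records, code_to_name):
    """Average per-compound scores within each named compound class.

    records: iterable of (class_code, score) pairs.
    Codes missing from the dictionary are grouped under "Unknown".
    """
    grouped = defaultdict(list)
    for code, score in records:
        name = code_to_name.get(code, "Unknown")
        grouped[name].append(score)
    return {name: mean(scores) for name, scores in grouped.items()}


# Toy (class code, similarity score) pairs; not real results.
records = [
    ("CODE_001", 0.5),
    ("CODE_001", 1.0),
    ("CODE_002", 0.25),
    ("CODE_999", 0.75),
]
summary = mean_score_by_class(records, code_to_name)
for name, avg in summary.items():
    print(f"{name}: {avg:.2f}")
```

The per-class averages in summary are the kind of values one would then pass to a plotting routine, one bar or box per class name.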

I’ve also modified the repo to include the train_both configs mentioned in the README. Thanks for pointing this out; it seems I forgot to commit them. Note that you still won’t be able to train a model that uses the NIST data without purchasing the dataset and exporting/preprocessing it using the instructions in the repo.

The script scripts/compute_cfm_overlap.py is not currently runnable. I've just included the code to show the logic for how the CFM overlap was calculated. The file data/raw/metab_mol_list.txt is a list of the structures that the version of CFM I was comparing with was trained on, which I obtained from the CFM authors. I’m not sure if it’s available publicly, but you could reach out to them. However, I don't think it's essential for reproducing the key results of the paper. It’s only used for measuring the structure overlap between the CFM training set and the NIST training set that we use for all of the other methods.
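As a rough illustration of the overlap idea, the sketch below counts how many structures in one set also appear in another. It is not the logic from scripts/compute_cfm_overlap.py: it assumes each structure is represented by a single string identifier per line (for example a canonical SMILES or an InChIKey), and the helper names and toy identifiers are hypothetical.

```python
def load_structure_ids(lines):
    """Return the set of non-empty, whitespace-stripped identifiers.

    `lines` can be any iterable of strings, such as an open text file
    with one structure identifier per line (an assumed format).
    """
    return {line.strip() for line in lines if line.strip()}


def overlap_stats(reference_ids, query_ids):
    """Count query structures that also occur in the reference set.

    Returns (number shared, fraction of the query set that is shared).
    """
    shared = reference_ids & query_ids
    fraction = len(shared) / len(query_ids) if query_ids else 0.0
    return len(shared), fraction


# Toy identifiers standing in for real structures; not real data.
cfm_ids = load_structure_ids(["AAA\n", "BBB\n", "\n", "CCC\n"])
nist_ids = load_structure_ids(["BBB", "CCC", "DDD", "EEE"])

n_shared, frac_shared = overlap_stats(cfm_ids, nist_ids)
print(f"{n_shared} shared structures ({frac_shared:.0%} of the query set)")
```

In practice the result depends on how structures are normalized before comparison, so the same canonicalization would need to be applied to both lists.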

Taylorful commented 2 months ago

Thank you very much for your kind reply~
