Closed: qiyan98 closed this issue 1 year ago.
Hi,
I noticed that the number of molecules generated for evaluation on the MOSES dataset is 25000, as specified in the config file: https://github.com/cvignac/DiGress/blob/150ca149394ddbb32e855f4092b8dc1acdfce8f7/configs/experiment/moses.yaml#L17
Your shared SMILES samples also contain 25000 molecules: https://github.com/cvignac/DiGress/blob/main/generated_samples/generated_smiles_moses.txt.
However, the original MOSES paper suggests using 30000 generated samples for evaluation (page 3): https://arxiv.org/pdf/1811.12823.pdf#page=3
I'm new to this dataset and confused by the discrepancy. Could you explain why 25000 was chosen instead of 30000?
Thanks, Qi
If you check the MOSES code, I believe it internally uses 20000 valid samples to compute the metrics. Since sampling 25k molecules already yields enough valid ones, we did not sample more.
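A minimal sketch of the arithmetic behind this answer follows. It assumes, as the reply suggests but without verifying against the MOSES code, that the metrics need 20000 valid molecules. The helpers `min_validity`, `samples_needed` and `first_n_valid` are hypothetical and are not part of the MOSES or DiGress APIs. `is_valid` stands in for whatever validity check you use, such as whether RDKit can parse the SMILES string.

```python
import math
from typing import Callable, Iterable, List


def min_validity(n_generated: int, n_valid_needed: int) -> float:
    """Smallest validity rate at which n_generated samples are expected
    to contain at least n_valid_needed valid molecules."""
    return n_valid_needed / n_generated


def samples_needed(n_valid_needed: int, validity: float) -> int:
    """Expected number of samples to draw to reach n_valid_needed valid
    molecules, given a validity rate in (0, 1]."""
    if not 0 < validity <= 1:
        raise ValueError("validity must be in (0, 1]")
    return math.ceil(n_valid_needed / validity)


def first_n_valid(smiles: Iterable[str], n: int,
                  is_valid: Callable[[str], bool]) -> List[str]:
    """Keep the first n samples accepted by is_valid (a hypothetical
    validity check); raise if there are fewer than n valid samples."""
    valid: List[str] = []
    for s in smiles:
        if len(valid) == n:
            break
        if is_valid(s):
            valid.append(s)
    if len(valid) < n:
        raise ValueError(f"only {len(valid)} valid samples, need {n}")
    return valid


if __name__ == "__main__":
    # 25000 samples give 20000 valid ones (in expectation) if >= 80% are valid.
    print(min_validity(25000, 20000))  # 0.8
    # At a hypothetical 90% validity, about 22223 samples would suffice.
    print(samples_needed(20000, 0.9))  # 22223
    # Toy usage: treat non-empty strings as "valid".
    print(first_n_valid(["C", "", "CCO", "c1ccccc1"], 2, lambda s: bool(s)))
```

Under this assumption, 25000 samples suffice whenever the model's validity is at least 80%. Sampling 30000 would only matter for a model with lower validity.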
Got it. Thanks!