coleygroup / Graph2SMILES

MIT License

Pretrained model arguments mismatch the dataset name and expected output size #8

Closed AslantheAslan closed 11 months ago

AslantheAslan commented 11 months ago

The pretrained models provided through scripts/download_checkpoints.py (USPTO_480k_dgat.pt and USPTO_480k_dgcn.pt) store unexpected values for arguments such as --data_name=MIT_mixed. I think the paths saved in the pretrained models are wrong: during inference (predict.sh), the code looks for paths containing "MIT_mixed" instead of paths containing "USPTO_480k".

As a result, I get the following error:

Loading vocab from ./preprocessed/MIT_mixed_g2s_series_rel_smiles_smiles/vocab_smiles.txt
Traceback (most recent call last):
  File "/nfsdata/home/ismail.aslan/PycharmProjects/ncs-benchmarks/models/2Graph2SMILES/predict.py", line 169, in <module>
    main(args)
  File "/nfsdata/home/ismail.aslan/PycharmProjects/ncs-benchmarks/models/2Graph2SMILES/predict.py", line 61, in main
    vocab = load_vocab(pretrain_args.vocab_file)
  File "/nfsdata/home/ismail.aslan/PycharmProjects/ncs-benchmarks/models/2Graph2SMILES/utils/data_utils.py", line 782, in load_vocab
    with open(vocab_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: './preprocessed/MIT_mixed_g2s_series_rel_smiles_smiles/vocab_smiles.txt'

In my opinion, inference should instead have looked in './preprocessed/USPTO_480k_g2s_series_rel_smiles_smiles/vocab_smiles.txt'. Even when I correct this by reading the pretrained model args and rewriting them, I hit another problem with the expected output shape, which makes me think something else is wrong.
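For anyone trying the same workaround, here is a minimal sketch of rewriting the stored arguments. It assumes the checkpoint's args object behaves like an argparse.Namespace (the traceback only shows that it has a vocab_file attribute); the helper name retarget_args is hypothetical and not part of the repo. Loading the checkpoint itself is left out. Note that this only fixes the paths and does not address the output shape problem.

```python
import argparse
import copy


def retarget_args(pretrain_args, old_name, new_name):
    """Return a copy of the args with old_name replaced by new_name
    in every string-valued field (hypothetical helper, not from the repo)."""
    fixed = copy.deepcopy(pretrain_args)
    # Iterate over a snapshot so that setattr does not interfere with iteration.
    for key, value in list(vars(fixed).items()):
        if isinstance(value, str) and old_name in value:
            setattr(fixed, key, value.replace(old_name, new_name))
    return fixed


if __name__ == "__main__":
    # Stand-in for the args stored in the checkpoint (assumed structure).
    stored = argparse.Namespace(
        data_name="MIT_mixed",
        vocab_file="./preprocessed/MIT_mixed_g2s_series_rel_smiles_smiles/vocab_smiles.txt",
    )
    print(retarget_args(stored, "MIT_mixed", "USPTO_480k").vocab_file)
```

The function returns a copy, so the original args stay available for comparison.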

On the other hand, when I train a model on USPTO_480k myself and use the checkpoint from the last step for inference (predict.sh), it works without any problem. That makes me think the problem lies in the pretrained model arguments. Is there any chance you updated the model files recently?

The funny thing is that I was able to reproduce the same results 1-2 weeks ago. Any thoughts on this issue would be appreciated.

zhengkaitu commented 11 months ago

Thanks for the feedback. The branch with c_calculate has undergone a bunch of modifications, so inconsistencies like these are not unexpected. We have actually just released the updated version of the Graph2SMILES forward predictor as part of ASKCOSv2. Would you mind checking out https://gitlab.com/mlpds_mit/askcosv2/forward_predictor/graph2smiles? This will be the new repo that we officially and continuously support.

AslantheAslan commented 11 months ago

Thanks a lot for the support. I checked the repo you pointed me to and it looks great. I also solved the problem I had by reading the default_vocab that was provided in the old repo.
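In case it helps others, the fallback idea can be sketched roughly as follows: try the vocab path stored in the checkpoint first, then fall back to another vocab file if that path does not exist. The helper name resolve_vocab_path is hypothetical, and the fallback path you pass in depends on where the default vocab lives in your checkout.

```python
import os


def resolve_vocab_path(candidates):
    """Return the first candidate path that exists as a file.

    Hypothetical helper: candidates is an ordered list of vocab file paths,
    e.g. the path stored in the checkpoint followed by a fallback path.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("No vocab file found; tried: " + ", ".join(candidates))
```

Raising FileNotFoundError when nothing matches keeps the failure visible instead of silently loading the wrong vocabulary.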