Closed thegodone closed 1 week ago
Thanks for raising this issue!
I corrected the reference to the config file - running the plotting.py file should now work to reproduce the plots for the USPTO 50k results. I added a line in the .yml file to directly install the necessary dependencies (IPython). Hopefully, this should resolve the missing dependency issue. These changes can be seen in commit 784f90b.
Please let me know in case this is not working and/or if you find any other issues.
I have a question regarding the speed of the Round-trip step, which is really the bottleneck of the workflow. I tried both Linux and Mac ARM environments, and a run takes roughly 1-3 hours. Is there a way to optimise this part?
I think I may have an issue there now?
2024-07-09 16:09:27,629 - __main__ - ERROR - MEGAN does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - GraphRetro does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - RetroXpert does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - G2Retro does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - Chemformer does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - Graph2Smiles does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - Retroformer does not have a Round-trip.csv file
2024-07-09 16:09:27,629 - __main__ - ERROR - GTA does not have a Round-trip.csv file
2024-07-09 16:09:27,630 - __main__ - ERROR - TiedTransformer does not have a Round-trip.csv file
2024-07-09 16:09:27,630 - __main__ - ERROR - GLN does not have a Round-trip.csv file
2024-07-09 16:09:27,630 - __main__ - ERROR - LocalRetro does not have a Round-trip.csv file
2024-07-09 16:09:27,630 - __main__ - ERROR - MHNReact does not have a Round-trip.csv file
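For anyone hitting the same errors, here is a minimal sketch for listing which models lack the file before running the plots. It assumes a hypothetical layout with one sub-folder per model under a results directory; the function name, the `results_dir` argument and that layout are assumptions for illustration, not the repository's actual structure.

```python
from pathlib import Path
from typing import Iterable, List


def models_missing_round_trip(
    results_dir: str,
    model_names: Iterable[str],
    filename: str = "Round-trip.csv",
) -> List[str]:
    """Return the models whose (assumed) result folder lacks `filename`."""
    root = Path(results_dir)
    # Assumed layout: <results_dir>/<model_name>/Round-trip.csv
    return [
        name for name in model_names
        if not (root / name / filename).is_file()
    ]
```

Calling it with the model names from the log (MEGAN, GraphRetro, ...) would show whether the Round-trip step still has to be run for each of them.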
One would however have to redo the experiments on the USPTO-50k and Pararoutes datasets for the different retrosynthesis algorithms. Nonetheless, it would be interesting to see if the results agree for different forward prediction models.
python main.py --k_back 10 --k_forward 2 --invsmiles 20 --fwd_model 'gcn' --config_name 'raw_data.json' --quick_eval False
Adding to the previous comment: Looking at LocalTransform's GitHub, it seems that inference is also done on a single reactant basis. To fix the bottleneck for computational speed, one would either have to rewrite the source code to allow for batch inference or use an existing model that can take a batch of different reactant sets during inference, e.g., 10 reactant sets per product molecule. Are you aware of any such forward prediction model? If so, I could try to integrate it within the workflow.
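To make the batching idea concrete, here is a rough sketch of how reactant sets from several products could be grouped into batches before calling a forward model. `predict_batch` is a hypothetical callable (a list of reactant SMILES in, a list of top-k product SMILES per input out); it is not the API of LocalTransform or of any other existing model, and the SMILES strings are assumed to be canonicalised already so that plain string comparison is meaningful.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical forward-model interface: batch of reactant SMILES in,
# one list of predicted product SMILES (top-k) per input out.
ForwardBatchFn = Callable[[List[str]], List[List[str]]]


def chunked(items: List[Tuple[str, str]], size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def round_trip_hits(
    product_to_reactants: Dict[str, List[str]],
    predict_batch: ForwardBatchFn,
    batch_size: int = 32,
) -> Dict[str, List[bool]]:
    """For each product, flag which proposed reactant sets map back to it."""
    # Flatten to (product, reactants) pairs so a batch can span several products.
    pairs = [(p, r) for p, rs in product_to_reactants.items() for r in rs]
    hits: Dict[str, List[bool]] = {p: [] for p in product_to_reactants}
    for batch in chunked(pairs, batch_size):
        predictions = predict_batch([reactants for _, reactants in batch])
        for (product, _), predicted in zip(batch, predictions):
            # Assumes canonical SMILES on both sides of the comparison.
            hits[product].append(product in predicted)
    return hits
```

With `--k_back 10`, each product contributes ten pairs, so the forward model is called once per `batch_size` pairs instead of once per reactant set.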
You could use our direct model (Table 4, "Comparison of recently published methods for direct synthesis prediction on the USPTO-MIT set") from the AT-Transformer paper: https://www.nature.com/articles/s41467-020-19266-y. It does not seem to be available in https://github.com/bigchem/synthesis, but I can ask Igor to add it. What do you think? I may also have a Keras version of it (compatible with my arm64 MPS cards).
That would be great, thanks!
I may have to re-train the AT-model on the USPTO-MIT set without reagent information, as retrosynthesis predictions usually do not include reagents. It is possible that the direct forward model's accuracy decreases without the reagent information.
@thegodone
Hi - I have just made some changes to the dev_v3.10 branch, where I pushed the LocalTransform (localt_forward) model to the repo. This is not yet fully tested (and ideally I would redo the rt-accuracy metric analysis for this forward model). I will attempt to fully integrate this model within the next couple of days.
In terms of speed-up: The model runs roughly 2x faster than the gcn_forward model as it uses batch inference. Hope this helps!
Hi Friedrich,
It would be great if I could have it before the end of the week.
Thanks a lot for your help.
Hi Guillaume,
I have now merged the pull request from the development branch. In particular, there are two major changes:
I will still have to test the rt-accuracy results for the_lctforward model on the USPTO-50k. This will be done within the coming days and results will be added to the README.md file.
In the meantime, if you find any bugs, please do let me know!
@thegodone
I will close this issue, as the initial problem was fixed. In case you find another unrelated bug, please feel free to open another one. Thanks a lot!
Running the plotting.py script produces an error because par.json is not provided.
Also, a Jupyter installation is needed to run the plotting script.
Installation: on Linux