binghong-ml / retro_star

Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search
MIT License

Questions about the dataset used for training MLP and planning routes #11

Closed: junsu-kim97 closed this issue 3 years ago

junsu-kim97 commented 3 years ago

I am thankful for this interesting work and the shared code!

I have questions about the datasets used for training the MLP and for planning routes in Retro*.

1) Are the train/val/test routes extracted from the USPTO-full train/val/test splits provided in the GLN GitHub repo (https://github.com/Hanjun-Dai/GLN)?

2) Is the train/val/test dataset used for training the MLP the same as the train/val/test dataset used for building the routes?

In summary, I want to ask whether "data split at GLN == data split for building routes == data split for training MLP" is correct.

Thanks in advance.

Junsu Kim

binghong-ml commented 3 years ago

Hi Junsu, thanks for the interest in our work.

Retro* assumes a well-trained one-step model. It can be a simple model such as an MLP or a more advanced one such as GLN.

In our experiments, to simulate real-world scenarios, the training split for the one-step MLP is the same as the training split for the routes. To answer your question in short: "data split at GLN != data split for routes == data split for training MLP".

Binghong
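
For readers following along, here is a minimal sketch of what "data split for routes == data split for training MLP" could look like in practice. It is not the authors' pipeline: the toy reactions, the `build_route` helper, and the molecule names are all hypothetical, and it only assumes that a single reaction split is created once and then reused both for one-step model training and for route construction. See the slides linked in the next comment for the actual dataset construction.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A toy reaction is (product, reactants). All molecule names are placeholders.
Reaction = Tuple[str, Tuple[str, ...]]


def build_route(
    target: str, reactions: List[Reaction], building_blocks: Set[str]
) -> Optional[List[Reaction]]:
    """Depth-first search for a route to `target` that uses only `reactions`.

    Returns the list of reactions in the route, or None if no route exists.
    """
    by_product: Dict[str, List[Tuple[str, ...]]] = {}
    for product, reactants in reactions:
        by_product.setdefault(product, []).append(reactants)

    def expand(mol: str, seen: FrozenSet[str]) -> Optional[List[Reaction]]:
        if mol in building_blocks:
            return []  # purchasable molecule, nothing left to synthesize
        if mol in seen:
            return None  # avoid cycles
        for reactants in by_product.get(mol, []):
            route: List[Reaction] = [(mol, reactants)]
            solved = True
            for reactant in reactants:
                sub_route = expand(reactant, seen | {mol})
                if sub_route is None:
                    solved = False
                    break
                route.extend(sub_route)
            if solved:
                return route
        return None

    return expand(target, frozenset())


# Assumption: one split of the reaction data, created once and shared.
split = {
    "train": [("D", ("B", "C")), ("B", ("A",))],
    "test": [("G", ("E", "F"))],
}
building_blocks = {"A", "C", "E", "F"}

# The one-step model would be trained on exactly the train reactions ...
mlp_train_set = split["train"]
# ... and training routes are assembled from those same reactions only.
train_route = build_route("D", split["train"], building_blocks)
```

With this setup every reaction in a training route is also a training example for the one-step model, and a target that needs a test-split reaction (here `"G"`) has no route inside the training split.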

binghong-ml commented 3 years ago

Please refer to page 14 of http://binghongchen.net/pdf/ICML-retrosyn-slide.pdf for an illustration of the dataset construction. Thanks!

junsu-kim97 commented 3 years ago

Thanks for the answers and helpful slides!