MolecularAI / aizynthfinder

A tool for retrosynthetic planning
https://molecularai.github.io/aizynthfinder/
MIT License

How to train own data set? #80

Closed: sitanshubhunia closed this issue 1 year ago

sitanshubhunia commented 2 years ago

I have installed this tool and run it in a Jupyter notebook. How can I train it on my own data set (SMILES)? I have gone through the documentation [ https://molecularai.github.io/aizynthfinder/training.html ], but I don't understand how to do this. If possible, please give an example with some SMILES that are not in the stock file.

Thanks in advance.

SGenheden commented 2 years ago

Hello, thanks for your feedback. What dataset are you trying to train a new model on? We published a few datasets with the PaRoutes project that can serve as examples. You can find them here: https://zenodo.org/record/6275421#.Yv9MXnZBwuU Have a look at the CSV files. Hope this helps.

sitanshubhunia commented 2 years ago

@SGenheden Thanks for your reply.

First, with my limited knowledge, I have successfully run aizynthfinder with download_public_data. Thanks for your documentation.

I have a few questions:

  1. We have a large number of reactions stored in a database in MRV/CML format. I want to create a stock file with this, plus the Keras model and the list of unique extracted templates.

  2. How do I use the data from the PaRoutes project in the marked sections, i.e. in Stock (marked in black) and Expansion Policy? (image)

  3. Please provide some documentation on the Advance Tab. (image)

  4. After a search, how is each route stored in SMILES format? For example, I want the search routes shown below in SMILES format, molecule by molecule. (image)

  5. What am I doing wrong that the image is not showing in the marked section? (image)

  6. Is any Java implementation of aizynthfinder available?

Sorry for so many annoying questions.

SGenheden commented 1 year ago

  1. We have a large number of reactions stored in a database in MRV/CML format. I want to create a stock file with this, plus the Keras model and the list of unique extracted templates.

I am not really following. If you want to make a stock, that is explained here: https://molecularai.github.io/aizynthfinder/stocks.html and if you want to create an expansion model, that is explained here: https://molecularai.github.io/aizynthfinder/training.html. These are two different tasks.

  2. How do I use the data from the PaRoutes project in the marked sections, i.e. in Stock (marked in black) and Expansion Policy?

You need to specify the PaRoutes files (stock and model) in your configuration file, and they will then be selectable from the GUI.

  3. Please provide some documentation on the Advance Tab.

Documentation on these settings is here: https://molecularai.github.io/aizynthfinder/configuration.html

  4. After a search, how is each route stored in SMILES format? For example, I want the search routes shown below in SMILES format, molecule by molecule.

You need to write a few lines of Python code to extract the SMILES. Off the top of my head, something like this should work to write out all the reaction SMILES of the first route:

    for reaction in app.finder.routes.reaction_trees[0].reactions():
        print(reaction.reaction_smiles())

  5. What am I doing wrong that the image is not showing in the marked section?

I have never seen this before. Please report this as a separate bug and provide enough details to reproduce the issue.

  6. Is any Java implementation of aizynthfinder available?

No. Why?
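Since the question asked for the SMILES of each molecule rather than of each reaction, the following is a minimal sketch of one way to collect them. It assumes the route has been exported as a nested dict in which every node has a "type" ("mol" or "reaction"), a "smiles" string, and an optional "children" list; that layout is an assumption here and should be checked against the output of your installed version. The function names and the toy route are hypothetical and hand-written, not real search output.

```python
from typing import Any, Dict, Iterator, List


def iter_molecule_smiles(node: Dict[str, Any]) -> Iterator[str]:
    """Yield the SMILES of every molecule node, depth first, target first."""
    if node.get("type") == "mol":
        yield node["smiles"]
    for child in node.get("children", []):
        yield from iter_molecule_smiles(child)


def unique_molecules(route: Dict[str, Any]) -> List[str]:
    """Return molecule SMILES without duplicates, keeping first-seen order."""
    return list(dict.fromkeys(iter_molecule_smiles(route)))


# Hand-written toy route (assumed layout, not real search output):
# ethyl acetate disconnected into acetic acid and ethanol.
toy_route = {
    "type": "mol",
    "smiles": "CCOC(C)=O",
    "in_stock": False,
    "children": [
        {
            "type": "reaction",
            "smiles": "CCOC(C)=O>>CC(O)=O.CCO",
            "children": [
                {"type": "mol", "smiles": "CC(O)=O", "in_stock": True},
                {"type": "mol", "smiles": "CCO", "in_stock": True},
            ],
        }
    ],
}

print(unique_molecules(toy_route))
```

With a real search, the dict would presumably come from calling to_dict() on one of the reaction trees used in the snippet above; treat that as something to verify rather than a guarantee.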

sitanshubhunia commented 1 year ago

@SGenheden Thanks for your help

  1. For example, I have a reaction SMILES, i.e. CC(O)=O.CCO>CCO>CCOC(C)=O, and I want to create an expansion model as explained in https://molecularai.github.io/aizynthfinder/training.html. Could you show how I can do this with an example? I accept that, due to my lack of chemistry knowledge, I am unable to do this. I also don't understand why this training is required and how it affects retrosynthesis later. Frankly speaking, one of the research scientists in our organization told me about "aizynthfinder" and asked me to implement it. Being a technical guy, I am trying to implement this.

  2. I now understand how to create stock files from smiles.txt and have successfully loaded one accordingly. Thanks again for that. (image)

  3. Thanks

  4. Many, many thanks for those few lines of code.

  5. I will create a separate bug with details

  6. I need to make this user-friendly. I am comfortable with Java, although that is not a barrier. Basically, I want to create a user-friendly web application that will be part of our own ELN (electronic lab notebook).

Thanks in advance

SGenheden commented 1 year ago

Regarding training: you first need to atom-map your reactions. You can, for instance, use the rxnmapper project for this. Then you need to decide whether you want to add your own reactions on top of the existing USPTO reactions or create a completely new model.
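To make the atom-mapping requirement concrete, here is a small sketch that splits a reaction SMILES into reactants, agents, and products and checks whether it already carries atom-map numbers. It is only a regex heuristic for sorting a data set before mapping, not a chemistry-aware parser, and it does not perform the mapping itself (that is what a tool such as rxnmapper is for). The function names are hypothetical, and the mapped example was written by hand for illustration.

```python
import re
from typing import List, Tuple

# An atom-map number appears inside a bracket atom as ":<digits>]", e.g. [CH3:1]
_ATOM_MAP = re.compile(r":\d+\]")


def split_reaction_smiles(rxn: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators in {rxn!r}")
    reactants, agents, products = (
        [smi for smi in part.split(".") if smi] for part in parts
    )
    return reactants, agents, products


def is_atom_mapped(rxn: str) -> bool:
    """Heuristic: map numbers must occur on both the reactant and product side."""
    reactants, _, products = split_reaction_smiles(rxn)
    return bool(_ATOM_MAP.search(".".join(reactants))) and bool(
        _ATOM_MAP.search(".".join(products))
    )


# The unmapped reaction quoted earlier in this thread
unmapped = "CC(O)=O.CCO>CCO>CCOC(C)=O"
# Hand-mapped version of the same esterification, for illustration only
mapped = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][CH2:6][OH:7]"
    ">>[CH3:1][C:2](=[O:3])[O:7][CH2:6][CH3:5]"
)

for rxn in (unmapped, mapped):
    print(is_atom_mapped(rxn), split_reaction_smiles(rxn)[2])
```

Reactions for which is_atom_mapped returns False would need to go through a mapper before template extraction.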

The advantage of training on your own data is that you could obtain recommendations specifically tailored to your chemistry. But you would probably need a substantial number of reactions to train on for it to have an effect. Training on a few reactions in addition to USPTO wouldn't make a difference.

sitanshubhunia commented 1 year ago

@SGenheden Thanks for your guidance. I will try as you advise. I have also closed this issue. 🙏