MolecularAI / aizynthfinder

A tool for retrosynthetic planning
https://molecularai.github.io/aizynthfinder/
MIT License

A question regarding cleaning training data with rxnutils package #125

Closed yangxfei closed 1 year ago

yangxfei commented 1 year ago

Hello @SGenheden, nice to meet you again ;-) Two weeks ago you mentioned that aizynthfinder uses the rxnutils package to prepare its training data. I looked at rxnutils and found that it has a two-step pipeline. My question is: why do we remove the atom mapping in the preparation_pipeline and then add atom mapping back with rxnmapper when preparing the modelling data?

Can we just remove the atom mapping in the preparation_pipeline and skip adding it back with rxnmapper for the training data? What is the purpose of removing the atom mapping in the first step, and of adding it again with rxnmapper in the second step? Is the rxnmapper step required?

Thanks again!

Philip Yang

The first step uses preparation_pipeline to clean the training data. The clean_pipeline.yml file contains a step named "remove_atom_mapping". The commands for this step are:

    conda activate rxn-env
    python -m rxnutils.data.uspto.preparation_pipeline run --nbatches 200 --max-workers 8 --max-num-splits 200

The second step adds atom mapping with the rxnmapper package, based on the output of the first step:

    conda activate rxnmapper
    python -m rxnutils.data.mapping_pipeline run --data-prefix uspto --nbatches 200 --max-workers 8 --max-num-splits 200
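
To make the "remove_atom_mapping" step concrete, here is a minimal sketch of what stripping atom-map numbers from a reaction SMILES looks like at the string level. It is only an illustration: the function name `strip_atom_maps` and the example reaction are made up for this post, and this is not how rxnutils implements the step (a cheminformatics toolkit would also re-canonicalize the molecules).

```python
import re

# Matches a bracket atom carrying an atom-map number, e.g. "[CH3:1]" or "[O:6]".
# Group 1 captures everything before the ":<number>" part.
_MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):\d+\]")


def strip_atom_maps(rxn_smiles: str) -> str:
    """Drop the ":<n>" atom-map suffix from every bracket atom.

    Hypothetical helper for illustration only. The brackets are kept,
    so "[CH3:1]" becomes "[CH3]" rather than the canonical "C".
    """
    return _MAPPED_ATOM.sub(r"[\1]", rxn_smiles)


# Made-up mapped reaction: acetic acid + methanol -> methyl acetate.
mapped = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]"
    ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
)
unmapped = strip_atom_maps(mapped)
print(unmapped)
```

After this step the reaction still describes the same molecules, but the information about which reactant atom became which product atom is gone, which is what the second step has to restore.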

SGenheden commented 1 year ago

Hello. First, in the future, please post issues regarding rxnutils on that project's GitHub page. The USPTO data comes with atom mapping of sub-par quality, so it is removed. It is then added back with rxnmapper because atom mapping is necessary for template extraction.
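
As a rough illustration of why template extraction depends on atom mapping, the sketch below uses the map numbers to trace atoms across a reaction. The helper names and the example reaction are hypothetical and not part of rxnutils or rxnmapper; real template extraction also compares each mapped atom's bonding environment with a cheminformatics toolkit, which this sketch does not attempt.

```python
import re

# Captures the atom-map number of a bracket atom, e.g. the "4" in "[OH:4]".
_MAP_NUMBER = re.compile(r":(\d+)\]")


def map_numbers(smiles_side: str) -> set:
    """Return the set of atom-map numbers found on one side of a reaction."""
    return {int(n) for n in _MAP_NUMBER.findall(smiles_side)}


def trace_atoms(rxn_smiles: str) -> dict:
    """Classify mapped atoms as kept in, or lost from, the product.

    Hypothetical helper: assumes the "reactants>reagents>products" layout
    of a reaction SMILES, with the reagent part possibly empty.
    """
    reactants, _reagents, products = rxn_smiles.split(">")
    in_reactants = map_numbers(reactants)
    in_products = map_numbers(products)
    return {
        "kept": sorted(in_reactants & in_products),
        "lost": sorted(in_reactants - in_products),
        "only_in_product": sorted(in_products - in_reactants),
    }


# Made-up mapped reaction: acetic acid + methanol -> methyl acetate.
mapped = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]"
    ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
)
trace = trace_atoms(mapped)
print(trace)
```

With a poor mapping, the "kept" and "lost" atoms would be assigned incorrectly and any template derived from them would describe the wrong bond changes, which is consistent with replacing the original USPTO mapping before extraction.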

yangxfei commented 1 year ago

Thanks for your answer, it is clear to me now. @SGenheden, I will post my questions there in the future. ;-)