Open miquelduranfrigola opened 2 months ago
Hi @miquelduranfrigola
Did you work on this for the workshop in Ghana? If not, should we?
I worked on this partially and fixed it well enough for the workshop. I did not close the issue because we need to test it with every REINVENT model to be 100% sure. What priority should we give it?
I would do it in the next Chem Sampler sprint. I am marking it with the tags
Summary
Some (or all) of the REINVENT models in the Ersilia Model Hub have an unconventional output in JSON format, mainly because there is an `outcome` header in the `service.py` file. We need to give the output in tabular format and fill in the missing gaps with `None`.
Also, importantly, some of the returned SMILES are labelled for some reason. We want to get rid of this labelling and, ideally, standardise the SMILES and return a unique set (perhaps ordered by Tanimoto similarity).
In summary, we need to work a little bit more on these models to have a more standard output.
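To make the target format concrete, here is a minimal sketch of the padding step. It assumes that each input's raw result is a dict whose `outcome` key holds a variable-length list of SMILES; that shape, the function name `to_table` and the `smiles_XX` column names are hypothetical and not taken from the actual `service.py`.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def to_table(
    raw_results: Sequence[Dict[str, List[str]]],
    width: Optional[int] = None,
) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Turn per-input results into a fixed-width table padded with None.

    ASSUMPTION: each element of raw_results looks like {"outcome": [smiles, ...]}.
    """
    # Unwrap the (assumed) "outcome" key; a missing key means no molecules.
    generated = [list(r.get("outcome", [])) for r in raw_results]

    # By default, use the longest result as the number of columns.
    if width is None:
        width = max((len(g) for g in generated), default=0)

    header = [f"smiles_{i:02d}" for i in range(width)]

    # Truncate long rows and pad short ones so every row has `width` cells.
    rows = [g[:width] + [None] * (width - len(g)) for g in generated]
    return header, rows


if __name__ == "__main__":
    header, rows = to_table(
        [{"outcome": ["CCO", "CCN"]}, {"outcome": ["c1ccccc1"]}, {}]
    )
    print(header)
    for row in rows:
        print(row)
```

Fixing `width` explicitly (rather than inferring it from the data) would keep the number of columns stable across runs, which is probably what a tabular output needs.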
Objective(s)
A more standard output (tabular format) for the REINVENT models.
Documentation
Here is how we can remove atom labels and standardise using RDKit and the standardiser library:
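A minimal sketch of this, assuming the labels are atom map numbers (e.g. `[CH3:1]`), which is a guess about what the models return. The helper names `clean_smiles` and `unique_smiles` are hypothetical. The standardiser step is optional here and is skipped when the library is not installed.

```python
from typing import Iterable, List, Optional

from rdkit import Chem

try:
    # Optional dependency; the sketch still runs without it.
    from standardiser import standardise
except ImportError:
    standardise = None


def clean_smiles(smiles: str) -> Optional[str]:
    """Remove atom map labels, standardise if possible, return canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparsable SMILES

    # ASSUMPTION: the "labels" are atom map numbers; 0 means "no label".
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)

    if standardise is not None:
        try:
            mol = standardise.run(mol)
        except standardise.StandardiseException:
            return None  # molecule rejected by the standardiser rules

    return Chem.MolToSmiles(mol)


def unique_smiles(smiles_list: Iterable[str]) -> List[str]:
    """Clean every SMILES and keep the first occurrence of each canonical form."""
    seen = set()
    unique = []
    for smi in smiles_list:
        cleaned = clean_smiles(smi)
        if cleaned is not None and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    print(unique_smiles(["[CH3:1][OH:2]", "OC", "not_a_smiles"]))
```

Molecules that fail to parse or to standardise are dropped here; in the tabular output they would end up as `None` gaps.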