How to judge the image is consistent with SMILES?

mapengsen commented 1 year ago

How can I judge whether a molecular image is consistent with the SMILES predicted from it?

Or how can I tell how accurate the predicted SMILES is?

Kohulan commented 1 year ago

More than 90% of the time the SMILES generated by the DECIMER Image Transformer are accurate. Moreover, you could visualize the generated structure from the predicted SMILES here: decimer.ai

OBrink commented 1 year ago

Hey @mapengsen,

I see that you have also asked this question on the img2mol repository. It is not easy to get an "accuracy" value directly from DECIMER and img2mol. If you have some sort of ground truth, you can calculate the Tanimoto distance based on a fingerprint of your choice. Otherwise, you will have to perform a visual inspection on https://decimer.ai as @Kohulan recommended.
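
A minimal sketch of the ground-truth comparison mentioned above, assuming RDKit is installed and that Morgan fingerprints (radius 2, 2048 bits) are an acceptable "fingerprint of your choice". The helper names `tanimoto`, `morgan_bits` and `smiles_similarity` are made up for illustration and are not part of DECIMER or img2mol. The function returns the Tanimoto similarity; the corresponding distance is one minus that value.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def morgan_bits(smiles, radius=2, n_bits=2048):
    """Return the set of on-bits of a Morgan fingerprint (assumes RDKit)."""
    # Imported lazily so the rest of the module loads without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)  # None if the SMILES cannot be parsed
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return set(fp.GetOnBits())


def smiles_similarity(predicted, reference):
    """Tanimoto similarity between a predicted and a ground-truth SMILES."""
    return tanimoto(morgan_bits(predicted), morgan_bits(reference))
```

For example, `smiles_similarity(predicted, reference)` returns 1.0 when both SMILES give identical fingerprints and moves towards 0.0 as the structures diverge.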

If you need an estimate of a confidence value, you could use OSRA with the "--p" flag, which will print one, but in our benchmarks, OSRA has proven to be a lot less robust than our solution. As compiling OSRA with all of its dependencies can be quite a struggle, I would highly recommend using it with Docker. I have updated a dockerised version to the latest OSRA version here.

Good luck with your work!

iMicknl commented 1 year ago

@OBrink I see you have contributed to both repositories. I am currently looking into your great work / models and I was wondering what the main differences between DECIMER and img2mol are. Do both support handwritten structures, and do the features themselves differ?

OBrink commented 1 year ago

Hey @iMicknl, thanks for your interest! Both models have the same basic structure of CNN + sequence model. They use different convolutional neural networks for the feature extraction from the images and different sequence models for the translation of the feature vectors into SMILES representations of the molecules. Neither model has been trained on handwritten structures; the capability to read these types of images is a side effect of both models' ability to generalise relatively well.

In previous work, the Bayer team has interpreted the compressed latent feature vector in their encoder-decoder architectures as a molecular descriptor. They showed that this 512-dimensional feature-vector which they call CDDD (continuous data-driven descriptors) can carry all the molecular information necessary to translate between SMILES and IUPAC name representations of molecules. They have done really cool work, have a look here for more information. For img2mol, they have trained a CNN encoder to generate the CDDD based on chemical structure depictions. On top of that, they use their previously published RNN decoder to generate SMILES strings based on the CDDD. The img2mol CNN has been trained on approximately 10 million images (please correct me if I am wrong, that's the number I remember).

As img2mol is based on the CDDD work, some limitations directly derive from that:

stereochemistry cannot be encoded
Markush structures cannot be represented

The latest version of DECIMER uses EfficientNet V2 as a CNN and a transformer as a sequence model. @Kohulan has experimented with different RNN decoder models before, and we found that transformers outperform the RNNs by far. DECIMER has been trained on approximately 400 million data points + ~100 million Markush structures. Of course, I am biased, as I am part of the group that develops DECIMER, but let me list some advantages of DECIMER:

capable of reading stereochemistry
capable of reading Markush structures
DECIMER outperforms every other available tool on common benchmark sets (and every other set that we have generated) --> We are working on a publication that shows these results. The current plan is to publish the preprint before Christmas. One outstanding example: On the handwritten structure dataset, we get an average Tanimoto similarity of 70% without the model having seen a single handwritten structure during training.
everything about DECIMER, the source code, the models and the data generation tool (RanDepict) is open-source and published under permissive licenses
DECIMER comes with a nice web app (https://decimer.ai) that combines the OCSR tool with DECIMER Segmentation for whole-document-analysis. If you want to run it locally, you can use the source code available here to build it with docker-compose.

As I mentioned, I might be biased as I am part of the @steinbeck group at the University of Jena. Maybe we can get an opinion from someone in the @bayer-science-for-a-better-life team? :)

Have a nice day!

mapengsen commented 1 year ago

@OBrink First of all, thank you very much for your warm reply. It is very important to be able to input a 2D image, get its SMILES, and calculate the accuracy of this conversion (I don't know the ground truth, because the 2D image may be generated by a neural network). It would be a very good piece of work if the accuracy of the conversion could be calculated.

OBrink commented 1 year ago

@mapengsen Thanks for the suggestion! We will discuss what would be necessary to implement something like a confidence value, but I am afraid that this feature does not exist in the current version of DECIMER, and I cannot make any promises.

mapengsen commented 1 year ago

Thank you for your reply. I'm currently doing research in this area. If you have any ideas about doing this work in the future, we could stay in touch; I look forward to cooperating with you. @OBrink

mapengsen commented 1 year ago

You said that it is not easy to get an "accuracy" value directly from DECIMER and img2mol, but can you give me some suggestions on how to evaluate the "accuracy" between "generated image A" and the "translated SMILES B"?

OBrink commented 1 year ago

Have you generated that image based on a SMILES string? Or is this something completely different? The wobbliness of the characters reminds me of some experiments that I have done with generative models for the generation of chemical structure depictions. Additionally, it is not a valid chemical structure. If you know the molecule that you depicted, then you can calculate the Tanimoto distance based on a fingerprint of your choice. If you don't know the depicted molecule, I would recommend re-depicting the resolved molecule based on the SMILES string and comparing the two images manually. Alternatively, you could write down the SMILES for the generated image manually.

No matter what you do, you will not get around some tedious manual work if you don't know the depicted molecules.
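
A small sketch of the re-depiction step for the manual comparison, assuming RDKit is available. The function name `redepict` and the output path are hypothetical; the check relies on RDKit's `Chem.MolFromSmiles` returning `None` when it cannot parse a SMILES string, which also gives a quick way to flag predictions that are not valid structures.

```python
def redepict(smiles, out_path, size=(400, 400)):
    """Draw `smiles` to the image file `out_path`.

    Returns False if RDKit cannot parse the SMILES, True otherwise.
    """
    # Imported lazily so the module loads even where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import Draw

    mol = Chem.MolFromSmiles(smiles)  # None for unparsable SMILES
    if mol is None:
        return False
    Draw.MolToFile(mol, out_path, size=size)
    return True
```

Calling `redepict(predicted_smiles, "redepicted.png")` writes an image that can be placed next to the original depiction for visual inspection.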

If it is not an endless number of structures, I strongly recommend using our user interface on https://decimer.ai (see screenshot). If you are worried about the security of your data and you don't want to upload anything to a web app, you can also run it locally (see here how to do that).

mapengsen commented 1 year ago

I am now working on a generative model that will produce an endless number of molecular images. I want to know how accurate the SMILES predicted for each generated image is, computed automatically by the machine. I think I can compute the similarity between the generated images and the "image2SMILES2image" results with metrics such as SSIM or PSNR.
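
As an illustration of one of these metrics, here is a minimal pure-Python sketch of PSNR for two grayscale images of the same size, given as nested lists of pixel values. The function name `psnr` and the 8-bit default of 255 are assumptions for this example. Note that pixel-level metrics depend on the layout of the drawing, so a re-depiction of the correct molecule in a different orientation or style can still score low.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two grayscale images.

    Both images are lists of rows, each row a list of pixel values.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same shape")
    n_pixels = sum(len(row) for row in img_a)
    if n_pixels == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixel pairs.
    squared_error = sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
    )
    mse = squared_error / n_pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Identical images give an infinite PSNR, and lower values indicate larger pixel-wise differences.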

Thank you very, very much. You are very warm-hearted.

OBrink commented 1 year ago

Let me know how it works out!

Does "image2SMILES2image" mean you want to generate an image, use DECIMER to get the SMILES, redepict it and determine the similarity between the original image and the re-depicted image? What is this good for? What kind of structures does your generative model produce?

mapengsen commented 1 year ago

Compute the metrics between A and A', where A' is B re-depicted (B2A') with RDKit.
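
The round trip described here could be sketched roughly as follows. All names are hypothetical, and the three steps are passed in as functions so that the sketch does not assume any particular tool: in practice `image_to_smiles` might wrap DECIMER's `predict_SMILES`, `smiles_to_image` an RDKit drawing call, and `image_metric` something like SSIM or PSNR, but those pairings are assumptions rather than a tested pipeline.

```python
def round_trip_score(image, image_to_smiles, smiles_to_image, image_metric):
    """Score how well `image` (A) survives the trip A -> B -> A'.

    image_to_smiles: callable mapping an image to a SMILES string (B).
    smiles_to_image: callable re-depicting a SMILES string (A'), or
                     returning None if the SMILES cannot be drawn.
    image_metric:    callable comparing two images and returning a number.

    Returns (smiles, score); score is None when re-depiction failed.
    """
    smiles = image_to_smiles(image)        # A -> B
    redrawn = smiles_to_image(smiles)      # B -> A'
    if redrawn is None:
        return smiles, None
    return smiles, image_metric(image, redrawn)
```

A `None` score then marks the cases where the predicted SMILES could not be drawn at all, which is worth tracking separately from low similarity scores.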

mapengsen commented 1 year ago

It is important because you can backpropagate the accuracy through the network to better update the network's gradients.

Kohulan / DECIMER-Image_Transformer

How to judge the image is consistent with SMILES? #31