OBrink / DECIMER.ai

This repository contains the code for https://decimer.ai
MIT License

Certain molecule representation create the wrong result #28

Closed professorCKone closed 2 years ago

professorCKone commented 2 years ago

[Image: testmole]

The numbers in the molecule are part of the representation for NMR purposes. The number "5" is interpreted as "S" (sulfur). I guess this might have to do with the resolution of the PNG file.

steinbeck commented 2 years ago

Nice and very difficult example. In most cases, atom numbers are not placed directly on the atom position and then DECIMER does a good job, but this particular case will always be hard. Thanks for submitting. We'll see how future versions of DECIMER deal with this. As a first step, I suggest we create a high-res version of this and see how it copes.

professorCKone commented 2 years ago

Thank you for the quick feedback. I know that DECIMER does a great job with regular structures. Let me create a high-res version. This might do the trick.

Kohulan commented 2 years ago

@professorCKone Could you kindly send us a high-resolution version of this image?

professorCKone commented 2 years ago

[Image: MolView (structural formula)(1)]

professorCKone commented 2 years ago

I don't have a hi-res version of the original, which is exactly my problem. I tried to make NMR-annotated GIFs machine-readable, but had to give up due to the low resolution of the available images. It seems that numbers in a hi-res version don't bother your deep learning approach. The result is correct, but to rely less on guesswork and be more precise you would need to test a few more, I suppose. RDKit allows you to create molecules with numbers, by the way.

Regards, Christian

Kohulan commented 2 years ago

@professorCKone

Thanks a lot for this report. As you mentioned, with the higher-resolution image decimer.ai works perfectly well.

[Image]

The problem here, as far as I can see, is that in the original image the number "5" is too similar to the letter "S". We did implement molecules with atom numbers depicted in them; we should probably increase the augmentations on such numbers as well.

professorCKone commented 2 years ago

Other complications in lo-res are 6, 9 and 8, which can be read as O. I had several variations of this problem. Still, well done from your side with DECIMER. We will have a closer look and see if we can integrate it into our electronic lab journal.

Best regards, Christian
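
The digit/letter confusion discussed in this thread can be illustrated with a toy example. The sketch below is not DECIMER's pipeline: the 8x8 bitmaps `FIVE` and `ESS` are hand-drawn assumptions, and `to_pixels`, `downsample` and `hamming` are hypothetical helpers. It only shows how the few pixels that separate a "5" from an "S" can vanish once an image is reduced and binarized.

```python
# Hand-drawn 8x8 glyphs (assumption for illustration only).
FIVE = [
    "#######.",
    "#.......",
    "#.......",
    "######..",
    "......#.",
    "......#.",
    "#.....#.",
    ".#####..",
]
ESS = [
    ".######.",
    "#.......",
    "#.......",
    ".#####..",
    "......#.",
    "......#.",
    "#.....#.",
    ".#####..",
]


def to_pixels(rows):
    """Convert '#'/'.' strings into a grid of 1 (ink) and 0 (background)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def downsample(img, factor=2):
    """Reduce resolution: a block becomes ink if at least half of it is ink."""
    out = []
    for y in range(0, len(img), factor):
        row = []
        for x in range(0, len(img[0]), factor):
            ink = sum(img[y + dy][x + dx]
                      for dy in range(factor) for dx in range(factor))
            row.append(1 if ink * 2 >= factor * factor else 0)
        out.append(row)
    return out


def hamming(a, b):
    """Count the pixels in which two equally sized grids differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


five, ess = to_pixels(FIVE), to_pixels(ESS)
print(hamming(five, ess))                          # 2
print(hamming(downsample(five), downsample(ess)))  # 0
```

With these particular bitmaps the two glyphs differ in two pixels at full size and in none after a 2x reduction. Real fonts, renderers and scaling filters will behave differently, so this is only a qualitative picture of the problem.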

steinbeck commented 2 years ago

Christian, if you have an interesting data set of that type for OCSR, then there is of course always the possibility of retraining DECIMER with fabricated noisy, low-res images of the same type. Annotated NMR data are close to our heart (if this is what this is :)) and we could try to work together on this. If you want to take this off GitHub, feel free to send me an email.

Cheers, Chris

— Prof. Dr. Christoph Steinbeck Analytical Chemistry - Cheminformatics and Chemometrics Friedrich-Schiller-University Jena, Germany Phone Secretariat: +49-3641-948171 http://cheminf.uni-jena.de http://orcid.org/0000-0001-6966-0814

What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3..
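
The retraining idea mentioned in this comment relies on fabricating degraded images from clean depictions. A minimal sketch of such a degradation step follows, assuming a grayscale image stored as nested lists of floats in [0, 1]. The function name `degrade` and its parameters are hypothetical, and this is not the augmentation code used for DECIMER.

```python
import random


def degrade(image, factor=2, noise_sigma=0.1, seed=None):
    """Return a low-res, noisy copy of a grayscale image.

    image: list of rows, each a list of floats in [0, 1] (assumed layout).
    factor: side length of the pixel block averaged into one output pixel.
    noise_sigma: standard deviation of the additive Gaussian noise.
    seed: optional seed so the same degradation can be reproduced.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = []
    # Ignore trailing rows/columns that do not fill a complete block.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            value = sum(block) / len(block) + rng.gauss(0.0, noise_sigma)
            row.append(min(1.0, max(0.0, value)))  # clamp to [0, 1]
        out.append(row)
    return out


# A clean 8x8 test image: a bright square on a dark background.
clean = [[1.0 if 2 <= x <= 5 and 2 <= y <= 5 else 0.0 for x in range(8)]
         for y in range(8)]
noisy = degrade(clean, factor=2, noise_sigma=0.1, seed=0)
print(len(noisy), len(noisy[0]))  # 4 4
```

In practice one would apply a step like this to rendered structure depictions with varying `factor` and `noise_sigma`, keeping the clean structure as the training label.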

professorCKone commented 2 years ago

Hi Christoph, I have about 13,000 of those noisy pictures. The goal is to automatically interpret NMR spectra using deep learning.

How about a quick video call so I can explain what I am working on?

Best Christian

OBrink commented 2 years ago

I am going to close this issue here for now, as it technically is a problem of the OCSR engine, and not a problem of the web application. We are continuously working on the further diversification of our training data in order to increase DECIMER's capabilities in future versions.