Closed atef199 closed 1 year ago
I even tried it on hand-drawn structures, and none of the predictions were correct.
Hello @BinaryBlaze291,
Thank you for the report!
About the first three depictions - we currently do not include explicit hydrogen atoms in the training data (unless they are essential for the stereochemistry), as depictions of large molecules tend to get very messy with overlapping elements. As a result, the tool cannot interpret explicit hydrogen atoms. I agree that it should not have problems with this kind of simple structure. I will push a fix to RanDepict so that a proportion of the molecules with fewer than 15 heavy atoms is depicted with explicit hydrogen atoms. This should resolve the issue once the model has been retrained on the new depictions.
About the hand-drawn structures - this is more complicated, and it is hard to say what goes wrong. The ability to read hand-drawn chemical structures at all is rather a side effect of the diversification of the training data. We'll have a look at it. The background could potentially be a problem here. You could try binarising the images before processing them with DECIMER Image Transformer.
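A minimal sketch of what such a binarisation step could look like. It assumes Otsu's method for picking the threshold (no specific technique is named in this thread) and Pillow for image I/O; the function names `otsu_threshold`, `binarise_rows` and `binarise_file` are hypothetical helpers for illustration, not part of DECIMER.

```python
def otsu_threshold(hist):
    """Return the grey level that best separates dark from light pixels.

    `hist` is a 256-bin histogram of an 8-bit greyscale image.
    Otsu's method picks the threshold that maximises the
    between-class variance of the two resulting pixel groups.
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_dark = 0
    sum_dark = 0
    best_threshold, best_variance = 0, -1.0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarise_rows(rows):
    """Binarise a greyscale image given as a list of rows of 0..255 ints."""
    hist = [0] * 256
    for row in rows:
        for pixel in row:
            hist[pixel] += 1
    threshold = otsu_threshold(hist)
    return [[255 if pixel > threshold else 0 for pixel in row] for row in rows]


def binarise_file(in_path, out_path):
    """Load an image, binarise it and save the result (assumes Pillow)."""
    from PIL import Image  # imported lazily; only needed for file I/O

    with Image.open(in_path) as img:
        grey = img.convert("L")  # 8-bit greyscale copy
    threshold = otsu_threshold(grey.histogram())
    grey.point(lambda p: 255 if p > threshold else 0).save(out_path)
```

The saved file can then be passed to DECIMER in the usual way, for example via `predict_SMILES(image_path)` as described in the project README. A global threshold like this mainly helps with uniform grey or paper-coloured backgrounds; unevenly lit photos may need a different approach.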
Kind regards, Otto
Here's an example of a binarised hand-drawn depiction vs the original image:
Here is another one.
Hi @BinaryBlaze291 ,
We are currently investigating this issue. In the next version update, this issue will be resolved. Please be aware that DECIMER was not trained on any hand-drawn images, and this could result in a failure on poorly drawn depictions of chemical structures.
Kind regards, Kohulan
@BinaryBlaze291
We have added explicit hydrogen atoms to the depictions in our training data. The next version of DECIMER Image Transformer will be capable of handling structures that contain them. Thank you for bringing this to our attention!
The new version of the model that @Kohulan released today is capable of handling the hand-drawn structures you posted, @BinaryBlaze291.
@BinaryBlaze291, we are now starting to create a new dataset that contains depictions with explicit hydrogen atoms. Additionally, the model will be trained to return cxSMILES with atom coordinates.
I will close this issue for now, as it will probably take a little while until the new model is ready to be published. Again, thank you for bringing these issues to our attention! The lack of recognition capabilities for explicit hydrogen atoms is a big blind spot of the current model version that will be fixed soon.
Issue Type
Performance
Source
GitHub (source)
DECIMER Image Transformer Version
2.2.0
OS Platform and Distribution
No response
Python version
No response
Current Behaviour?
The model cannot recognize fairly simple structures. I tried all of these, and it gives wrong results for each of them.
Which images caused the issue? (This is mandatory for images related issues)
No response
Standalone code to reproduce the issue
Relevant log output
No response