Kohulan / DECIMER-Image_Transformer

DECIMER: Deep Learning for Chemical Image Recognition using Efficient-Net V2 + Transformer
MIT License
197 stars, 51 forks

Molecules output #88

Closed: path-to-freedom closed this issue 8 months ago

path-to-freedom commented 9 months ago

Issue Type

Questions

Source

GitHub (source)

DECIMER Image Transformer Version

latest

OS Platform and Distribution

windows

Python version

3.11

Current Behaviour?

Hello, and thank you for your model. I am doing research work and would like to know how to generate only valid SMILES strings with it. Could you provide code for this, if possible? I would be very grateful for your answer!

path-to-freedom commented 8 months ago

I forgot to mention: I mean at inference time.

Kohulan commented 8 months ago

The SMILES predictions are valid nearly 100% of the time. However, DECIMER does not include a built-in validity check, because users may want to edit a SMILES string that is deemed invalid. To address your concern, you can validate predictions with RDKit as shown in the following snippet. Hope this helps.

from rdkit import Chem
from DECIMER import predict_SMILES

# Chemical depiction to SMILES translation
image_path = "path/to/imagefile"
predicted_smiles = predict_SMILES(image_path)


def check_SMILES(predicted_smiles):
    """Return the SMILES string if RDKit can parse it, else "Prediction Invalid"."""
    # Chem.MolFromSmiles returns None when the string cannot be parsed.
    mol = Chem.MolFromSmiles(predicted_smiles)
    if mol is not None:
        return predicted_smiles
    return "Prediction Invalid"


# Example usage:
result = check_SMILES(predicted_smiles)
print(result)

path-to-freedom commented 8 months ago

Thank you very much for your answer, I appreciate it. My question is a little different, though: is there a way to guarantee 100% valid molecules, for example by using the model's probability distribution to select a valid candidate, or by searching over suitable tokens until a correct SMILES string is obtained?

Kohulan commented 8 months ago

One could incorporate a beam-search function for prediction to retrieve valid SMILES. However, in the case of DECIMER, we chose not to integrate this feature due to its computationally intensive nature, as it would significantly slow down the entire prediction process. If you wish to implement it independently, please feel free to do so. I am happy to take a pull request.
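
For anyone who wants to experiment with this idea, the following is a minimal sketch of beam search combined with a validity filter. It is not DECIMER code and does not use any DECIMER API: `next_log_probs`, `is_valid`, the `END` token, and the toy lookup table are all hypothetical stand-ins. With a real decoder, `next_log_probs` would wrap the model's next-token distribution, and `is_valid` could be an RDKit check such as `Chem.MolFromSmiles(s) is not None`.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

END = "<end>"  # hypothetical end-of-sequence token


def beam_search_valid(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    is_valid: Callable[[str], bool],
    beam_width: int = 3,
    max_len: int = 20,
) -> Optional[str]:
    """Return the best-scoring finished sequence that passes is_valid, or None."""
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    finished: List[Tuple[float, str]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, Tuple[str, ...]]] = []
        for score, prefix in beams:
            for token, logp in next_log_probs(prefix).items():
                if token == END:
                    # A completed hypothesis: store it with its total log-probability.
                    finished.append((score + logp, "".join(prefix)))
                else:
                    candidates.append((score + logp, prefix + (token,)))
        if not candidates:
            break
        # Keep only the beam_width most probable unfinished prefixes.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    # Walk the finished hypotheses from most to least probable.
    for _, text in sorted(finished, key=lambda f: f[0], reverse=True):
        if is_valid(text):
            return text
    return None


# Toy stand-in for a decoder: maps a token prefix to next-token log-probabilities.
TOY_TABLE = {
    (): {"C": math.log(1.0)},
    ("C",): {"(": math.log(0.6), "C": math.log(0.4)},
    ("C", "("): {END: math.log(0.7), "C": math.log(0.3)},
    ("C", "(", "C"): {")": math.log(1.0)},
}


def toy_next_log_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not in the table simply ends the sequence.
    return TOY_TABLE.get(prefix, {END: 0.0})


def balanced(smiles: str) -> bool:
    """Toy validity check: non-empty string with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and bool(smiles)


print(beam_search_valid(toy_next_log_probs, balanced))
```

In this toy setup the most probable completed string is `C(`, which fails the check, so the search falls back to the next most probable hypothesis. Two caveats: the function returns `None` when no hypothesis in the beam is valid, so beam search alone does not guarantee a valid SMILES string, and summing raw log-probabilities favours shorter sequences, so a real implementation may want length normalisation.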

path-to-freedom commented 8 months ago

Okay, thank you for your answer!