aiidalab / aiidalab-widgets-base

Reusable widgets for AiiDAlab applications
MIT License
7 stars 17 forks source link

SmilesWidget: Handle invalid SMILES gracefully #270

Closed danielhollas closed 1 year ago

danielhollas commented 2 years ago

For example, this input: CC(C)(C)*OO generates this stack trace.

KeyError                                  Traceback (most recent call last)
~/apps/aiidalab-widgets-base/aiidalab_widgets_base/structures.py in _on_button_pressed(self, change)
    728             self.SPINNER
    729         )  # font-size:20em;
--> 730         self.structure = self._mol_from_smiles(self.smiles.value)
    731         self.output.value = ""
    732 

~/apps/aiidalab-widgets-base/aiidalab_widgets_base/structures.py in _mol_from_smiles(self, smiles, steps)
    715         """Convert SMILES to ase structure try rdkit then pybel"""
    716         try:
--> 717             return self._rdkit_opt(smiles, steps)
    718         except ValueError:
    719             return self._pybel_opt(smiles, steps)

~/apps/aiidalab-widgets-base/aiidalab_widgets_base/structures.py in _rdkit_opt(self, smiles, steps)
    710         natoms = mol.GetNumAtoms()
    711         species = [mol.GetAtomWithIdx(j).GetSymbol() for j in range(natoms)]
--> 712         return self._make_ase(species, positions, smiles)
    713 
    714     def _mol_from_smiles(self, smiles, steps=10000):

~/apps/aiidalab-widgets-base/aiidalab_widgets_base/structures.py in _make_ase(self, species, positions, smiles)
    663         # Get the principal axes and realign the molecule along z-axis.
    664         positions = PCA(n_components=3).fit_transform(positions)
--> 665         atoms = Atoms(species, positions=positions, pbc=True)
    666         atoms.cell = np.ptp(atoms.positions, axis=0) + 10
    667         atoms.center()

/opt/conda/lib/python3.8/site-packages/ase/atoms.py in __init__(self, symbols, positions, numbers, tags, momenta, masses, magmoms, charges, scaled_positions, cell, pbc, celldisp, constraint, calculator, info, velocities)
    206                     'Use only one of "symbols" and "numbers".')
    207             else:
--> 208                 self.new_array('numbers', symbols2numbers(symbols), int)
    209 
    210         if cell is None:

/opt/conda/lib/python3.8/site-packages/ase/symbols.py in symbols2numbers(symbols)
     18     for s in symbols:
     19         if isinstance(s, str):
---> 20             numbers.append(atomic_numbers[s])
     21         else:
     22             numbers.append(int(s))

KeyError: '*'

Happy to take this on, but if anybody want to fix this go ahead. :-)

yakutovicha commented 2 years ago

@danielhollas any help would be appreciated :)

csadorf commented 2 years ago

@danielhollas Thanks a lot for taking this on, please let us know if you need any help with this.

danielhollas commented 1 year ago

This is now fixed.

yakutovicha commented 1 year ago

When testing the SMILES widget I discovered that the error is back. @danielhollas can you please have a look?

danielhollas commented 1 year ago

When testing the SMILES widget I discovered that the error is back

Which SMILES specifically triggered the error? Is it the exact same stack trace?

danielhollas commented 1 year ago

Indeed, the original SMILES in this issue still throws an error, though it is a somewhat of a special case, which might be considered a bug in RDKit rather than here. Most invalid SMILES should be handled.

@yakutovicha did you see this with other smiles?

yakutovicha commented 1 year ago

I tried some examples from the link. They all worked except for N#N wich is gracefully handled:

n_components=3 must be between 0 and min(n_samples, n_features)=2 with svd_solver='full'

I also introduced some random errors in SMILES, and things seem to work there. But I cannot guarantee I tested all possible issues.

which might be considered a bug in RDKit rather than here

I still believe we need to handle that.

danielhollas commented 1 year ago

They all worked except for N#N wich is gracefully handled:

This is for sure a bug, could you open a new issue for this? It's not really about SMILES parsing, but the postprocessing code that orients the molecule does not handle 2-atom molecules.

danielhollas commented 1 year ago

I still believe we need to handle that.

Sure, but I do not think it blocks the 2.0 release.

danielhollas commented 1 year ago

The problem with N#N has been fixed in #477