MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

Validator restrictions for SMART like Smiles #291

Closed tsufz closed 3 years ago

tsufz commented 3 years ago

@meier-rene, I think the restrictions are too strict. I would like to add SMARTS-like SMILES to some records (which DEPICT processes without problems). However, if I add a SMILES but no InChI, the validator complains:

09:50:51.302 ERROR massbank.cli.RefreshDatabase - Error reading/validating record "/MassBank-data/HBM4EU/HB002869.txt".
09:50:51.304 ERROR massbank.cli.Validator - CH$SMILES is available but CH$IUPAC is empty.

The example:
CH$SMILES: C(*)C1=NN=C(C(=O)N1)C2=C(*)C(*)=C(*)C=C2 *=[OH (n=1) & H (n=3)]
CH$IUPAC: N/A

We need this function soon. In the meantime, I will try to switch off the rule for checking.

Best, Tobias

schymane commented 3 years ago

We should probably not have "non-standard" SMILES in the CH$SMILES field - we had some alternative tag suggestions in previous conversations, I thought?

tsufz commented 3 years ago

@meier-rene will find a solution for this and check whether SMARTS notation parses in CDK and so on. We could use a specific SMARTS tag that overrides all SMILES versus InChI versus InChIKey versus formula checks in the record validator and then parses the SMARTS into the CH$SMILES field. Checking whether a SMILES is actually SMARTS might be the smartest solution if, for example, a routine for this is available in CDK. Any ideas?
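The detection idea above could be sketched as follows. This is only a plain string check in Python, not a CDK routine and not the MassBank validator code; the function name `has_wildcard_atom` is hypothetical, and treating everything after the first whitespace as a title is an assumption.

```python
def has_wildcard_atom(smiles_field: str) -> bool:
    """Return True if the structure part of a CH$SMILES value contains '*'.

    Hypothetical helper: a string heuristic, not a real SMILES parser.
    """
    # Assumption: anything after the first whitespace is a free-text title,
    # so only the first token is inspected for wildcard atoms.
    tokens = smiles_field.split(maxsplit=1)
    structure = tokens[0] if tokens else ""
    return "*" in structure


print(has_wildcard_atom("C(*)C1=NN=C(C(=O)N1)C2=C(*)C(*)=C(*)C=C2"))
print(has_wildcard_atom("CCO"))
```

A check like this cannot tell whether the string is otherwise valid; a toolkit parser would still be needed for that.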

schymane commented 3 years ago

It's not exactly SMARTS but rather extended SMILES that you mean, correct? Some extended SMILES will actually parse to a generic representation in depict, but may also be able to produce an InChI in special cases (e.g. "representative structure").

meier-rene commented 3 years ago

This feature is now implemented. It's possible to use SMILES with wildcards (*) to annotate tentative structures, as described in http://opensmiles.org/opensmiles.html#inatoms.

The "title" shown in https://www.simolecule.com/cdkdepict/depict.html is also supported and can be used as an additional annotation (e.g. C(*)1C(*)N(C(=N[N+](=O)[O-])N1)C(*)C2=C(*)N=C(C(*)=C(*)2)Cl *=[OH (n=1) & H (n=5)]).
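As an illustration of how such a value breaks down, the sketch below splits the example into its structure and title and compares the number of wildcard atoms with the counts given in the title. The helper names are hypothetical, and reading the `n=` values this way is an assumption about this particular annotation style, not a documented MassBank or CDK convention.

```python
import re


def split_smiles_and_title(value: str) -> tuple:
    """Split a CH$SMILES value into (structure, title) at the first whitespace."""
    parts = value.strip().split(maxsplit=1)
    structure = parts[0] if parts else ""
    title = parts[1].strip() if len(parts) > 1 else ""
    return structure, title


def declared_substituents(title: str) -> int:
    """Sum the 'n=<number>' counts in the title (assumed annotation style)."""
    return sum(int(n) for n in re.findall(r"n=(\d+)", title))


example = ("C(*)1C(*)N(C(=N[N+](=O)[O-])N1)C(*)C2=C(*)N=C(C(*)=C(*)2)Cl "
           "*=[OH (n=1) & H (n=5)]")
structure, title = split_smiles_and_title(example)
print(structure.count("*"), declared_substituents(title))
```

For the example from this comment, both numbers agree: the wildcard atoms in the structure match the OH and H counts given in the title.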

@schymane you are right. These are still SMILES, but atoms can be defined as the wildcard *. In my code, using SMILES with wildcards is incompatible with using InChI. The validator cannot use the chemical information in such SMILES because it is incomplete. But the webpage can still render some nice pictures.
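Roughly, the rule described above might look like the following sketch. It is not the actual MassBank-web validator (which is written in Java); the function name, the treatment of `N/A` as empty, and the wording of the wildcard error message are assumptions. Only the first error message is taken from the log quoted earlier in this thread.

```python
def validate_structure_fields(smiles: str, iupac: str) -> list:
    """Return a list of error messages for the CH$SMILES / CH$IUPAC pair.

    Hypothetical sketch of the rule described in this thread.
    """
    def is_present(value: str) -> bool:
        # Assumption: an empty value or "N/A" counts as "not available".
        return value.strip() not in ("", "N/A")

    errors = []
    if not is_present(smiles):
        return errors

    # Only the structure token is inspected; a trailing title is ignored.
    structure = smiles.split(maxsplit=1)[0]

    if "*" in structure:
        # Wildcard SMILES carry incomplete chemical information, so they
        # cannot be cross-checked against an InChI.
        if is_present(iupac):
            errors.append("CH$SMILES contains wildcards but CH$IUPAC is not empty.")
        return errors

    if not is_present(iupac):
        errors.append("CH$SMILES is available but CH$IUPAC is empty.")
    return errors


print(validate_structure_fields("C(*)C1=NN=C(C(=O)N1)C2=C(*)C(*)=C(*)C=C2", "N/A"))
print(validate_structure_fields("CCO", "N/A"))
```

With this shape, a wildcard SMILES paired with `CH$IUPAC: N/A` passes, while a complete SMILES without an InChI still triggers the original error.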

meier-rene commented 3 years ago

@tsufz Please try your new contribution with release 2.1.8

meier-rene commented 3 years ago

Done.