MassBank / MassBank-data

Official repository of open data MassBank records

Validator error formula #247

Open ksjewell opened 8 months ago

ksjewell commented 8 months ago

Hi, I get the following validator error (example):

20:53:36.262 ERROR massbank.cli.Validator - ACCESSION: MSBNK-BAFG-CSL23102516856
20:53:36.262 ERROR massbank.cli.Validator - ^
20:53:36.262 ERROR massbank.cli.Validator - Error in 'BAFG/MSBNK-BAFG-CSL23102516856.txt'.
20:53:36.267 ERROR massbank.cli.Validator - Formula generated from InChI string in "CH$IUPAC" field does not match formula in "CH$FORMULA".

This is Brilliant Blue FCF (a permanent cation).

According to ChemDraw, the InChI is correct.

InChI=1S/C37H36N2O9S3/c1-3-38(25-27-9-7-11-33(23-27)49(40,41)42)31-19-15-29(16-20-31)37(35-13-5-6-14-36(35)51(46,47)48)30-17-21-32(22-18-30)39(4-2)26-28-10-8-12-34(24-28)50(43,44)45/h5-24H,3-4,25-26H2,1-2H3,(H2-,40,41,42,43,44,45,46,47,48)/p+1

C37H37N2O9S3

Does there need to be a '+' at the end of the formula?
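The formula in the error can be reproduced from the InChI itself: the main formula layer gives the neutral parent C37H36N2O9S3, and the `/p+1` protonation layer adds one proton, which raises both the H count and the charge by one. Below is a minimal Python sketch, assuming a standard, single-component InChI (no `.`-separated components) and handling only the `/q` and `/p` layers. `net_formula_from_inchi` is a hypothetical helper, not the Validator's actual code:

```python
import re
from collections import Counter


def _parse_formula(formula: str) -> Counter:
    """Count atoms in a plain formula such as 'C37H36N2O9S3'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def _hill(counts: Counter) -> str:
    """Write atom counts in Hill order: C, H, then the rest alphabetically
    (everything alphabetically if there is no carbon)."""
    if counts["C"]:
        keys = ["C", "H"] + sorted(k for k in counts if k not in ("C", "H"))
    else:
        keys = sorted(counts)
    return "".join(
        k + (str(counts[k]) if counts[k] > 1 else "") for k in keys if counts[k] > 0
    )


def net_formula_from_inchi(inchi: str) -> tuple[str, int]:
    """Return (formula, charge) after applying the /q and /p layers.

    Assumption: standard single-component InChI. The /p layer adds or
    removes protons (H atoms and charge); the /q layer changes charge only.
    """
    layers = inchi.split("/")
    main = layers[1]  # layers[0] is the 'InChI=1S' prefix
    if "." in main:
        raise ValueError("multi-component InChI is not handled in this sketch")
    counts = _parse_formula(main)
    charge = 0
    for layer in layers[2:]:
        if layer.startswith("q"):
            charge += int(layer[1:])
        elif layer.startswith("p"):
            protons = int(layer[1:])
            counts["H"] += protons
            charge += protons
    return _hill(counts), charge


inchi = "InChI=1S/C37H36N2O9S3/c1-3-38(25-27-9-7-11-33(23-27)49(40,41)42)31-19-15-29(16-20-31)37(35-13-5-6-14-36(35)51(46,47)48)30-17-21-32(22-18-30)39(4-2)26-28-10-8-12-34(24-28)50(43,44)45/h5-24H,3-4,25-26H2,1-2H3,(H2-,40,41,42,43,44,45,46,47,48)/p+1"
print(net_formula_from_inchi(inchi))  # ('C37H37N2O9S3', 1)
```

The net formula matches the one in the record; the only difference from what the Validator expects is the +1 charge, which the plain formula does not express.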

Best wishes,
Kevin

meier-rene commented 8 months ago

Hi, thanks for reporting. This issue has been known for a while. RMassBank and MassBank-web (where the Validator comes from) use different libraries for chemical data, and they produce incompatible output. Unfortunately, we haven't worked out a fix for this yet. Indeed, a "+" is needed in the formula, in the following syntax: [C37H37N2O9S3]+. This might look a bit strange, but it's not ambiguous, because it makes clear that the 3 in this case belongs to the S and not to the charge.
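The bracketed syntax can be produced mechanically from a plain formula and an integer charge. A minimal sketch; `ion_formula` is a hypothetical helper, and the form used for charges other than +1 or -1 (e.g. `[...]2+`) is an assumption not confirmed in this thread:

```python
def ion_formula(formula: str, charge: int) -> str:
    """Write a formula in bracketed ion syntax, e.g. [C37H37N2O9S3]+.

    The brackets separate the atom counts from the charge, so the trailing
    3 of S3 cannot be misread as part of a 3+ charge.
    """
    if charge == 0:
        return formula  # neutral molecules keep the plain formula
    sign = "+" if charge > 0 else "-"
    magnitude = abs(charge)
    # Assumption: multiply charged ions put the magnitude before the sign.
    return f"[{formula}]{magnitude if magnitude > 1 else ''}{sign}"


print(ion_formula("C37H37N2O9S3", 1))  # [C37H37N2O9S3]+
```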

At the moment, the only resolution is to fix the records manually. We can assist if needed.
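The manual fix amounts to rewriting one field per affected record. A minimal sketch, assuming each record stores the formula on a single line beginning with `CH$FORMULA: `; `set_formula` is a hypothetical helper and not part of MassBank-web or RMassBank:

```python
import re

# Assumption: the formula sits on one line that starts with "CH$FORMULA: ".
# [^\r\n]* stops at the line ending, so CRLF files keep their line endings.
_FORMULA_LINE = re.compile(r"^(CH\$FORMULA: )[^\r\n]*", re.MULTILINE)


def set_formula(record_text: str, new_formula: str) -> str:
    """Replace the value of the CH$FORMULA line in a record's text."""
    # A function replacement avoids escaping issues with '\' in the new value.
    new_text, n = _FORMULA_LINE.subn(lambda m: m.group(1) + new_formula, record_text)
    if n != 1:
        raise ValueError(f"expected exactly one CH$FORMULA line, found {n}")
    return new_text
```

For a record on disk, with `path` a `pathlib.Path`, this could be used as `path.write_text(set_formula(path.read_text(), "[C37H37N2O9S3]+"))`, followed by a rerun of the Validator to confirm the error is gone.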