MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

Use and Format of Tags (Syntax) #95

naperone closed this issue 6 years ago

naperone commented 6 years ago

Some records use "malformed" tags.

Examples:

  1. use of "CH$LINK INCHIKEY: value" (e.g. https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=AC000001&dsn=AAFC)
  2. use of "PK&ANNOTATION"
  3. use of "OMMENT" (https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=ET201653&dsn=Eawag_Addn)

Specific

For all of these examples the authors' actual intent seems obvious: for (1) "CH$LINK: INCHIKEY value", for (2) "PK$ANNOTATION", and for (3) "COMMENT". I therefore suggest correcting these obvious mistakes in the record files.
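The three corrections above could be applied mechanically, line by line. The following is only a minimal sketch: the `CORRECTIONS` table and the `fix_line` helper are hypothetical names introduced for illustration, and it assumes each malformed tag sits at the very start of a record line.

```python
# Hypothetical mapping from each malformed line start to its intended form.
# Only the three cases reported in this issue are covered.
CORRECTIONS = [
    ("CH$LINK INCHIKEY: ", "CH$LINK: INCHIKEY "),
    ("PK&ANNOTATION", "PK$ANNOTATION"),
    ("OMMENT", "COMMENT"),
]


def fix_line(line):
    """Return the line with a known malformed tag corrected, else unchanged."""
    for bad, good in CORRECTIONS:
        if line.startswith(bad):
            return good + line[len(bad):]
    return line
```

Lines that already start with a correct tag, such as "COMMENT: ...", do not match any entry and are returned unchanged.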

General

The MassBankRecord Format does not explicitly specify the syntax of tags. I suggest agreeing on a format and adding it to the specification (ideally as a regular expression). I propose the following format:

^[A-Z]+(_[A-Z]+)*(\$[A-Z]+(_[A-Z]+)*)?$

Each tag consists of an optional prefix and a mandatory suffix. If both are present, they are separated by "$". Prefix and suffix can each be made up of multiple capital-letter words separated by "_".

Legal examples:

  1. X (single word suffix)
  2. XX (single word suffix)
  3. X_X (two word suffix)
  4. X_X_X (three word suffix)
  5. X$X (one word prefix, one word suffix)
  6. X_X$X (two word prefix, one word suffix)
  7. X$X_X (one word prefix, two word suffix)
  8. X_X$X_X (two word prefix, two word suffix)
  9. ...
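To make the proposal concrete, here is a minimal sketch of how the regular expression above could be used to check tags. The helper names `extract_tag` and `is_valid_tag` are hypothetical, and the sketch assumes that a tag is everything before the first ":" on a line.

```python
import re

# The pattern proposed in this issue, unchanged.
TAG_PATTERN = re.compile(r"^[A-Z]+(_[A-Z]+)*(\$[A-Z]+(_[A-Z]+)*)?$")


def extract_tag(line):
    """Return the text before the first ':' or None if the line has no ':'."""
    tag, sep, _ = line.partition(":")
    return tag if sep else None


def is_valid_tag(tag):
    """Check a tag against the proposed syntax (whole string must match)."""
    return TAG_PATTERN.fullmatch(tag) is not None
```

Under these assumptions, "CH$LINK INCHIKEY" (space inside the tag) and "PK&ANNOTATION" are rejected, while "CH$LINK" and "PK$ANNOTATION" are accepted. Note that "OMMENT" is syntactically well-formed and would pass; catching it would additionally require a list of known tag names.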

uchem-massbank commented 6 years ago

Where does point 2 occur? Which records?

naperone commented 6 years ago

  1. https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=WA002740&dsn=Waters
  2. https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=WA002741&dsn=Waters
  3. https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=WA002743&dsn=Waters
  4. https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=WA002744&dsn=Waters

Treutler commented 6 years ago

Fixed during the validation of records. We do not intend to introduce a schema for the syntax of tags because we do not see striking advantages.