MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

Specify (and document) the identifiers in the JSON+LD data #368

Open sneumann opened 1 year ago

sneumann commented 1 year ago

Hi, in the bioschemas records we have a number of places for all sorts of "identifiers". They need to be specified, documented and possibly updated in the schema output:

@id is the URL of the actual web page this metadata describes, e.g. https://massbank.eu/MassBank/RecordDisplay?id=PB000123 for the DataSet. Since @ids on one page need to be unique, for the MolecularEntity on that page we use the # trick and append the InChIKey as a fragment, e.g. https://massbank.eu/MassBank/RecordDisplay?id=MSBNK-IPB_Halle-PB000123#FTVWIRXFELQLPI-ZDUSSCGKSA-N Note that tentative molecules with R groups will not have an InChIKey; there it'd degrade to https://massbank.eu/MassBank/RecordDisplay?id=MSBNK-IPB_Halle-PB000123#

identifier for a DataSet is recommended to be "CURIEs that can be resolved using Identifiers.org", e.g. massbank:MSBNK-IPB_Halle-PB000166, which resolves via https://identifiers.org/massbank:MSBNK-IPB_Halle-PB000166.
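To make the CURIE convention concrete, here is a minimal Python sketch of how an accession could be turned into a CURIE and its Identifiers.org URL. The function names are hypothetical and not part of MassBank-web; only the massbank: prefix and the resolver URL pattern come from the example above.

```python
# Hypothetical helpers, not taken from the MassBank-web code base.
IDENTIFIERS_ORG = "https://identifiers.org/"


def massbank_curie(accession: str) -> str:
    """Prefix a MassBank accession with the 'massbank' namespace."""
    return f"massbank:{accession}"


def resolver_url(curie: str) -> str:
    """Build the Identifiers.org URL for a CURIE by simple concatenation."""
    return IDENTIFIERS_ORG + curie


curie = massbank_curie("MSBNK-IPB_Halle-PB000166")
print(curie)
print(resolver_url(curie))
```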

identifier for a MolecularEntity would be the InChIKey, except where none exists (R groups!).

url points to the MassBank URL for both DataSet and MolecularEntity, so it'd be identical to the @id above, except that it does not have to be unique, so the # trick is not required.
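Put together, the proposal could look roughly like the following Python sketch, which assembles the two JSON-LD nodes for one record page. This is only an illustration under assumptions: build_nodes is a hypothetical helper, the nodes carry only the identifier-related properties, and the Dataset type is written in the schema.org spelling.

```python
import json
from typing import Optional

BASE_URL = "https://massbank.eu/MassBank/RecordDisplay?id="


def build_nodes(accession: str, inchikey: Optional[str]) -> list:
    """Return the Dataset and MolecularEntity nodes for one record page.

    Hypothetical helper: only the identifier-related properties are filled in.
    """
    page_url = BASE_URL + accession
    dataset = {
        "@type": "Dataset",
        "@id": page_url,                        # unique on the page
        "identifier": f"massbank:{accession}",  # CURIE for Identifiers.org
        "url": page_url,
    }
    molecule = {
        "@type": "MolecularEntity",
        # The "# trick": the fragment keeps this @id distinct from the Dataset.
        # Without an InChIKey the fragment degrades to a bare "#".
        "@id": f"{page_url}#{inchikey or ''}",
        "url": page_url,                        # need not be unique, no fragment
    }
    if inchikey:
        molecule["identifier"] = inchikey       # omitted for R-group structures
    return [dataset, molecule]


nodes = build_nodes("MSBNK-IPB_Halle-PB000123", "FTVWIRXFELQLPI-ZDUSSCGKSA-N")
print(json.dumps(nodes, indent=2))
```

Calling build_nodes with inchikey=None shows the degraded case: the MolecularEntity @id ends in a bare # and has no identifier property.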

This will also be of interest to the harvesters, e.g. @bhavin2897. Would that work? Yours, Steffen