MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

Bioschemas JSON-LD not properly encoded/escaped #316

Closed sneumann closed 1 year ago

sneumann commented 2 years ago

Hi, we got a report from @AlasdairGray that:

In preparation for the BioHackathon next week, we have been harvesting data from as many sites as possible. Whilst harvesting the pages from MassBank, we found 10,326 pages with invalid JSON-LD on them. From the page that I inspected, this was due to the use of quotation marks within a text field with the quotation mark not being properly encoded. For example, you can see the error at the following link to the syntax validator

A fix probably requires proper encoding of strings in

Yours, Steffen
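
To illustrate the failure mode described in the report, here is a minimal sketch in Python (not MassBank's own code). The field names and the compound name are hypothetical; the point is only that concatenating strings leaves embedded quotation marks unescaped, while a JSON serializer escapes them.

```python
import json

# Hypothetical record field; the embedded quotation marks are the problem case.
name = 'Compound "X" standard'

# Naive string concatenation leaves the inner quotes unescaped.
naive = '{"@type": "ChemicalSubstance", "name": "' + name + '"}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A JSON serializer escapes the inner quotes as \" automatically.
escaped = json.dumps({"@type": "ChemicalSubstance", "name": name})
```

Here `is_valid_json(naive)` is false and `is_valid_json(escaped)` is true, and parsing `escaped` gives back the original name unchanged.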

sneumann commented 2 years ago

A light-weight choice could be but note comment on slashes (we have both URLs and InChIs containing slashes ...)
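
On the slash concern: the JSON grammar allows `\/` as an escape for `/` but does not require it, so a URL or InChI decodes to the same value either way. A small Python sketch with illustrative values (the URL is a placeholder, and this is not the library referred to above):

```python
import json

# Illustrative values only: a placeholder URL and an InChI, both containing slashes.
values = {
    "url": "https://example.org/record/1",
    "inchi": "InChI=1S/CH4/h1H4",
}

# Python's json module leaves forward slashes unescaped.
serialized = json.dumps(values)

# Simulate a serializer that does escape slashes; "\/" is an optional JSON escape.
with_escaped_slashes = serialized.replace("/", "\\/")

# Both spellings decode to the same strings.
same = json.loads(serialized) == json.loads(with_escaped_slashes) == values
```

Escaped slashes are therefore harmless for a conforming JSON parser, although they make the embedded markup harder to read.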

tsufz commented 2 years ago

Ah, good to know. I checked the crawlers, and they also complain:

Parsing error: Missing ',' or '}'

Example: image

tsufz commented 2 years ago

The second error is a bad escape sequence in the SMILES string:
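
The original example is not reproduced here. As a hypothetical illustration of this class of error, a SMILES string may contain a backslash bond marker, and a backslash followed by an arbitrary character is not a legal JSON escape. A sketch in Python, with an assumed `smiles` key and an example SMILES chosen only because it contains a backslash:

```python
import json

# Hypothetical SMILES containing a backslash bond marker: F/C=C\F
smiles = "F/C=C\\F"

# Pasted verbatim into JSON text, "\F" is not a legal escape sequence.
naive = '{"smiles": "' + smiles + '"}'

try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Serializing doubles the backslash, so the value round-trips intact.
escaped = json.dumps({"smiles": smiles})
round_tripped = json.loads(escaped)["smiles"]
```
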


sneumann commented 1 year ago

With proper serialisation now in place, this can be closed. Thanks Rene. Yours, Steffen