Open rdmpage opened 2 years ago
Further examples, for 0000-0003-2861-949X we have DOIs that are broken, e.g.:
Note the | in the middle. These DOIs break any attempt to parse JSON-LD from Orcid.org.
That example was sadly added by a member, and we see this behaviour from several of our clients. We do normalise many of our identifiers in API3.0, but not all of them. This one probably got past our parser because it contains two DOIs. Argh.
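As a rough illustration of how a field holding two DOIs joined by | could be caught, here is a minimal Python sketch. The function name, the loose DOI pattern, and the sample DOIs are all hypothetical; this is not ORCID's parser. Splitting on | is only a heuristic, since a DOI suffix may in principle contain that character.

```python
import re
from typing import List

# Loose DOI shape: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
# This is a heuristic check, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def split_doi_field(raw: str) -> List[str]:
    """Split a field on '|' and keep only the parts that look like DOIs."""
    parts = [part.strip() for part in raw.split("|")]
    return [part for part in parts if DOI_PATTERN.match(part)]

# Hypothetical input: two made-up DOIs joined by a pipe.
print(split_doi_field("10.1234/abc|10.1234/def"))
```

Each returned part could then be stored as its own identifier, and a field that yields no parts could be flagged for review.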
Further to the list of woes with ORCID JSON-LD, note that sameAs should be a list of one or more URIs, but ORCID often includes simple strings such as numbers. These are not valid RDF.
Note that this may be slightly confusing because of the way JSON-LD is output: sameAs appears as a list of strings (e.g., "http://some.url"), but it is a list of URIs, not strings. If you look at the context at https://schema.org/docs/jsonldcontext.json you will see sameAs defined as:
"sameAs": {
"@id": "schema:sameAs",
"@type": "@id"
},
This may seem a small point, but it breaks any use of sameAs in SPARQL queries, because properly constructed queries expect values of sameAs to be URIs, not literals.
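To make the point concrete, here is a small Python sketch of a check that could be run on sameAs values before they are treated as URIs. The function names are hypothetical, and the test is deliberately narrow: it assumes only URIs with a scheme and a host (such as http or https URLs) are wanted, so it would also reject valid URIs without a host, such as URNs.

```python
from urllib.parse import urlparse

def is_absolute_uri(value) -> bool:
    """True if value is a string with both a scheme and a network location."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError on some malformed inputs (e.g. bad IPv6 hosts).
        return False
    return bool(parsed.scheme) and bool(parsed.netloc)

def partition_same_as(values):
    """Split sameAs values into (usable URIs, rejected values)."""
    good, bad = [], []
    for value in values:
        (good if is_absolute_uri(value) else bad).append(value)
    return good, bad

# "12345" and 678 stand in for the plain numbers described above.
good, bad = partition_same_as(["http://some.url", "12345", 678])
print(good, bad)
```

Values that end up in the rejected list are the ones that would produce invalid RDF if emitted under sameAs.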
It would be great if ORCID were to actually use the RDF it exports ("dog-fooding"), because if it did it would rapidly discover that its RDF output has problems. This is a pity because this is potentially a fabulous resource.
There are cases where ORCID URLs and Handles are not valid URIs, which breaks attempts to parse JSON-LD as RDF. These happen in about 10-20 records in a sample of 5000 that I am working with. Not super common, but enough to break things.
URLs sometimes lack the http prefix, e.g. the personal page for https://orcid.org/0000-0003-1802-2649. This breaks RDF, but also the ORCID web page: the personal page for Andrey I. Khalaim is given as https://orcid.org/www.zin.ru/labs/insects/hymenopt/personalia/khalaim/ instead of https://www.zin.ru/labs/insects/hymenopt/personalia/khalaim/
Ideally a simple regular expression to check users have actually input a URL would catch these.
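A minimal sketch of such a check in Python is below. The pattern and function name are my own assumptions rather than anything ORCID uses, and the pattern is intentionally loose: it only requires an http or https scheme, a host containing at least one dot, and no whitespace.

```python
import re

# Requires "http://" or "https://", a host with at least one dot, then an optional path.
URL_PATTERN = re.compile(r"^https?://[^\s/]+\.[^\s/]+(/\S*)?$", re.IGNORECASE)

def looks_like_url(value: str) -> bool:
    """Cheap sanity check that user input is an absolute http(s) URL."""
    return bool(URL_PATTERN.match(value.strip()))

# The value as entered (no scheme) versus the intended URL.
print(looks_like_url("www.zin.ru/labs/insects/hymenopt/personalia/khalaim/"))
print(looks_like_url("https://www.zin.ru/labs/insects/hymenopt/personalia/khalaim/"))
```

An input form could reject values that fail this check, or offer to prepend a scheme and ask the user to confirm.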
For Handles there are some very bad examples at https://orcid.org/0000-0003-2573-1371 such as:
Note that the first Handle is
http://hdl.handle.net/cecchetti,%20arianna.%20%22effects%20of%20tourism%20operations%20on%20the%20bahavioural%20patterns%20of%20dolphin%20populations%20off%20the%20azores%20with%20particular%20emphasis%20on%20the%20common%20dolphin%20(delphinus%20delphis)%22.%202018.%20112%20p..%20(disserta%C3%A7%C3%A3o%20de%20mestrado%20em%20biologia).%20ponta%20delgada:%20u
This is probably a trivial error in the user-supplied content, but ideally this would be caught on input. I realise that dealing with user-supplied content can be a bit of a nightmare.
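For Handles, a similar input check could look something like the Python sketch below. It assumes Handle prefixes are numeric with optional dots, which reflects common practice rather than a guarantee, and the function name and the well-formed sample Handle are hypothetical.

```python
import re

# Assumes the form https://hdl.handle.net/<numeric prefix>/<suffix>, no whitespace.
HANDLE_URL_PATTERN = re.compile(
    r"^https?://hdl\.handle\.net/\d+(?:\.\d+)*/\S+$", re.IGNORECASE
)

def looks_like_handle_url(value: str) -> bool:
    """Cheap sanity check that a value has the usual Handle URL shape."""
    return bool(HANDLE_URL_PATTERN.match(value.strip()))

# A made-up Handle with a numeric prefix, then the start of the bad value quoted above.
print(looks_like_handle_url("http://hdl.handle.net/10400.3/1234"))
print(looks_like_handle_url("http://hdl.handle.net/cecchetti,%20arianna."))
```

A value that starts with a citation string rather than a numeric prefix, as in the example above, fails this check.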