ORCID / ORCID-Source

ORCID Open Source Project
https://orcid.org

Invalid URLs are being passed to RDF Turtle output #6799

Open ebremer opened 1 year ago

ebremer commented 1 year ago

Invalid URLs are also breaking Turtle. In the example, "https://orcid.org/0000-0003-3039-2116", the user has a URL specified as

https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:\"Van de Ven, Annelies\"

this will be passed back with text/turtle as

<https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies">

which is invalid and will throw an error when read by Apache Jena, even though ORCID used Jena to generate the RDF. It's not something Jena will "fix", as per:

https://issues.apache.org/jira/browse/JENA-2351 and https://github.com/apache/jena/issues/1879

Spaces and quotes are illegal in the IRI.
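
For illustration, here is a minimal Python sketch of the check implied above. It follows the Turtle grammar's IRIREF production, which excludes the characters U+0000 to U+0020 and < > " { } | ^ ` \ from the text between the angle brackets. The function name illegal_iri_chars is hypothetical and not part of ORCID or Jena, and the sketch ignores the \uXXXX escape form that Turtle also permits.

```python
import re

# Characters the Turtle IRIREF production forbids between "<" and ">":
# U+0000 to U+0020 (control characters and space) plus < > " { } | ^ ` \
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')

def illegal_iri_chars(url):
    """Return the distinct forbidden characters found in url, sorted."""
    return sorted(set(_FORBIDDEN.findall(url)))

# The URL from the report, with its quotes and spaces unescaped
bad = 'https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies"'
print(illegal_iri_chars(bad))  # [' ', '"']
```

An empty list means the URL can be written as <...> without tripping a Turtle parser on these characters.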

wjrsimpson commented 9 months ago

@TomDemeranville Any thoughts?

TomDemeranville commented 9 months ago

Hi @ebremer. Thanks for reporting this. I think I understand the problem here. However, I've read through the issues you've linked to and I'm not sure I understand the solution. What should it do?

ebremer commented 9 months ago

Minimally, emit the URI only as a string and not as a bad URI. Preferably, rewrite the URI to make it legal, but not all sites will accept a corrected version, so I understand that it becomes problematic.
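
As a rough sketch of the "rewrite the URI to make it legal" option, assuming percent-encoding is the intended rewrite, Python's standard urllib.parse.quote can escape the offending characters while leaving reserved characters and existing % escapes alone. The helper name percent_encode_url and the choice of safe characters are assumptions made for this example; as noted above, the target site may not accept the rewritten form.

```python
from urllib.parse import quote

# Reserved URI characters plus "%" are left untouched so that the URL's
# structure and any existing percent-escapes survive (an assumption of
# this sketch: a stray "%" that is not part of an escape is not repaired).
_SAFE = ":/?#[]@!$&'()*+,;=%"

def percent_encode_url(url):
    """Percent-encode every character outside the unreserved and safe sets."""
    return quote(url, safe=_SAFE)

bad = 'https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies"'
print(percent_encode_url(bad))
# https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:%22Van%20de%20Ven,%20Annelies%22
```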

ebremer commented 9 months ago

Anything that is an invalid URI could be handled like this:

"https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:\"Van de Ven, Annelies\""^^xsd:anyURI

see: https://www.w3.org/TR/xmlschema11-2/#anyURI
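
A minimal Python sketch of that fallback, under the assumption that "invalid" means "contains a character the Turtle IRIREF production forbids": valid URLs are emitted as <...>, anything else as an escaped string typed xsd:anyURI. The function to_turtle_term is hypothetical, is not how ORCID or Jena implement serialization, and assumes the xsd: prefix is declared elsewhere in the Turtle document.

```python
import re

# Characters the Turtle IRIREF production forbids between "<" and ">"
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')

def to_turtle_term(url):
    """Return <url> when it is a legal IRIREF, else a typed string literal."""
    if not _FORBIDDEN.search(url):
        return "<" + url + ">"
    # Escape the characters a double-quoted Turtle string cannot hold raw
    escaped = (
        url.replace("\\", "\\\\")
           .replace('"', '\\"')
           .replace("\n", "\\n")
           .replace("\r", "\\r")
    )
    return '"' + escaped + '"^^xsd:anyURI'

bad = 'https://dial.uclouvain.be/pr/boreal/search/site/sm_creator:"Van de Ven, Annelies"'
print(to_turtle_term(bad))
print(to_turtle_term("https://orcid.org/0000-0003-3039-2116"))
```

For the URL in this report, the first call yields the same typed literal shown above, and the second yields an ordinary <...> IRI.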