NatLibFi / bib-rdf-pipeline

Scripts and configuration for converting MARC bibliographic records into RDF
Creative Commons Zero v1.0 Universal

Fails on bad URIs with Jena >3.1.1 #69

Open osma opened 6 years ago

osma commented 6 years ago

As the latest Travis build demonstrates, newer Jena versions parse URIs more strictly, so the riot command that converts marc2bibframe2 output (RDF/XML) to N-Triples now fails.

This is the same bad URI problem that was already discussed on the Jena users' list in October 2016, just more severe since Jena is stricter nowadays. The solution implemented back then (filter-bad-uris.py) comes too late in the pipeline.

I think the only viable solution is to catch bad URIs (or values such as bad language tags in MARC records that will later become bad URIs) before the BIBFRAME conversion step, preferably using Catmandu Fix scripts.
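
To illustrate the kind of check meant here, below is a minimal Python sketch, not the pipeline's actual code and not a Catmandu Fix. It flags strings containing characters that the N-Triples/Turtle IRIREF grammar does not allow unescaped. The function names and the example values are hypothetical, and Jena's own URI checks may reject more than this simple test does.

```python
import re

# Characters the N-Triples/Turtle IRIREF production does not allow unescaped:
# control characters and space (U+0000-U+0020) plus < > " { } | ^ ` \
# Assumption: this only approximates what a strict parser such as Jena rejects.
_BAD_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def bad_uri_chars(uri):
    """Return the sorted, de-duplicated disallowed characters found in uri."""
    return sorted(set(_BAD_IRI_CHARS.findall(uri)))


def is_suspect_uri(uri):
    """Return True if uri contains at least one disallowed character."""
    return bool(_BAD_IRI_CHARS.search(uri))


if __name__ == "__main__":
    # Hypothetical inputs: one clean URI and one with a stray space.
    for candidate in (
        "http://id.loc.gov/vocabulary/languages/fin",
        "http://id.loc.gov/vocabulary/languages/fi n",
    ):
        print(candidate, "->", "suspect" if is_suspect_uri(candidate) else "ok")
```

In the pipeline, a check like this would have to run on the MARC field values that end up inside URIs, before marc2bibframe2 sees the records. The sketch only operates on plain strings.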

For now I will revert to Jena 3.1.1 because it still works.