Closed wouterbeek closed 7 years ago
Serd parses prefixed IRIs that contain illegal Unicode characters in their local name.
For example, the following Turtle snippet appears in an actual data file (note that the dash between 2006 and 08 is not an ASCII hyphen but the illegal Unicode character EN DASH (U+2013)):
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:Germany_at_the_2006–08_European_Nations_Cup dbp:stadium dbr:Amsterdam .
Serdi parses this snippet, but it should raise an error:
serdi unicode.ttl
<http://dbpedia.org/resource/Germany_at_the_2006\u201308_European_Nations_Cup> <http://dbpedia.org/property/stadium> <http://dbpedia.org/resource/Amsterdam> .
Tested with Serd 0.28.0.
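To make the complaint concrete, here is a rough Python sketch of the check a strict parser could apply to a local name. The code point ranges are taken from the PN_CHARS_BASE and PN_CHARS productions of the Turtle grammar; the function names are hypothetical and this is not how Serd implements it. It is deliberately simplified: it ignores percent and backslash escapes (PLX) and the extra restrictions on the first and last character.

```python
# Code point ranges of PN_CHARS_BASE from the Turtle grammar.
PN_CHARS_BASE = [
    (0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional characters allowed by PN_CHARS_U and PN_CHARS:
# '_', '-', digits, U+00B7, combining marks, U+203F..U+2040.
PN_CHARS_EXTRA = [
    (0x5F, 0x5F), (0x2D, 0x2D), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def is_pn_char(ch):
    """Return True if ch falls in one of the PN_CHARS ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PN_CHARS_BASE + PN_CHARS_EXTRA)


def illegal_local_name_chars(local):
    """Return (index, char) pairs for characters not allowed in a local name.

    Simplification (an assumption of this sketch): '.' and ':' are accepted
    anywhere, and PLX escapes are not handled.
    """
    return [
        (i, ch)
        for i, ch in enumerate(local)
        if ch not in ".:" and not is_pn_char(ch)
    ]


if __name__ == "__main__":
    name = "Germany_at_the_2006\u201308_European_Nations_Cup"
    for index, ch in illegal_local_name_chars(name):
        print(f"illegal character U+{ord(ch):04X} at index {index}")
```

EN DASH (U+2013) sits between the allowed ranges U+200C..U+200D and U+203F..U+2040, so it is reported, while the same name with an ASCII hyphen passes.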
+1 for a distinction between strict and lax mode.
BTW I do not see the rationale behind not allowing the Unicode dash in this position. What could be the rationale behind this in the standard?
Fixed in https://github.com/drobilla/serd/commit/1cd321825c52eddd4175cb4ec58ae8d7ad2da48d