acoli-repo / conll-rdf

Advanced graph rewriting and LLOD publication for CoNLL and other TSV formats

RDF 1.1 Turtle prefixes in CoNLLRDFUpdater #80

Open chiarcos opened 2 years ago

chiarcos commented 2 years ago

Background

There are two ways of declaring namespace prefixes in RDF 1.1 Turtle:

(a) @prefix bla: <...> . (as in RDF 1.0, with a dot at the end)
(b) PREFIX bla: <...> (introduced in RDF 1.1, without a dot at the end)

At the moment, CoNLLRDFUpdater seems to support (a) only. This is not a problem as long as it only processes data produced by CoNLLStreamExtractor or Apache Jena, but it becomes one when it processes CoNLL-RDF data produced by other converters.

Action

Support syntax (b) in CoNLLRDFUpdater. Note that the same difference in syntax also applies to @base and BASE, so these need to be handled as well. (tbc: Are these currently included in prefix preprocessing?)
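
One way to support syntax (b) would be to rewrite SPARQL-style directives into the RDF 1.0 style before the existing prefix handling runs. The following is only a minimal sketch in Python, not the CoNLLRDFUpdater code (which is Java). The function names normalize_directive and normalize_stream are hypothetical, and the sketch assumes that every directive sits on a single line of its own. It treats the PREFIX and BASE keywords as case-insensitive, as RDF 1.1 Turtle does.

```python
import re

# SPARQL-style directives (RDF 1.1 Turtle): case-insensitive keyword, no final dot.
# Assumption: one directive per line, optionally followed by a "#" comment.
_SPARQL_PREFIX = re.compile(
    r'^\s*PREFIX\s+([^\s:]*:)\s*(<[^>]*>)\s*(#.*)?$', re.IGNORECASE)
_SPARQL_BASE = re.compile(
    r'^\s*BASE\s+(<[^>]*>)\s*(#.*)?$', re.IGNORECASE)


def normalize_directive(line):
    """Rewrite a PREFIX/BASE line as @prefix/@base; return other lines unchanged."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending, if any

    match = _SPARQL_PREFIX.match(body)
    if match:
        namespace, iri, comment = match.groups()
        out = f"@prefix {namespace} {iri} ."
    else:
        match = _SPARQL_BASE.match(body)
        if not match:
            return line  # triples, comments, empty lines, @prefix, @base
        iri, comment = match.groups()
        out = f"@base {iri} ."

    if comment:
        out += " " + comment  # keep a trailing comment after the dot
    return out + ending


def normalize_stream(lines):
    """Lazily normalize an iterable of lines, e.g. sys.stdin."""
    for line in lines:
        yield normalize_directive(line)
```

Used as a filter, this would amount to sys.stdout.writelines(normalize_stream(sys.stdin)). Because the rewrite works line by line, comments and empty lines pass through unchanged. Directives spread over several lines, or text inside multi-line string literals that happens to look like a directive, are not handled correctly by this sketch.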

Preliminary workaround

Until this is solved, it is possible to convert all input data to N-Triples (nt) notation before processing it, e.g., using rapper:

 $> cat my-file.ttl | rapper -i turtle - '#' | run.sh CoNLLRDFUpdater ...

However, note that rapper will emit triples only, without preserving comments or spaces, so CoNLLRDFUpdater will not split the input. Also note that the simple RDF 1.1 to RDF 1.0 conversion that rapper provides with -o turtle will not work with CoNLLRDFUpdater because it will insert empty lines between groups of triples with the same subject.

chiarcos commented 2 years ago

Note: As we have a workaround, this has low priority. Until it is solved, however, we have to state in the help dialog of CoNLLRDFUpdater that our RDF text streams require RDF 1.0 Turtle-style prefixes.
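
Beyond the help text, a small check could point users at the offending lines. The following Python sketch is an illustration only; the function name find_sparql_style_directives is hypothetical, and it assumes line-based input in which each directive occupies one line.

```python
import re

# Matches RDF 1.1 SPARQL-style directives: PREFIX ns: <iri> or BASE <iri>,
# with a case-insensitive keyword and an optional trailing "#" comment.
_SPARQL_DIRECTIVE = re.compile(
    r'^\s*(PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*(#.*)?$',
    re.IGNORECASE)


def find_sparql_style_directives(lines):
    """Return (line_number, text) pairs for lines using PREFIX/BASE syntax."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if _SPARQL_DIRECTIVE.match(line.rstrip("\r\n")):
            hits.append((number, line.strip()))
    return hits
```

A caller could print the returned line numbers together with a hint that RDF 1.0 Turtle-style prefixes are required.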

chiarcos commented 2 years ago

To be fixed by pull #40.