Open TheMessik opened 2 weeks ago
Hi @TheMessik,
Blank nodes in data read by a parser get large random numbers as labels, so I'm assuming you control the RDF production and set the blank node labels yourself.
The RDFWriter builder doesn't currently provide a way to set the NodeFormatter. It would be good to add this.
If you want to read such data in, and preserve the label (with care!), then use RDFParser.create().labelToNode(labelToNode) with LabelToNode.createUseLabelAsGiven(). Your code is responsible for blank node label uniqueness and the rules about what happens on graph merge and reading files multiple times.
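To make the uniqueness point concrete, here is a toy model in plain Java. It is not Jena's implementation; the class and method names (LabelScopeDemo, FreshPerRun, asGiven) are made up for illustration. It contrasts allocating a fresh id per parse run with using the label as given.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model only: hypothetical names, not part of Jena.
final class LabelScopeDemo {

    /** One instance per parse run: a label maps to one fresh id within that run. */
    static final class FreshPerRun {
        private final Map<String, String> seen = new HashMap<>();

        String nodeId(String label) {
            return seen.computeIfAbsent(label, k -> UUID.randomUUID().toString());
        }
    }

    /** Label used as given: the id is the label itself, in every parse run. */
    static String asGiven(String label) {
        return label;
    }

    public static void main(String[] args) {
        FreshPerRun run1 = new FreshPerRun();
        FreshPerRun run2 = new FreshPerRun();

        // Same run, same label: the same node.
        System.out.println(run1.nodeId("x").equals(run1.nodeId("x")));
        // Different runs (two files, or one file read twice): different nodes.
        System.out.println(run1.nodeId("x").equals(run2.nodeId("x")));
        // Labels as given: _:x from any run is the same node, so graphs merge on it.
        System.out.println(asGiven("x").equals(asGiven("x")));
    }
}
```

With labels used as given, reading the same file twice, or two unrelated files that both happen to use _:x, yields one shared blank node. That is the case the calling code has to manage.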
For writing: NodeFormatter is the interface for controlling the RDF term output.
In extending RDFWriterBuilder, interfaces WriterGraphRIOT and WriterDatasetGraphRIOT, the low level per-format interfaces, will need changing.
There are several kinds of writer for the N-Triples/Turtle family of syntaxes (streamed, flat, batching and collecting) and all of them use a NodeFormatter.
At the RDFWriter level, there isn't the "writer profile" abstraction like there is when reading (where there is a node maker FactoryRDF carried by ParserProfile).
N-Quads is the simplest output form. It is streamed and uses WriterStreamRDFPlain.
Below is the code that is used for N-Quads. You could use that, modified at NodeFmtLib.encodeBNodeLabel to just use the label. Be careful - some characters aren't legal in a blank node label string.
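    import org.apache.jena.atlas.io.AWriter;
    import org.apache.jena.atlas.io.IO;
    import org.apache.jena.graph.Graph;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFParser;
    import org.apache.jena.riot.out.NodeFmtLib;
    import org.apache.jena.riot.out.NodeFormatter;
    import org.apache.jena.riot.out.NodeFormatterNT;
    import org.apache.jena.riot.system.StreamRDF;
    import org.apache.jena.riot.system.StreamRDFOps;
    import org.apache.jena.riot.writer.WriterStreamRDFPlain;

    public class WriteNQuadsExample {
        public static void main(String... args) {
            String input = "_:x <x:p> <x:o> .";
            Graph graph = RDFParser.fromString(input, Lang.NT).toGraph();
            AWriter out = IO.wrapUTF8(System.out);
            NodeFormatter fmt = new NodeFormatterNT() {
                @Override
                public void formatBNode(AWriter w, String label) {
                    w.print("_:");
                    // The label is encoded here; print label directly to write it as given.
                    String lab = NodeFmtLib.encodeBNodeLabel(label);
                    w.print(lab);
                }
            };
            StreamRDF stream = new WriterStreamRDFPlain(out, fmt);
            StreamRDFOps.graphToStream(graph, stream);
        }
    }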
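On the point about characters that aren't legal in a blank node label: if labels are written as given, it may be worth checking them first. The following is a minimal sketch in plain Java, not Jena code; the class and method names are made up. It accepts only a conservative ASCII subset of what the N-Triples/Turtle grammar allows (the real grammar also permits many non-ASCII characters), so a label that passes should be safe to print after "_:".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena.
final class BlankNodeLabelCheck {

    // Conservative ASCII-only subset of the BLANK_NODE_LABEL rule:
    // the first character is a letter, digit or '_'; later characters may
    // also be '-' or '.'; the label must not end with '.'.
    private static final Pattern SAFE_LABEL =
            Pattern.compile("[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?");

    static boolean isSafeLabel(String label) {
        return label != null && SAFE_LABEL.matcher(label).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"students", "b0.1", "has space", "ends.", "-x"};
        for (String label : samples) {
            System.out.println(label + " -> " + isSafeLabel(label));
        }
    }
}
```

A custom formatBNode could call such a check and fall back to the encoded form, or throw, when a label does not pass.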
Hope that helps
Version
4.10.0
Feature
When serializing a DatasetGraph into NQ format, I find that all blank nodes with specified labels get a "B" prepended to the label, e.g. a blank node with a label "students" would be serialized as "_:Bstudents". This is somewhat annoying for my use case: an RML engine needs to follow a particular spec, including filling in blank node patterns. My workaround currently consists of Regex replacing, but this is far from ideal.
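For illustration, a regex workaround of this kind might look roughly like the sketch below. It is plain Java with made-up names (StripBPrefix, strip), not the actual code from the engine, and it only removes the leading "B"; it does not undo any other encoding the writer may apply to a label.

```java
import java.util.regex.Pattern;

// Hypothetical post-processing step, not part of Jena.
final class StripBPrefix {

    // Matches "_:B" only where a term can start: at the beginning of a line
    // or after whitespace. Text inside string literals can still match.
    private static final Pattern B_PREFIX = Pattern.compile("(?m)(^|\\s)_:B");

    static String strip(String nquads) {
        // $1 restores the whitespace (or empty line start) that was matched.
        return B_PREFIX.matcher(nquads).replaceAll("$1_:");
    }

    public static void main(String[] args) {
        System.out.println(strip("_:Bstudents <x:p> _:Bo ."));
        // A literal that happens to contain " _:B" is rewritten as well.
        System.out.println(strip("<x:s> <x:p> \"see _:Bfoo\" ."));
    }
}
```

The second call shows why this is fragile: the pattern cannot tell a blank node term from the same characters inside a literal.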
I'd like to suggest a more granular control of how the NQ writer (and all writers in general) handle Blank nodes: give the user an option to preserve the original blank node without prepending a "B" in front of the label.
Code example that performs the serialization:
Are you interested in contributing a solution yourself?
Perhaps?