vikramsubramanian opened this issue 4 months ago
Summary: Issue with resolving relative IRIs against a base IRI in RDF Turtle files.
The issue concerns the incorrect resolution of relative IRIs in RDF Turtle files. Relative IRIs such as <#green-goblin> and <#spiderman> should be resolved against a base IRI, but they are currently ingested as-is, without any resolution.
To address this issue, the following changes should be made:
- Modify the read_IRIREF function in third_party/serd/src/n3.c so that it correctly handles relative IRIs by resolving them against the current base IRI.
- In read_IRIREF, check whether the IRI is relative. If it is, resolve it against the base IRI before adding it to the node.
- Use the serd_uri_resolve function from third_party/serd/src/uri.c to resolve the relative IRI against the base IRI.
- Update the RdfReader::prefixHandle function in src/processor/operator/persistent/reader/rdf/rdf_reader.cpp to store the base IRI when the BASE directive is encountered in the RDF Turtle file.
- Test the modified read_IRIREF function to confirm that it resolves relative IRIs correctly.

Here is a pseudo-code outline of the changes to be made in the read_IRIREF function:
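static SerdStatus
read_IRIREF(SerdReader* const reader, Ref* const dest) {
    // ... existing code that reads the IRI into *dest ...

    // An IRI without a "scheme:" prefix (e.g. "#green-goblin") is relative
    const SerdNode* node = deref(reader, *dest);
    if (!serd_uri_string_has_scheme(node->buf)) {
        // Parse the relative reference. reader->base_uri is a hypothetical
        // SerdURI field holding the current, already parsed, base IRI
        SerdURI rel_uri;
        serd_uri_parse(node->buf, &rel_uri);

        // serd_uri_resolve takes (reference, base, result), in that order
        SerdURI resolved_uri;
        serd_uri_resolve(&rel_uri, &reader->base_uri, &resolved_uri);

        // Serialise the result into a new node, then replace *dest with it.
        // replace_node is a hypothetical helper that pops *dest and pushes the new text
        SerdNode resolved = serd_node_new_uri(&resolved_uri, NULL, NULL);
        *dest = replace_node(reader, *dest, &resolved);
        serd_node_free(&resolved);
    }

    // ... existing code ...
}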
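Additionally, ensure that RdfReader stores the base IRI when the BASE directive is encountered. Note that serd reports @base/BASE through the base sink passed to serd_reader_new, not through the prefix sink, so RdfReader::prefixHandle is not called for it; a base-sink handler is the place to record it (baseHandle and the std::string member baseIri below are hypothetical names):
SerdStatus RdfReader::baseHandle(void* handle, const SerdNode* uri) {
    auto reader = reinterpret_cast<RdfReader*>(handle);
    // Copy the bytes: uri->buf points into serd's internal stack, which is
    // popped after this callback returns, so keeping the raw pointer would dangle
    reader->baseIri = std::string(reinterpret_cast<const char*>(uri->buf), uri->n_bytes);
    return SERD_SUCCESS;
}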
Make sure to test the changes with RDF Turtle files containing relative IRIs to verify that they are now being resolved correctly against the base IRI.
The 'read_base' function is responsible for handling the 'BASE' directive which is relevant to resolving relative IRIs against a base IRI.
The 'serd_uri_serialise_relative' function serializes a URI relative to a base URI. It is used on the writing side, but it is closely related to how serd handles base IRIs.
The 'serd_writer_set_base_uri' function sets the base URI in the writer, which could be involved in the resolution of relative IRIs.
The 'serd_node_new_relative_uri' function creates a new node for a relative URI, which is relevant to the issue of handling relative IRIs.
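According to the RDF Turtle specification, some IRIs can be written as relative IRIs, which are resolved against a base IRI. The specification states that relative IRIs are resolved against the base IRI using the basic algorithm in section 5.2 of RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax).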
For example, when ingesting the triples from the example in the specification, which declares @base <http://example.org/>, the IRI <#green-goblin> should resolve to <http://example.org/#green-goblin>. We currently ingest these as "#green-goblin".