drobilla / serd

A lightweight C library for RDF syntax
https://gitlab.com/drobilla/serd
ISC License

How to apply a base URI? #20

Closed: wouterbeek closed this issue 5 years ago

wouterbeek commented 5 years ago

I'm trying to add support for a base URI flag in HDT (https://github.com/rdfhdt/hdt-cpp/issues/131).

First, I have a SerdEnv whose base URI I can set and get:

const SerdNode* envBase = serd_env_get_base_uri(env, nullptr);
SerdURI base_uri;

Secondly, I have a SerdNode* term which is a relative IRI, but I cannot obtain the corresponding absolute IRI:

SerdNode base {serd_node_new_uri_from_string(envBase->buf, nullptr, &base_uri)};
SerdNode iri {serd_node_new_uri_from_string(term->buf, &base_uri, nullptr)};

When I print iri.buf it is still the same as term->buf. I must be doing something wrong...

drobilla commented 5 years ago

The second parameter of serd_env_get_base_uri gives you the SerdURI (the parsed URI used for resolution) to resolve against, so there is no need to parse or copy it yourself. For example:

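#include <stdio.h>

#include <serd/serd.h>

int
main(void)
{
    /* Create an environment with a base URI */
    SerdNode base = serd_node_from_string(
            SERD_URI, (const uint8_t*)"http://example.org/");
    SerdEnv* env = serd_env_new(&base);

    /* The second argument receives the already-parsed base URI */
    SerdURI base_uri;
    serd_env_get_base_uri(env, &base_uri);

    /* Resolve a reference against it (prints http://example.org/foo/bar) */
    SerdNode term = serd_node_from_string(SERD_URI, (const uint8_t*)"/foo/bar");
    SerdNode iri  = serd_node_new_uri_from_string(term.buf, &base_uri, NULL);

    fprintf(stderr, "%s\n", (const char*)iri.buf);

    /* iri owns a newly allocated buffer, and env owns a copy of the base */
    serd_node_free(&iri);
    serd_env_free(env);

    return 0;
}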

(If you already have a node, you can use serd_node_new_uri_from_node instead.)

Sorry, this API is a bit confusing. It's an optimization for streaming that keeps URI parsing to an absolute minimum, but it isn't worth the confusion it causes in the public API. I've simplified this in the next major version, which should hopefully be out soon.

wouterbeek commented 5 years ago

Thanks for the code snippet; I was doing things in an unnecessarily complex way. The following indeed suffices:

SerdURI baseUri;
const SerdNode* envBase = serd_env_get_base_uri(env, &baseUri);
SerdNode absolute {serd_node_new_uri_from_node(relative, &baseUri, nullptr)};

I get incorrect results in some cases: for example, resolving the relative URI b against the base URI https://a.org yields https://a.orgb instead of https://a.org/b. Am I still using the Serd API incorrectly, or could this be a bug in the library?

drobilla commented 5 years ago

Possibly a bug in the special case of an empty path (assuming that's what the standard actually requires, though it must be). I suppose nobody noticed until now because it's bad practice to use such URIs as bases, and there are no examples of it in the spec.
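For reference, RFC 3986 does cover this case in section 5.2.3 ("Merge Paths"): if the base URI has an authority and an empty path, the merged path is "/" followed by the reference's path, so b against https://a.org should give https://a.org/b. Below is a minimal sketch of just that merge step in plain C. It is not serd's implementation; it assumes the caller has already split both URIs into components, and the function merge_paths and its parameters are hypothetical. Dot-segment removal (section 5.2.4) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: merge a relative-path reference with a base path as
 * described in RFC 3986 section 5.2.3. Does not remove dot segments.
 * Returns a newly allocated string the caller must free(), or NULL if
 * allocation fails. */
static char*
merge_paths(bool base_has_authority, const char* base_path, const char* ref_path)
{
    assert(base_path && ref_path);

    const char* prefix;
    size_t      prefix_len;

    if (base_has_authority && base_path[0] == '\0') {
        /* Authority with empty path: e.g. base https://a.org + "b" -> "/b" */
        prefix     = "/";
        prefix_len = 1;
    } else {
        /* Keep everything up to and including the last '/' of the base
         * path, or nothing at all if the base path contains no '/' */
        const char* last_slash = strrchr(base_path, '/');
        prefix     = base_path;
        prefix_len = last_slash ? (size_t)(last_slash - base_path) + 1 : 0;
    }

    const size_t ref_len = strlen(ref_path);
    char*        merged  = malloc(prefix_len + ref_len + 1);
    if (!merged) {
        return NULL;
    }

    memcpy(merged, prefix, prefix_len);
    memcpy(merged + prefix_len, ref_path, ref_len + 1); /* includes the NUL */
    return merged;
}
```

For the case in this issue, the base https://a.org has authority a.org and an empty path, so merge_paths(true, "", "b") gives "/b", and recombining it with the scheme and authority yields https://a.org/b.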

wouterbeek commented 5 years ago

Nice, my code seems to work now! Closing this issue...

I've created a separate issue for resolution against base URIs with an empty path (https://github.com/drobilla/serd/issues/21).