Closed by srosset81 11 months ago.
Finally, allowing ontologies to be registered dynamically will be needed because, for ActivityPods 2.0, we want containers to be automatically created with the prefix of the ontology. So instead of having hard-coded core ontologies, we will create a new OntologiesService that any service can use to register ontologies. They will be persisted in the settings dataset. It will be possible to pass a list of ontologies to this service to register them on start. Here are the actions that will be available:
register(prefix, url, owl, jsonldContext, overwrite=false)
Register a new ontology. On start, this action will be called for all core ontologies. If overwrite is false, the action will return an error if an ontology with this prefix/URL already exists; otherwise it will overwrite it.
findPrefix(url)
Returns the prefix, based on the prefix.cc API (see https://github.com/assemblee-virtuelle/activitypods/issues/128). If not found on prefix.cc, returns nothing.
list()
Returns an array of registered ontologies (cached).
get(prefix)
Returns a single ontology based on the prefix, or nothing if no ontology matches (cached).
getRdfPrefixes()
Returns the list of ontology prefixes to be used in SPARQL queries (cached).
getJsonLdContext()
Returns the JSON-LD context, putting together the jsonldContext of the ontologies when available, or otherwise just adding the prefix of the ontology (cached).
The jsonldContext parameter of the register action can be a URL or an object/array. In the case of an object/array, it will be JSON-stringified on persistence. The register action should fail if the JSON-LD context conflicts with existing contexts.
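The register/list/get/getRdfPrefixes actions above can be sketched as a small in-memory registry. This is a minimal TypeScript sketch, not the SemApps implementation: persistence in the settings dataset, the prefix.cc lookup, JSON-LD conflict checks and the Moleculer wiring are left out, and the `Ontology` interface and `OntologyRegistry` class are hypothetical names that simply mirror `register(prefix, url, owl, jsonldContext, overwrite=false)`.

```typescript
// Hypothetical shape of a registered ontology, mirroring register()'s parameters.
interface Ontology {
  prefix: string;
  url: string; // namespace URL, e.g. "http://www.w3.org/ns/ldp#"
  owl?: string;
  jsonldContext?: string | object | object[];
}

class OntologyRegistry {
  // Keyed by prefix. The real service would persist this in the settings dataset;
  // keeping it in memory also makes list() and get() trivially "cached".
  private ontologies: { [prefix: string]: Ontology } = Object.create(null);

  register(ontology: Ontology, overwrite = false): void {
    // An existing entry conflicts if it uses the same prefix or the same URL.
    const conflicts = this.list().filter(
      (o) => o.prefix === ontology.prefix || o.url === ontology.url
    );
    if (conflicts.length > 0 && !overwrite) {
      throw new Error(
        `An ontology with prefix "${ontology.prefix}" or URL ${ontology.url} already exists`
      );
    }
    // When overwriting, drop every conflicting entry before adding the new one.
    for (const c of conflicts) delete this.ontologies[c.prefix];
    this.ontologies[ontology.prefix] = ontology;
  }

  list(): Ontology[] {
    return Object.keys(this.ontologies).map((prefix) => this.ontologies[prefix]);
  }

  get(prefix: string): Ontology | undefined {
    return this.ontologies[prefix];
  }

  // PREFIX declarations that can be prepended to SPARQL queries.
  getRdfPrefixes(): string {
    return this.list()
      .map((o) => `PREFIX ${o.prefix}: <${o.url}>`)
      .join("\n");
  }
}
```

Matching on both prefix and URL is what makes `overwrite=false` reject an ontology that reuses either one, which is the consistency property discussed later in this thread.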
On ActivityPods, ontologies that are registered dynamically by external applications will use the prefix from prefix.cc and will not pass any OWL file or JSON-LD context, since we can do without them.
Use a library like ldo to avoid being dependent on the context passed.
I like that idea - it has the positive side effect of adding typing support which reduces bugs and improves developer experience.
Warning: some contexts, like https://www.w3.org/ns/activitystreams.jsonld, include multiple ontologies. It is up to the developer to ensure there is no conflict between them (maybe we could do a check on startup).
I'm not sure I understand that correctly. JSON-LD overrides earlier definitions when a term is defined more than once, unless the @protected keyword is set (https://www.w3.org/TR/json-ld/#protected-term-definitions). So putting the ActivityStreams or ActivityPods context last should be fine with regard to framing AS properties. Is that what you mean?
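A startup check along these lines could merge the term definitions of each context in array order and flag redefinitions of protected terms. The sketch below is a deliberately simplified model of the JSON-LD 1.1 behaviour described above: it only handles inline object contexts (no remote URLs, scoped contexts, `@propagate` or `null` resets), and `mergeContexts` is a hypothetical helper, not part of any JSON-LD library.

```typescript
// Simplified model: a term definition is an IRI string or an expanded definition object.
type TermDefinition = string | { [key: string]: unknown };
type ContextObject = { [key: string]: unknown };

interface ActiveTerm {
  definition: TermDefinition;
  protected: boolean;
}

// Compare definitions on what they map to, ignoring "@protected" and key order.
function normalize(def: TermDefinition): string {
  const obj: { [key: string]: unknown } = typeof def === "string" ? { "@id": def } : { ...def };
  delete obj["@protected"];
  return Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${JSON.stringify(obj[k])}`)
    .join(",");
}

// Merge object contexts in array order: later definitions override earlier ones,
// except that a protected term may only be redefined identically.
function mergeContexts(contexts: ContextObject[]): { [term: string]: ActiveTerm } {
  const active: { [term: string]: ActiveTerm } = Object.create(null);
  for (const ctx of contexts) {
    // "@protected": true at context level protects every term of that context.
    const contextProtected = ctx["@protected"] === true;
    for (const term of Object.keys(ctx)) {
      if (term.startsWith("@")) continue; // keywords such as @vocab or @protected itself
      const definition = ctx[term] as TermDefinition;
      const previous = active[term];
      if (previous && previous.protected) {
        if (normalize(previous.definition) !== normalize(definition)) {
          throw new Error(`protected term redefinition: "${term}"`);
        }
        continue; // identical redefinition: keep the original protected definition
      }
      // A term-level "@protected" flag wins over the context-level one.
      const own =
        typeof definition === "object" && definition !== null ? definition["@protected"] : undefined;
      active[term] = { definition, protected: typeof own === "boolean" ? own : contextProtected };
    }
  }
  return active;
}
```

With this model, putting an unprotected app context after a protected one fails, while putting the protected context last simply wins, which matches the ordering argument above.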
Maybe I don't quite understand the use case of the issue yet. So the idea is that if no JsonLdContext header is passed with an LDP request, the service method getJsonLdContext() is called to generate the context? Would that require an endpoint to be generated for each iteration of the JSON-LD context (e.g. https://mypod.store/ontologies/context-XYZ.jsonld)?
And could we add a default context value for a specific resource or container, to be used if no JsonLdContext header is passed? E.g. this would be convenient for AP collections and objects.
> I'm not sure if I understand that correctly. Json-ld would go with overriding less recent definitions, if there are multiple, unless a @protected keyword is set (https://www.w3.org/TR/json-ld/#protected-term-definitions). So putting the ActivityStreams or ActivityPods context last should be fine with regard to framing AS-properties. Is that what you mean?
I've been writing tests for that today, and it does seem that validation fails only when the @protected keyword is used. But I'm pretty sure I came across other kinds of conflicts when compacting JSON-LD data; I need to dig deeper into this.
> Maybe I don't quite understand the use case of the issue yet. So the idea is that if no JsonLdContext header is passed upon an ldp request, the service method getJsonLdContext() is called to generate the context?
Yes, exactly! In ActivityPods, this will replace the https://activitypods.org/context.json context, since that approach is not scalable.
> Would that require an endpoint to be generated for each iteration of the json context (e.g. https://mypod.store/ontologies/context-XYZ.jsonld)?
It could be interesting to provide such an endpoint (mostly for frontend apps), though I'm not sure which path to use. This makes me realize that every Pod should, in theory, have its own JSON-LD context, since it depends on the applications that were installed... But that's not how I went with the implementation so far (the ontologies are saved in the general settings dataset, not in the Pod). This will require some thought 🤔
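If such an endpoint were provided, one way to give each generated context a stable URL (like the `context-XYZ.jsonld` example in the question) is to derive `XYZ` from a hash of the context's canonical serialization, so the URL only changes when the registered ontologies change. This is a sketch under assumptions: the `/ontologies/context-<hash>.jsonld` layout is taken from that example, and FNV-1a is used only because it fits in a few dependency-free lines (a real implementation would more likely use SHA-256 from `node:crypto`).

```typescript
// Serialize with object keys sorted so that logically equal contexts hash identically.
// Array order is preserved, since it is significant in @context arrays.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: unknown };
    const members = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical URL layout for a generated context document.
function contextUrl(baseUrl: string, context: unknown): string {
  const base = baseUrl.replace(/\/$/, "");
  return `${base}/ontologies/context-${fnv1a32(canonicalize(context))}.jsonld`;
}
```

A per-Pod context would then simply be hashed from that Pod's own ontology list, giving each Pod its own URL without any extra bookkeeping.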
> And could we add a default context value for a specific resource or container which is used if no JsonLdContext header is passed? E.g. this would be convenient for AP collections and objects.
ActivityStreams will necessarily be in the core ontologies, so its context will always be included. We will use an array of contexts instead of putting everything together like we do now in the ActivityPods context file. Something like this:
```json
"@context": [
  "https://www.w3.org/ns/activitystreams",
  {
    "ldp": "http://www.w3.org/ns/ldp#",
    ...
  }
]
```
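A sketch of how a context array like this could be assembled from the registered ontologies, following the getJsonLdContext() description: URL contexts are listed as strings, object contexts are merged into one trailing object, and ontologies without a context only contribute their prefix. The `OntologyEntry` shape and `buildJsonLdContext` name are hypothetical, array-valued contexts are not handled, and conflict detection is left out.

```typescript
// Hypothetical stored ontology: a context may be a URL or an inline object.
interface OntologyEntry {
  prefix: string;
  namespace: string;
  jsonldContext?: string | { [term: string]: unknown };
}

type ContextArray = Array<string | { [term: string]: unknown }>;

function buildJsonLdContext(ontologies: OntologyEntry[]): ContextArray {
  const urls: string[] = [];
  const inline: { [term: string]: unknown } = {};
  for (const o of ontologies) {
    if (typeof o.jsonldContext === "string") {
      // Remote context (e.g. ActivityStreams): reference it by URL, only once.
      if (urls.indexOf(o.jsonldContext) === -1) urls.push(o.jsonldContext);
    } else if (o.jsonldContext) {
      // Inline context: merge its terms (later ontologies override earlier ones).
      Object.assign(inline, o.jsonldContext);
    } else {
      // No context available: just declare the prefix.
      inline[o.prefix] = o.namespace;
    }
  }
  return Object.keys(inline).length > 0 ? [...urls, inline] : urls;
}
```

Keeping remote contexts as URLs rather than inlining them is what keeps the response header short.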
Thanks for the remarks!
For a moment I wondered whether we could avoid creating custom contexts altogether, and whether it would be a good idea to give each resource some kind of ex:defaultJsonContext value, set from the @context field when the resource is created with a POST with content type ld+json.
But this value would be unset if, for example, the resource was created with content type Turtle, which brings us back to the question of which context to use.
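The fallback discussed here can be written as a precedence order: an explicit JsonLdContext header first, then a per-resource default (the hypothetical ex:defaultJsonContext value, captured only when the resource was POSTed as JSON-LD), then the provider-wide default. All names below are assumptions for illustration, as is the idea that the header carries either a context URL or a JSON-encoded context.

```typescript
type JsonLdContext = string | object | Array<string | object>;

interface ResourceMeta {
  // Hypothetical ex:defaultJsonContext value: the @context captured at creation time.
  defaultJsonContext?: JsonLdContext;
}

// On POST: remember the @context only if the body was JSON-LD.
function captureDefaultContext(contentType: string, body: string): ResourceMeta {
  if (contentType.split(";")[0].trim() === "application/ld+json") {
    const parsed = JSON.parse(body);
    if (parsed && parsed["@context"] !== undefined) {
      return { defaultJsonContext: parsed["@context"] };
    }
  }
  return {}; // e.g. Turtle: nothing to remember
}

// On GET: header > per-resource default > provider default.
function selectContext(
  headerValue: string | undefined,
  resource: ResourceMeta,
  providerDefault: JsonLdContext
): JsonLdContext {
  if (headerValue) {
    // Assumption: the header is either a context URL or a JSON-encoded context.
    const trimmed = headerValue.trim();
    return trimmed.startsWith("[") || trimmed.startsWith("{") ? JSON.parse(trimmed) : trimmed;
  }
  // Resources created from Turtle have no stored context, so they fall through.
  return resource.defaultJsonContext ?? providerDefault;
}
```

The Turtle case falling through to the provider default is exactly the gap pointed out above: the per-resource value can only ever be a partial answer.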
> [...] every Pod should, in theory, have its own JSON-LD context, since it depends on the applications that were installed... But that's not how I went with the implementation so far (the ontologies are saved on the general settings dataset, not on the Pod). This will require some thought 🤔
I see several options for storing this information:
[...] settings dataset, and keep the urn:-type link. They will only be accessible through SPARQL with a webId system, but that seems OK as it is something internal.
[...] pim:PreferencesFile, as this is a Solid standard. But I haven't yet found a description of how information should be stored in these files.
In the last option, we should avoid persisting core ontologies, and instead use the array passed to the LdpOntologiesService. This could be a good idea for the other options as well, so that we don't need to store (and maintain/migrate) triples that would be replicated in all datasets.
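The idea of not persisting core ontologies could look like this: the core array passed to the service at startup stays in memory, only application-registered ontologies reach storage, and reads merge both. `OntologyStore` is a hypothetical interface standing in for whichever storage option is chosen (settings dataset, Pod or preferences file), and `InMemoryStore` is only there to make the sketch runnable.

```typescript
interface OntologyDef {
  prefix: string;
  namespace: string;
}

// Hypothetical storage abstraction: settings dataset, Pod, or preferences file.
interface OntologyStore {
  load(): OntologyDef[];
  save(ontologies: OntologyDef[]): void;
}

class InMemoryStore implements OntologyStore {
  private data: OntologyDef[] = [];
  load(): OntologyDef[] {
    return this.data.slice();
  }
  save(ontologies: OntologyDef[]): void {
    this.data = ontologies.slice();
  }
}

class OntologiesWithCore {
  private core: OntologyDef[];
  private store: OntologyStore;

  // `core` is the array passed to the service at startup; it is never persisted.
  constructor(core: OntologyDef[], store: OntologyStore) {
    this.core = core;
    this.store = store;
  }

  register(ontology: OntologyDef): void {
    // Anything clashing with a core ontology is ignored here; a real service
    // would probably reject a conflicting definition instead.
    const isCore = this.core.some(
      (c) => c.prefix === ontology.prefix || c.namespace === ontology.namespace
    );
    if (isCore) return;
    const custom = this.store.load().filter((o) => o.prefix !== ontology.prefix);
    custom.push(ontology);
    this.store.save(custom);
  }

  list(): OntologyDef[] {
    return [...this.core, ...this.store.load()];
  }
}
```

Since core ontologies never hit storage, changing the core list in a new release requires no data migration.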
> I see several options for storing this information
From what you describe, options 2 and 3 seem the most convincing to me, since they appear to be more "transparent" about what's happening from the outside and are a bit more generalizable / closer to the specs.
I think I'm mixing up too many problems. At the moment, application-defined ontologies are really only needed for generating LDP container paths, and we don't need something perfect, because this is not standard and we don't know whether we will keep it in the long run.
The choice of the prefix is really an internal implementation matter that has little impact on how the Pod functions. Other implementations could use LOV or custom prefix databases (the general philosophy is that container paths are not a problem, and we don't really care about them). However, what we need is consistency, so that if two applications use the same ontology, the same prefix will be used for their containers. That's why we need persistence, but it doesn't matter if everything is persisted in the same dataset (the settings dataset).
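The consistency requirement can be illustrated with a small sketch: once a namespace has a prefix, every later lookup reuses it, so two applications using the same ontology end up with the same container path. The `/<prefix>/<lowercased class name>` layout, the injected `lookupPrefix` function (standing in for prefix.cc, LOV or a custom database) and all names are assumptions, not the actual ActivityPods 2.0 behaviour.

```typescript
class PrefixRegistry {
  // Namespace URL -> prefix. Persisting this map is what keeps paths consistent.
  private prefixes: { [namespace: string]: string } = Object.create(null);
  private lookupPrefix: (namespace: string) => string | undefined;

  // `lookupPrefix` stands in for prefix.cc, LOV or a custom prefix database.
  constructor(lookupPrefix: (namespace: string) => string | undefined) {
    this.lookupPrefix = lookupPrefix;
  }

  prefixFor(namespace: string): string {
    const known = this.prefixes[namespace];
    if (known !== undefined) return known; // reuse: no second lookup
    const found = this.lookupPrefix(namespace);
    if (!found) throw new Error(`No prefix found for ${namespace}`);
    const taken = Object.keys(this.prefixes).some((ns) => this.prefixes[ns] === found);
    if (taken) throw new Error(`Prefix "${found}" is already used by another namespace`);
    this.prefixes[namespace] = found;
    return found;
  }

  // Hypothetical layout: split the class URI at the last '#' or '/', then /<prefix>/<name>.
  containerPath(typeUri: string): string {
    const cut = Math.max(typeUri.lastIndexOf("#"), typeUri.lastIndexOf("/")) + 1;
    const namespace = typeUri.slice(0, cut);
    const localName = typeUri.slice(cut);
    return `/${this.prefixFor(namespace)}/${localName.toLowerCase()}`;
  }
}
```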
What we also want is clean contexts that explicitly include the ActivityStreams context. If we put all the properties directly in the context, it will add a big, ugly header and increase the response size. So a solution could be to put all these custom context properties in a pod-provider-level context file (accessible via GET), like we do on other SemApps instances with the /context.json file, except that it will be dynamically generated.
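A sketch of that split: the custom prefixes go into one dynamically generated provider-level document served on GET at `/context.json` (the path used on other SemApps instances), and responses only reference it by URL right after the ActivityStreams context. Function and type names are hypothetical, and the HTTP handler itself is left out.

```typescript
interface CustomOntology {
  prefix: string;
  namespace: string;
}

const AS_CONTEXT = "https://www.w3.org/ns/activitystreams";

// Body served on GET <baseUrl>/context.json, regenerated whenever ontologies change.
function providerContextDocument(custom: CustomOntology[]): { "@context": { [prefix: string]: string } } {
  const terms: { [prefix: string]: string } = {};
  for (const o of custom) terms[o.prefix] = o.namespace;
  return { "@context": terms };
}

// Short @context embedded in responses: ActivityStreams first, then the provider file by URL.
function responseContext(baseUrl: string): string[] {
  return [AS_CONTEXT, `${baseUrl.replace(/\/$/, "")}/context.json`];
}
```

Responses then carry two short URLs instead of every custom term, however many ontologies are registered.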
I'm also in the process of splitting the ontologies service in two, with a new jsonld.context service, so the result will be a bit different from the above proposal.
Current usage of default JSON-LD context
Currently the default JSON-LD context (passed to the LDP and ActivityPub services) is used to format (or, more precisely, "frame") the raw results returned by Jena Fuseki whenever no JsonLdContext header is passed.
Developer convenience
Having properly formatted JSON-LD is a convenience for developers when they browse through an LDP container.
Instead of full URIs, they see prefixes (this also applies to the Turtle format).
Instead of having @id for every URI, they see something more readable.
If unformatted JSON-LD were returned, browser extensions like Header Editor, combined with the new JsonLdContext header, could however help developers see properly formatted data.
Moleculer services
It is also useful to pass formatted data between Moleculer services. Moleculer services can use the jsonContext parameter of the ldp.resource.get action if they want the results framed according to the context of their choice. But for Moleculer events (like ldp.resource.created), we depend on what the LDP service emitted. If we used raw Fuseki results, there would also be some consistency. Or better yet: expanded results, so that we are not dependent on the formatting of a particular triple store.
More generally, in all Moleculer services, we should treat data not as JSON but as RDF, and find a library to process data properly, regardless of the context used.
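To illustrate why expanded data makes services independent of the context used, here is a deliberately tiny expansion of property names: terms and `prefix:suffix` compact IRIs are resolved against a flat, string-valued context, so payloads written with different contexts end up with identical keys. It only handles top-level keys and leaves values untouched; real code should rely on a JSON-LD processor (for instance the `expand` function of jsonld.js) rather than this sketch.

```typescript
type FlatContext = { [termOrPrefix: string]: string };

const has = (ctx: FlatContext, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(ctx, key);

// Resolve "prefix:suffix" against the context; absolute IRIs ("scheme://...") are kept.
function expandCurie(value: string, ctx: FlatContext): string {
  const colon = value.indexOf(":");
  if (colon <= 0) return value;
  const prefix = value.slice(0, colon);
  const suffix = value.slice(colon + 1);
  if (suffix.startsWith("//") || !has(ctx, prefix)) return value;
  return ctx[prefix] + suffix;
}

// Expand a property name: keywords stay as-is, terms map to their (possibly compact) IRI.
function expandIri(key: string, ctx: FlatContext): string {
  if (key.startsWith("@")) return key;
  if (has(ctx, key)) return expandCurie(ctx[key], ctx);
  return expandCurie(key, ctx);
}

// Rewrite top-level property names to full IRIs and drop @context.
function expandKeys(doc: { [key: string]: unknown }): { [iri: string]: unknown } {
  const ctx = (doc["@context"] ?? {}) as FlatContext;
  const out: { [iri: string]: unknown } = {};
  for (const key of Object.keys(doc)) {
    if (key === "@context") continue;
    out[expandIri(key, ctx)] = doc[key];
  }
  return out;
}
```

A service listening to ldp.resource.created could compare or process such expanded keys without caring which context the emitter used.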
ActivityPub federation
That's the real problem: most ActivityPub-compatible servers treat data as JSON and don't reformat it. They generally tolerate the addition of other contexts (this is considered the proper way to create extensions), but if you pass raw JSON-LD data, they will most likely not reframe it.
This is a problem not only for activities sent between federated servers, but also potentially for resources ("objects" in ActivityPub vocabulary) that are retrieved from the LDP server. In the ActivityPub spec, it is indicated that "Implementers SHOULD include the ActivityPub context in their object definitions. Implementers MAY include additional context as appropriate.".
One solution could be to include the ActivityStreams context in the default JSON-LD context (especially when the ActivityPub service is activated) and to ignore other contexts. Another would be to provide a context that fits the core ontologies (like LDP) and to ignore app-specific contexts.
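The first option could be applied just before an activity is sent: make sure the ActivityStreams context is present and first (so any other context stays an extension of it), and optionally drop everything else. `ensureActivityStreamsContext` and its `dropOthers` flag are hypothetical names for this sketch.

```typescript
const AS_CONTEXT_URL = "https://www.w3.org/ns/activitystreams";

type ContextValue = string | { [key: string]: unknown };

function ensureActivityStreamsContext(
  activity: { [key: string]: unknown },
  dropOthers = false
): { [key: string]: unknown } {
  const raw = activity["@context"];
  const contexts: ContextValue[] =
    raw === undefined ? [] : Array.isArray(raw) ? raw : [raw as ContextValue];
  // Treat both the bare URL and the .jsonld file URL (both appear in this thread)
  // as the AS context, remove them, and re-add the bare URL in first position.
  const others = contexts.filter(
    (c) => c !== AS_CONTEXT_URL && c !== `${AS_CONTEXT_URL}.jsonld`
  );
  const result: ContextValue[] = dropOthers ? [AS_CONTEXT_URL] : [AS_CONTEXT_URL, ...others];
  // A single context is written as a plain string, as most ActivityPub servers expect.
  return { ...activity, "@context": result.length === 1 ? result[0] : result };
}
```

Putting ActivityStreams first also lines up with the overriding rule discussed earlier: app contexts listed after it can extend it.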
Proposed solution
[...] JsonLdContext header to get the format they need.
[...] jsonldContext field to the ontologies definition.