Closed sgrellet closed 3 years ago
Adding metadata-only endpoints for registers and a metadata endpoint for the registry itself is feasible and may be a generally useful extension.
Such endpoints could deliver the usual RDF formats, including RDF/XML.
If the ROR specification allows ROR consumers to ignore properties they are not interested in, then once there are metadata-only endpoints you would just need to add the metadata corresponding to ROR on top of the normal registry metadata.
If the specification (which isn't precise on this point) requires "these properties and only these properties" then that would be a different situation and would require a ROR specification solution that seems less appropriate for the main code line.
The validation process used (https://ies-svn.jrc.ec.europa.eu/projects/inspire-registry/wiki/Registry_federation_xsl_validators) excludes several prefixes. Testing on our files, having additional properties also seems to be OK.
Testing the example files with INSPIRE xsl, the .ror for the register (litho.ror) initially did not pass register descriptor validators for several reasons listed below
the validation script (http://inspire-regadmin.jrc.ec.europa.eu/register-federation/validators/Register_descriptor_validator.xsl)
ldregistry download takes a first 'information seed' and pulls out the data graph from it. This is clean from a Semantic Web perspective but does not match the validation script's expectations.
-> short-term: describe the SPARQL CONSTRUCT needed to enable the additional metadata elements for INSPIRE registry federation (or have them added for each register in the registry), pass the output TTL through the rdf-translator API to get a flat XML structure, and do some manual edits for bullet points 3 and 4 above to validate properly
-> mid-term: allow a download structure conforming to the INSPIRE registry federation expectations (validation + harvest)
Comments and clarification questions ...
So there are several parts to this problem: the need to include the full publisher information and isPartOf links to the registry, the constrained XML serialization, and the need for additional terms such as skos:definition.
The publisher and registry link information can't be included in a register definition without modifications to the codebase. You can upload embedded information by using blank nodes, but the ROR validator doesn't accept those, and indeed the ROR spec explicitly asks for URIs as well as inline descriptions of them. It would be possible to have separate registers (e.g. under structure) to hold the publisher descriptions and the overall registry description and then automatically pull those into the register payload when responding to a ROR request.
After some experimentation I think the XML serialization issue itself is solvable. The validators have a very narrow view of RDF/XML but we can configure the Jena RDF/XML writer to pull skos:Concept and skos:ConceptScheme to the top level. With that, a sample register with manually added publisher metadata and registry metadata passes the validator.
The ROR specification includes a separate media type for ROR files so the right approach is probably to add a _format=ror option which would both trigger the enrichment of the return payload (to pull in the publisher and registry links) and then set the media type to application/x-ror-rdf+xml. A new marshaller could then handle that media type by setting the right parameters for the RDF/XML writer.
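To make the proposal concrete, here is a minimal client-side sketch of what asking for the ROR rendering might look like. It only builds the request and does not send it. The register path /def/litho is a made-up example, and sending the media type in an Accept header is an assumption on my part; the proposal above only says that _format=ror would select the format and that the response would carry application/x-ror-rdf+xml.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request

# Media type named in the discussion above.
ROR_MEDIA_TYPE = "application/x-ror-rdf+xml"


def build_ror_request(register_url: str) -> Request:
    """Build (but do not send) a GET request for the ROR rendering of a register.

    Any existing _format parameter is replaced by _format=ror.
    The Accept header is an assumption, not something the thread confirms.
    """
    parts = urlsplit(register_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_format"
    ]
    query.append(("_format", "ror"))
    url = urlunsplit(parts._replace(query=urlencode(query)))
    return Request(url, headers={"Accept": ROR_MEDIA_TYPE})


# Hypothetical register URL, used for illustration only.
request = build_ror_request("http://localhost:8080/ldregistry/def/litho")
print(request.full_url)
print(request.get_header("Accept"))
```

The request object could then be passed to urllib.request.urlopen against a running registry; checking the Content-Type of the response would show whether the marshaller was selected.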
The ROR requirements over use of skos:prefLabel instead of rdfs:label for registers, the need for a skos:definition of a register etc. are solvable by simply including those in your register definitions; no change is required to support those.
Some questions:
The ROR specification limits cardinality of dct:title, skos:prefLabel etc. to 1..1. This means that you cannot have multilingual labels on a ROR registry, register or individual concepts. However, the validators do not check this and payloads with multilingual labels do pass validation. Can we rely on this? Can we assume that whatever software consumes this can cope with multilingual labels?
Is your requirement to federate all registers (or at least all under /def) or do you want selective control so that you explicitly enumerate the registers to be included in the federation? The controlled option would be easier to implement since we would then just need a system register to represent the overall catalog to be federated. That would also be the place to include the top level registry metadata, which would be convenient. However, you would then need to manually add each new register to be federated into the catalog register.
Thank you very much Dave, the solution of separate registers (e.g. under structure) seems the most suitable way, I think. We agree with your proposal of selective control using this separate register. I think it's easier for the dev and better for us, since we want to have individual control over which registers should be pushed through ROR. It's even better this way since we can have different properties (e.g. producers and different periodicities) from one register to another.
Concerning your question about dct:title, skos:prefLabel: it seems more logical to us that the software consuming the ROR payload should handle multilingual labels. We'll check with the JRC people why the cardinality is limited to 1..1.
We also agree with the _format=ror and application/x-ror-rdf+xml media type solutions.
JRC reply: Currently the RoR is just using the English language as first choice. If this is not found, it tries to read the first occurrence without specifying the language.
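For what it's worth, the behaviour JRC describes could be sketched roughly as follows. This is only my reading of the reply, not the RoR implementation: I assume labels arrive in document order as (text, language tag) pairs, that regional tags such as en-GB count as English, and that the fallback is simply the first label whatever its language.

```python
from typing import Optional, Sequence, Tuple

# (label text, language tag or None for a plain literal)
Label = Tuple[str, Optional[str]]


def pick_label(labels: Sequence[Label]) -> Optional[str]:
    """Return the English label if present, otherwise the first label seen."""
    for text, lang in labels:
        # Assumption: "en", "EN" and regional variants like "en-GB" all match.
        if lang is not None and lang.lower().split("-")[0] == "en":
            return text
    # Fallback: first occurrence, without looking at the language at all.
    return labels[0][0] if labels else None


print(pick_label([("Lithologie", "fr"), ("Lithology", "en")]))
print(pick_label([("Lithologie", "fr"), ("Litologia", "it")]))
```

Under these assumptions a multilingual payload is harmless to the consumer: extra labels are ignored rather than rejected, which matches the fact that such payloads pass the validators.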
@der have there been any advances on this issue? I see that the dedicated dev branch hasn't moved since December, so I assumed not. I tried it anyway with the content negotiation as explained in the wiki preview, but it didn't work. I think I missed something in our exchanges. Do you have an idea when this will be finished, please? Thanks a lot
@afeliachi I am currently working on this issue now that the internationalisation feature appears to be in a good state. I expect that it will be ready to test by the end of this week.
Thanks Simon. We would really appreciate that.
The branches for this feature on registry-core and registry-config-base are ready to test (5-ror-format).
They are documented here.
I had to differ from some of the plans discussed above in order to make sure that the produced XML would conform to the given templates (the default serialisation tends to produce valid RDF but not in the required structure). In particular, the registry descriptor and the root ConceptScheme of the register descriptor will be rendered with only the properties that are relevant to the INSPIRE format. This is currently a hard coded list, so please let me know if you need this to be configurable or if there are specific properties you would like to include.
Have you made any progress in testing this?
Sorry Simon, I thought I had responded to this. Actually there are a few issues with the payload returned:
The content doesn't meet the conformance scheme of having rdf:description markups instead of typed markups (skos:ConceptScheme & skos:Concept).
Registry descriptor: when requesting the ROR description of the root (the registry), the response doesn't provide a registry descriptor as specified by the conformance class.
When loading data for the first time, inScheme properties are lost (this looks like an old bug, not specific to this version).
thank you
@afeliachi OK, I will try to look into this in the next week or so.
@afeliachi
rdf:descriptions, and the XSL validators succeed even if they are not - could you point me to which part of the spec requires this?
/structure and signified by having the dcat:Catalog type. The suggested setup is documented here.
Hi Simon, I am currently retesting this. We'll get back to you ASAP, sorry for the delay
Hi Simon
Regarding the rdf:description pattern: you can see it in the examples provided in the requirement, but you are right, the XSL validator also accepts typed markups, so no problem with that, sorry.
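For reference, the two patterns carry the same triples: in RDF/XML a typed node element is shorthand for an rdf:Description with an rdf:type property. A small sketch of that rewrite, using only the Python standard library (the helper name to_description is made up for this illustration, and nothing here is part of the registry code):

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_description(typed_node: ET.Element) -> ET.Element:
    """Rewrite a typed node element as rdf:Description plus an rdf:type child."""
    # ElementTree stores qualified names as "{namespace}local".
    namespace, _, local = typed_node.tag[1:].partition("}")
    description = ET.Element(f"{{{RDF}}}Description", dict(typed_node.attrib))
    ET.SubElement(
        description, f"{{{RDF}}}type", {f"{{{RDF}}}resource": namespace + local}
    )
    # Carry over the original property elements unchanged.
    description.extend(list(typed_node))
    return description


typed = ET.fromstring(
    '<dcat:Catalog xmlns:dcat="http://www.w3.org/ns/dcat#" '
    f'xmlns:rdf="{RDF}" rdf:about="http://localhost:8080/ldregistry/"/>'
)
ET.register_namespace("rdf", RDF)
print(ET.tostring(to_description(typed), encoding="unicode"))
```

An RDF parser reads both forms as the same graph, which is consistent with the XSL validator accepting either.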
I was aware of the configuration recommendation of where to locate the register with the dcat:Catalog type. But we hoped that the payload would expose the registry (root) URI as the URI of the catalog
so instead of having
<dcat:Catalog rdf:about="http://localhost:8080/ldregistry/structure/catalog">
It is necessary to have
<dcat:Catalog rdf:about="http://localhost:8080/ldregistry/">
So I tried to edit the root directly, by adding the dcat:Catalog type and the necessary conformance class properties, but the problem is that I am not able to edit the root registry description. I get the following message
Proposed notation for item is not a legal pchar or starts with '_' -
I get the same response with a PATCH request to update the root description. Do you have any suggestions on that?
Thanks @afeliachi , I will look into point 2 this week.
Thank you @simonoakesepimorphics
I've updated the branch to satisfy the requirement that the registry descriptor should have the root register as its root element rather than the /structure/catalog register.
The registry description and dataset catalog reside on the /structure/catalog register as before, and you can access them from either the root register (ROR format) or the catalog register itself. However, the result will always use the root register as the root of the graph and transpose the registry description from /structure/catalog onto it. This means that you will not need to modify the root register in any way.
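As a rough illustration of what "transpose" means here (a sketch of the idea only, not the code on the branch): every statement whose subject is the catalog register is re-emitted with the root register as its subject. Triples are modelled as plain tuples and the dct:title value is invented for the example; the two URIs are the ones quoted earlier in this thread.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

CATALOG = "http://localhost:8080/ldregistry/structure/catalog"
ROOT = "http://localhost:8080/ldregistry/"


def transpose_subject(triples: List[Triple], source: str, target: str) -> List[Triple]:
    """Re-attach every statement about `source` to `target`; leave others as-is."""
    return [(target if s == source else s, p, o) for s, p, o in triples]


graph = [
    (CATALOG, "rdf:type", "dcat:Catalog"),
    (CATALOG, "dct:title", "Example registry"),  # invented value
    ("http://localhost:8080/ldregistry/def/litho", "rdf:type", "skos:ConceptScheme"),
]

for triple in transpose_subject(graph, CATALOG, ROOT):
    print(triple)
```

The stored data under /structure/catalog is untouched; only the rendered payload changes subject.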
Register descriptors should work the same as before.
I've updated the wiki page to reflect this.
Hi @simonoakesepimorphics Just finished the tests. Everything looks perfect. Thank you for this last edit. @sgrellet I close the ticket
Dynamically generate .ror files as specified by INSPIRE group on register federation:
More details and example files here: https://ies-svn.jrc.ec.europa.eu/projects/inspire-registry/wiki/Registry_federation_requirements
Running examples (hand-made for BRGM registry)
Idea would be to have the required 'descriptors' natively included in the description of the registry and the registers -> no more need for static files but need to have a new 'download format' (like RDF/XML.ROR) that triggers the required response/serialization