NINAnor / nina-catalogue

NINA Catalogue - documentation: https://ninanor.github.io/nina-catalogue/
https://pycsw.nina.no
MIT License

Integration with dictionaries #20

Open frafra opened 10 months ago

frafra commented 10 months ago

@mdsnor Should we set up Skosmos and use its API in the catalog, or should we handle importing the dictionaries ourselves? What is the use case there? Is there any common client to query vocabularies using standard interfaces?

nicokant commented 9 months ago

Some references I found:

frafra commented 9 months ago

I will try to clear things up based on my (limited) knowledge of the topic :)

Let's start with Skosmos, which has been suggested as the starting point for serving our dictionaries :)

Skosmos provides a web interface and a REST API (described with OpenAPI) on top of a SPARQL endpoint serving vocabularies in the SKOS data model. The web interface is nice for users (intuitive, multilingual), SPARQL is great for linked data, and the REST API makes it easy to integrate with other systems (like a webpage for writing metadata with search/autocomplete functionality). SKOS suits us better than plain RDFS/OWL vocabularies because it models labels in multiple languages explicitly, and vocabularies expressed in RDFS or OWL can be converted to SKOS using Skosify.
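
To make the autocomplete use case concrete, here is a minimal sketch of how a metadata form could call the Skosmos REST search method. The base URL and the vocabulary id `habitats` are placeholders, and the response handling assumes the `results` list with `uri` and `prefLabel` fields as I understand it from the Skosmos REST documentation, so it should be checked against a real instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL of a Skosmos instance; replace with the real deployment.
SKOSMOS_BASE = "https://skosmos.example.org"


def build_search_url(term, vocab=None, lang="en", base=SKOSMOS_BASE):
    """Build a URL for the Skosmos global search method (/rest/v1/search)."""
    params = {"query": term + "*", "lang": lang}  # trailing * asks for a prefix match
    if vocab:
        params["vocab"] = vocab  # restrict the search to one vocabulary id
    return f"{base}/rest/v1/search?{urlencode(params)}"


def parse_suggestions(payload):
    """Reduce a search response to (uri, prefLabel) pairs for an autocomplete widget."""
    return [(hit["uri"], hit.get("prefLabel")) for hit in payload.get("results", [])]


def suggest(term, **kwargs):
    """Query the live instance (network access required)."""
    with urlopen(build_search_url(term, **kwargs)) as response:
        return parse_suggestions(json.load(response))
```

Keeping URL building and response parsing separate from the network call means the catalogue side can be unit-tested without a running Skosmos.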

Skosmos recommends Apache Jena Fuseki to provide the SPARQL interface on top of RDF/SKOS vocabulary files, which are imported into a TDB database. Fuseki is recommended because it provides text indexing via jena-text, which is good for performance.
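
For reference, this is roughly what a client talking to the SPARQL interface directly would look like. It is a sketch using only the standard SPARQL 1.1 Protocol (HTTP GET with a `query` parameter) and the SPARQL JSON results format. The endpoint URL and the dataset name `vocabularies` are assumptions, and `lang` is expected to be a trusted language tag because it is interpolated into the query text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical Fuseki dataset endpoint; the dataset name is an assumption.
SPARQL_ENDPOINT = "http://localhost:3030/vocabularies/sparql"

# List every SKOS concept with its preferred label in one language.
LABELS_QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "%(lang)s")
}
ORDER BY ?label
"""


def build_request(lang, endpoint=SPARQL_ENDPOINT):
    """Prepare a SPARQL 1.1 Protocol query (HTTP GET, JSON results)."""
    query = LABELS_QUERY % {"lang": lang}
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results):
    """Map concept URI -> preferred label from a SPARQL JSON result document."""
    return {
        row["concept"]["value"]: row["label"]["value"]
        for row in results["results"]["bindings"]
    }


def fetch_labels(lang):
    """Run the query against the live endpoint (network access required)."""
    with urlopen(build_request(lang)) as response:
        return parse_bindings(json.load(response))
```

Since this relies only on the standard protocol, the same client should work whichever SPARQL server ends up behind the endpoint.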

Ideally, we would like to store vocabularies in a regular Postgres database, handled by the NINA catalogue as a Django app. That would make it easier to enforce consistency for references and keep the system manageable, instead of having to maintain a whole new set of applications built with an entirely different group of frameworks, programming languages and technologies.

Here is a map with all the possible connections for the various components and interfaces:

graph TD;
oxigraph-->rocksdb;
sparql-->rdflib-endpoint-->rdflib;
rdflib-->BerkeleyDB;
rdflib-->memory;
rdflib-->oxigraph;
sparql-->oxigraph_server-->oxigraph;
sparql-->jena;
jena-->TDB;
jena-->SDB-->postgres;
rest-api-->skosmos-->sparql;
nina-catalogue-->rest-api;
sparql-->ontop-->postgres;
nina-catalogue-->rdflib-->rdflib-django3-->postgres;
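
To read the map more easily, here is a small sketch that encodes the same edges as a Python dictionary and lists every route from one component to another. Nothing here talks to real services; the helper `paths` is just an illustration.

```python
# Edges copied from the component map above (component -> what it sits on top of).
EDGES = {
    "nina-catalogue": ["rest-api", "rdflib"],
    "rest-api": ["skosmos"],
    "skosmos": ["sparql"],
    "sparql": ["rdflib-endpoint", "oxigraph_server", "jena", "ontop"],
    "rdflib-endpoint": ["rdflib"],
    "rdflib": ["BerkeleyDB", "memory", "oxigraph", "rdflib-django3"],
    "rdflib-django3": ["postgres"],
    "oxigraph_server": ["oxigraph"],
    "oxigraph": ["rocksdb"],
    "jena": ["TDB", "SDB"],
    "SDB": ["postgres"],
    "ontop": ["postgres"],
}


def paths(source, target, edges=EDGES):
    """Return every path from source to target (the map has no cycles)."""
    if source == target:
        return [[source]]
    found = []
    for child in edges.get(source, []):
        for tail in paths(child, target, edges):
            found.append([source] + tail)
    return found


# Which SPARQL stacks can end up storing data in Postgres?
for path in paths("sparql", "postgres"):
    print(" --> ".join(path))
```

This prints three routes from `sparql` to `postgres`: via rdflib-endpoint, rdflib and rdflib-django3; via Jena SDB; and via Ontop.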

This is my preferred architecture, which still needs to be validated:

graph TD;
nina-catalogue-->rdflib-->rdflib-django3-->postgres;
skosmos-->sparql-->rdflib-endpoint-->rdflib;

nicokant commented 9 months ago

Related: https://github.com/ontop/ontop/discussions/781

nicokant commented 9 months ago

I'll run an experiment on the proposed architecture. A few notes: