Informasjonsforvaltning / fdk-public-service-harvester

fdk-public-service-harvester will harvest catalogs of public services and services according to the CPSV-AP-NO specification.
Apache License 2.0
0 stars 0 forks source link
backend harvesting

fdk-public-service-harvester

The harvest process is triggered by messages from RabbitMQ with the routing key publicservice.*.HarvestTrigger, a message will call the method initiateHarvest in the class HarvesterActivity. The actual harvest will start when activitySemaphore has an available permit, when there are no available permits all messages will be queued by the semaphore.

The body of the trigger message has 3 relevant parameters:

A triggered harvest will download all relevant sources from fdk-harvest-admin, download everything from the source and try to read it as a RDF graph via a jena Model. If the source is successfully parsed as a jena Model it will be compared to the last harvest of the same source. The harvest process will continue if the source is not isomorphic to the last harvest or forceUpdate is true.

The actual harvest process will first find services, resources with the type cpsv:PublicService or cpsvno:Service, and catalogs, resources with the type dcat:Catalog, blank node services and catalogs will be ignored. A catalog will also be generated from data about the source from fdk-harvest-admin, any services in the source that is not part of any catalog, indicated by the predicate dcatno:containsService, will be added to this catalog. When all catalogs and services have been found a recursive function will create a graph with every contained triple for all catalogs and services.

The process will save metadata for both services and catalogs:

All blank nodes will be skolemized in the resource graphs.

When all sources from the trigger has been processed a new rabbit message will be published with the routing key public_services.harvested, the message body will be a list of harvest reports, one report for each source from fdk-harvest-admin.

When the rabbit message has been published the semaphore permit is released and a new harvest trigger can be processed.

Requirements

Run tests

mvn verify

Run locally

docker-compose up -d
mvn spring-boot:run -Dspring-boot.run.profiles=develop

Then in another terminal e.g.

% curl http://localhost:8080/public-services

Datastore

To inspect the MongoDB datastore, open a terminal and run:

docker-compose exec mongodb mongo
use admin
db.auth("admin","admin")
use publicServiceHarvester
db.services.find()
db.misc.find()