gbv / jskos-server

Web service to access JSKOS data
https://coli-conc.gbv.de/api/
MIT License

Rewrite import script #56

Closed stefandesu closed 5 years ago

stefandesu commented 5 years ago

The current import process is inefficient, complicated, and not well documented (#24). A full rewrite of the import script is necessary. The rewrite should consist of the following parts:

  1. Good documentation!
  2. A bash script for batch imports (similar to the current scripts/import.sh).
  3. One Node script to import a file.
  4. One Node script to (re)create database indexes (only has to run once at the beginning and again whenever indexes are changed).

The import Node script could have parameters similar to the current import.js (but everything is in one file):

Any thoughts or improvements?

nichtich commented 5 years ago

Please have a look at jskos-cli. The import script should provide a similar calling syntax:

jskos-import [options] type [file]

where type is given as a string, detected with guessObjectType:

jskos-import schemes terminologies.ndjson
jskos-import mappings mappings.ndjson

...

We only need to add the object type Annotation, which is not detected yet.
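
To make the proposed calling syntax concrete, here is a minimal sketch of how the arguments of `jskos-import [options] type [file]` might be split up. It is not the actual implementation: the function name `parseArgs`, the list in `KNOWN_TYPES` (only `schemes` and `mappings` appear above), the treatment of every `--flag` as a boolean, and the idea of falling back to standard input when no file is given are all assumptions for illustration.

```javascript
// Hypothetical sketch of parsing `jskos-import [options] type [file]`.
// KNOWN_TYPES is an assumed list; only "schemes" and "mappings" are named in this thread.
const KNOWN_TYPES = ["schemes", "concepts", "concordances", "mappings", "annotations"]

function parseArgs(argv) {
  const options = {}
  const positional = []
  for (const arg of argv) {
    if (arg.startsWith("--")) {
      // Assumption: every option is a simple boolean flag.
      options[arg.slice(2)] = true
    } else {
      positional.push(arg)
    }
  }
  const [type, file] = positional
  if (!KNOWN_TYPES.includes(type)) {
    throw new Error(`Unknown or missing type: ${type}`)
  }
  // Assumption: without a file argument, the caller would read from standard input.
  return { options, type, file: file || null }
}
```

In a real script this would be called as `parseArgs(process.argv.slice(2))`; an established argument parser would likely be a better choice than hand-rolled parsing.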

Question: Should this remove all existing concepts for that scheme before import? (This would mean that concepts for a scheme could only be imported all at once.)

No, better provide an additional command to purge selected parts of the database.

Each object should be verified before import. Question: Should that object not be imported if verification failed?

In general, never import invalid records. The question is whether to abort and possibly reset the import if one record fails. This would require pre-validating the whole file before import. I added the option --validate to jskos-convert; maybe it could be handled similarly? Short answer: by default, just omit faulty records and emit an error message.
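
The default behaviour described here (omit faulty records, emit an error message, continue) could look roughly like the sketch below. It is only an illustration: `importLines` is a made-up name, `isValid` is a placeholder for whatever JSKOS validation the import script ends up using, and collecting records in an array stands in for writing them to the database.

```javascript
// Sketch: process newline-delimited JSON and skip records that fail validation.
// `isValid` is a hypothetical placeholder for a real JSKOS validator.
function importLines(lines, isValid, onError = console.error) {
  const imported = []
  lines.forEach((line, index) => {
    // Ignore blank lines, e.g. a trailing newline at the end of the file.
    if (line.trim() === "") return
    let record
    try {
      record = JSON.parse(line)
    } catch (error) {
      onError(`Line ${index + 1}: invalid JSON (${error.message})`)
      return
    }
    if (!isValid(record)) {
      onError(`Line ${index + 1}: record failed validation, skipped`)
      return
    }
    // A real import would write to the database here instead.
    imported.push(record)
  })
  return imported
}
```

Aborting on the first failure instead would mean validating all lines in a first pass and only importing if no errors were reported.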

stefandesu commented 5 years ago

Thanks for the hint about jskos-cli, it makes sense to use a similar calling syntax!

Annotations are difficult because they aren't part of JSKOS and we're currently using the Web Annotation Data Model. Should annotations be added to the JSKOS spec, maybe implementing only the parts of the Web Annotation Data Model that are relevant to us?

nichtich commented 5 years ago

A small subset of Annotations should be added to JSKOS: https://github.com/gbv/jskos/issues/73

stefandesu commented 5 years ago

TODO:

stefandesu commented 5 years ago

Apart from Travis currently failing (although the tests pass locally), I'd consider the import script done for now. Note that to use the command jskos-import, you need to link it into your path (npm link), which only works for one jskos-server instance (this is also mentioned in the README).