Closed stefandesu closed 5 years ago
Please have a look at jskos-cli. The import script should provide a similar calling syntax:
jskos-import [options] type [file]
where type is given by string, detected with guessObjectType:
jskos-import schemes terminologies.ndjson
jskos-import mappings mappings.ndjson
...
We only need to add object type Annotation, which is not detected yet.
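The calling syntax above could be parsed roughly as follows. This is only a sketch: the function name parseImportArgs and the list of accepted type names are assumptions (only schemes and mappings appear in the examples), and options that take a value are not handled.

```javascript
// Hypothetical list of accepted type names. Only "schemes" and "mappings"
// appear in the examples above; the other entries are assumptions.
const TYPES = ["schemes", "concepts", "concordances", "mappings", "annotations"]

// Splits an argument list of the form `[options] type [file]`.
// Anything starting with "-" is treated as a flag without a value.
function parseImportArgs(argv) {
  const options = []
  const positional = []
  for (const arg of argv) {
    if (arg.startsWith("-")) {
      options.push(arg)
    } else {
      positional.push(arg)
    }
  }
  const [type, file] = positional
  if (!TYPES.includes(type)) {
    throw new Error(`unknown object type: ${type}`)
  }
  // In this sketch a missing file argument is returned as null,
  // which a caller could interpret as "read from standard input".
  return { options, type, file: file ?? null }
}
```

In a real script this would be called with process.argv.slice(2).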
Question: Should this remove all existing concepts for that scheme before import? (That would mean that concepts for a scheme could only be imported all at once.)
No, better provide an additional command to purge selected parts of the database.
Each object should be verified before import. Question: Should that object not be imported if verification failed?
In general, never import invalid records. The question is whether to abort and possibly reset the import if one record fails. This would require pre-validating the whole file before import. I added the option --validate to jskos-convert; maybe it could be handled similarly? Short answer: by default, just omit faulty records and emit an error message.
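The default behavior suggested here (omit faulty records, emit an error message) might look like the sketch below. The names filterValid and isValid are made up for illustration, and isValid is only a placeholder check, not the actual JSKOS validation.

```javascript
// Placeholder check, NOT the real JSKOS validation: a record passes here
// if it is a JSON object with a string "uri".
function isValid(record) {
  return record !== null && typeof record === "object" && typeof record.uri === "string"
}

// Takes NDJSON lines and returns only the records that parse and validate.
// Faulty lines are skipped and reported with their 1-based line number.
function filterValid(lines, validate = isValid, log = console.error) {
  const valid = []
  lines.forEach((line, index) => {
    if (line.trim() === "") {
      return // ignore empty lines
    }
    let record
    try {
      record = JSON.parse(line)
    } catch (error) {
      log(`line ${index + 1}: invalid JSON, not imported`)
      return
    }
    if (validate(record)) {
      valid.push(record)
    } else {
      log(`line ${index + 1}: validation failed, not imported`)
    }
  })
  return valid
}
```

Aborting on the first failure instead would mean throwing where the sketch logs, and validating the whole file before anything is written.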
Thanks for the hint about jskos-cli, it makes sense to use a similar calling syntax!
Annotations are difficult because they aren't part of JSKOS and we're currently using the Web Annotation Data Model. Should annotations be added to the JSKOS spec, maybe implementing only the parts of the Web Annotation Data Model that are relevant to us?
A small subset of Annotations should be added to JSKOS: https://github.com/gbv/jskos/issues/73
TODO:
scripts directory
--format
to import from an API

Except that Travis is currently failing (although the tests work locally), I'd consider the import script done for now. Note that to use the command jskos-import, you need to link it to your path (npm link), which only works for one jskos-server instance (also mentioned in the README).
The current import process is inefficient, complicated, and not well documented (#24). A full rewrite of the import script is necessary. It consists of the following parts:
scripts/import.sh).

The import Node script could have parameters similar to the current import.js (but everything is in one file):

--terminologies or -t:
    concepts and topConcepts properties (by searching for concepts of that terminology) either to [null] (if concepts/top concepts exist) or to undefined (see #49)
--concepts or -c:
    concepts: [null] and (if applicable) topConcepts: [null] (see #49)
    narrower property to either [] or [null]
--concordances or -k:
    distribution property
    distribution property to jskos-server API URLs
--mappings or -m:
    partOf if necessary
--annotations or -a:
--index or -i:

Each object should be verified before import. Question: Should that object not be imported if verification failed?
Also, each import is handled as a stream. While the stream is read, objects are written to the database in batches (e.g. of 10000 at a time) to reduce necessary RAM.
Also, the script should provide meaningful output, i.e. progress indication as well as error messages (e.g. "validation failed for object in line X, not imported").
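The streaming, batching, and progress output described above could be sketched like this. It uses only Node's built-in readline module. The names inBatches and importStream are made up, writeBatch stands in for whatever database call ends up being used, and validation is left out to keep the sketch short (an invalid JSON line would make it throw).

```javascript
const readline = require("node:readline")

// Groups items from any (async) iterable into arrays of at most `size`,
// so that only one batch has to be held in memory at a time.
async function* inBatches(source, size) {
  let batch = []
  for await (const item of source) {
    batch.push(item)
    if (batch.length >= size) {
      yield batch
      batch = []
    }
  }
  if (batch.length > 0) {
    yield batch // remaining items that did not fill a whole batch
  }
}

// `input` is a readable stream of NDJSON. `writeBatch` is a hypothetical
// async function that stores one array of objects in the database.
async function importStream(input, writeBatch, { batchSize = 10000, log = console.log } = {}) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity })
  let imported = 0
  for await (const batch of inBatches(lines, batchSize)) {
    const objects = batch
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line))
    await writeBatch(objects)
    imported += objects.length
    log(`${imported} objects imported`) // simple progress indication
  }
  return imported
}
```

For a file this could be called with fs.createReadStream(file) as input, and with process.stdin when no file is given.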
Any thoughts or improvements?