nichtich closed this issue 1 year ago
TSV files are always grouped by PPN. The set of rows for each PPN is either in the known format, e.g.:
12345 rvk XY 333
12345 bk 33.33
resulting in rows [{voc: "rvk", notation: "XY 333"}, {voc: "bk", notation: "33.33"}],
or it's just one row with empty voc and notation to only delete the record (rows = []):
12345
See method updateRecord in the SQLite backend (dev branch), which is to be passed this parsed TSV data.
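The grouping described above could be sketched roughly like this. This is a Python illustration only, not the project's code: the function name parse_tsv is hypothetical, and tab-separated columns in the order PPN, voc, notation are assumed.

```python
from itertools import groupby


def parse_tsv(lines):
    """Yield (ppn, rows) for each run of consecutive lines sharing a PPN."""
    def fields(line):
        # Pad to three columns so a bare PPN line parses as [ppn, "", ""]
        parts = line.rstrip("\r\n").split("\t")
        return (parts + ["", ""])[:3]

    parsed = (fields(line) for line in lines if line.strip())
    for ppn, group in groupby(parsed, key=lambda f: f[0]):
        # A row with empty voc and notation contributes nothing, so a
        # bare PPN line results in rows == [] (delete only)
        rows = [{"voc": voc, "notation": notation}
                for _, voc, notation in group if voc and notation]
        yield ppn, rows


# Assumed sample input: two rows for one PPN, then a bare PPN line
tsv = ["12345\trvk\tXY 333\n", "12345\tbk\t33.33\n", "67890\n"]
records = list(parse_tsv(tsv))
```

Each (ppn, rows) pair would then be what gets handed to something like updateRecord.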
So the next step would be to add an update script that calls methods in the SQLite backend, and that also allows both partial and full updates? Something like:
# partial update by default
./bin/import update.tsv
# full update with flag
./bin/import --full subjects.tsv
Full updates would clear the whole table instead of deleting records for single PPNs, so we would likely need an additional method in the backend.
It also needs a --modified flag for #25 and should update the modified metadata in the database.
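A minimal sketch of how partial and full updates might differ on the database side, using Python's sqlite3 module. The table name subjects, its columns, and the functions update_record and import_records are assumptions for illustration, not the actual backend API. The --modified handling is left out because its semantics are defined in #25, not here.

```python
import sqlite3


def update_record(db, ppn, rows):
    """Replace all rows of a single PPN (rows == [] only deletes)."""
    db.execute("DELETE FROM subjects WHERE ppn = ?", (ppn,))
    db.executemany(
        "INSERT INTO subjects (ppn, voc, notation) VALUES (?, ?, ?)",
        [(ppn, r["voc"], r["notation"]) for r in rows],
    )


def import_records(db, records, full=False):
    """Apply (ppn, rows) pairs; with full=True the table is cleared first."""
    with db:  # one transaction: commit on success, roll back on error
        if full:
            db.execute("DELETE FROM subjects")
        for ppn, rows in records:
            update_record(db, ppn, rows)


# Hypothetical schema and data, in memory only
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE subjects (ppn TEXT, voc TEXT, notation TEXT)")

# Full update: whatever was in the table before is gone afterwards
import_records(db, [("12345", [{"voc": "ddc", "notation": "610"}]),
                    ("67890", [{"voc": "bk", "notation": "33.33"}])], full=True)

# Partial update: 12345 gets new rows, 67890 is left untouched
import_records(db, [("12345", [{"voc": "rvk", "notation": "XY 333"}])])

result = db.execute(
    "SELECT ppn, voc, notation FROM subjects ORDER BY ppn").fetchall()
```

In this sketch a --full flag would simply map to full=True, and the "additional method in the backend" is the single DELETE without a WHERE clause.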
@nichtich I feel like partial imports are not yet 100% clear. My suggestion for the TSV format for partial import would be this:
12345
= delete all records for PPN 12345
12345 rvk
= delete all RVK records for PPN 12345
12345 rvk XY 333
= add record for PPN 12345 (but do not delete anything)
For example, if the update were to 1) remove the existing DDC record, 2) replace the one existing RVK record, and 3) add an additional BK record, it would look like this:
12345 ddc
12345 rvk
12345 rvk XY 333
12345 bk 33.33
Or would you prefer to do it differently? I think this would cover all cases, even though removal of a single record would mean all other records for that PPN/vocabulary would need to be listed again. (I think in your case, removal of a single record would mean ALL other records for that PPN, regardless of vocab, would need to be listed again.)
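To make the three line forms of this suggestion concrete, here is a rough Python sketch that applies them in order to an in-memory set of (ppn, voc, notation) tuples. The function name apply_partial_update, the set-based store, and the pre-existing notations are made up for illustration; tab-separated columns are assumed.

```python
def apply_partial_update(store, lines):
    """Apply the three proposed line forms to `store`, a set of
    (ppn, voc, notation) tuples, processing lines in order."""
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if not parts[0]:
            continue  # skip blank lines
        ppn, voc, notation = (parts + ["", ""])[:3]
        if not voc:
            # "12345": delete all records for the PPN
            store.difference_update({r for r in store if r[0] == ppn})
        elif not notation:
            # "12345 rvk": delete all records of one vocabulary for the PPN
            store.difference_update({r for r in store if r[:2] == (ppn, voc)})
        else:
            # "12345 rvk XY 333": add a record, delete nothing
            store.add((ppn, voc, notation))
    return store


# Assumed prior state: one DDC and one RVK record for PPN 12345
store = {("12345", "ddc", "610"), ("12345", "rvk", "AB 100")}
update = ["12345\tddc\n", "12345\trvk\n",
          "12345\trvk\tXY 333\n", "12345\tbk\t33.33\n"]
result = apply_partial_update(store, update)
```

Note that line order matters with this format: a delete line placed after an add line for the same PPN/vocabulary would remove the record that was just added.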
There's now a basic working implementation of the import script. It will be finished in #27.
This is not part of the software but of its deployment and configuration, so I'm closing this issue.
Related to #17, there should also be an update script that can handle partial updates. The update could be a .tsv or .tsv.gz file as well, but it may include rows with empty vocabulary (just the PPN) to indicate removal of a record.
Alternatively, keep a full dump as a file and apply the update to this file to get an updated full dump (this may even be faster, depending on the size of the updates).
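A rough Python sketch of the dump-file alternative, under the assumption that every PPN appearing in the update replaces all of that PPN's lines in the dump, and that a bare PPN line removes the record. The function name apply_update_to_dump is hypothetical. Updated and new PPNs are appended at the end, so the result stays grouped by PPN but not necessarily sorted.

```python
def apply_update_to_dump(dump_lines, update_lines):
    """Yield the lines of an updated full dump."""
    def ppn_of(line):
        return line.split("\t", 1)[0].strip()

    # Collect replacement lines per PPN; a bare PPN line maps to no lines
    updates = {}
    for line in update_lines:
        if not line.strip():
            continue  # skip blank lines
        kept = updates.setdefault(ppn_of(line), [])
        if "\t" in line.strip():
            kept.append(line)

    for line in dump_lines:
        if line.strip() and ppn_of(line) not in updates:
            yield line  # PPN untouched by the update

    for lines in updates.values():
        yield from lines  # replaced or newly added PPNs go at the end


# Made-up sample data: PPN 2 is removed, 3 is replaced, 4 is new
dump = ["1\tbk\t1.1\n", "2\trvk\tA\n", "3\tddc\t5\n"]
update = ["2\n", "3\trvk\tB\n", "4\tbk\t2.2\n"]
new_dump = list(apply_update_to_dump(dump, update))
```

The dump is streamed line by line and only the update is held in memory, which is where the possible speed advantage for small updates would come from.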
Use case: There are daily jobs at the K10plus CBS database that pass updated records to LBS and to the K10plus central Solr index.