UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

create import script #17

Closed (dwhieb closed this issue 3 years ago)

dwhieb commented 3 years ago

Write a script which imports new versions of any of our data sources into the ALTLab database incrementally. This script should:

  1. identify new / removed / changed entries (and ignore the rest)
    1. get an ordered set of the keys to any existing subentries in the ALTLab database
    2. get an ordered set of the keys to the entries in the data source
    3. in order, compare the records in each set using assert.deepStrictEqual() or assert.deepEqual() (or maybe just compare the lastUpdated property)

(The above procedure will likely be somewhat slow, but will avoid the need for passing both the previous and current versions of the data to the script. The entirety of each of our original data sources is stored within the ALTLab database (using the alternativeAnalyses fields), so we can use that to determine what updates are needed.)

For each difference:

  1. add / remove / update the subentries (Lexeme/alternativeAnalyses) in the ALTLab database as appropriate
    • The remove / update actions will likely be the same regardless of the data source.
    • The add action will likely be specific to the data source, and will have to consider which of the matching lemmas the new subentry should be added to (see #19).
    • The add action can also do the work of guessing the inflectional class of the entry, for easier matching.
  2. normalize / clean the subentries (in memory)
    • The normalization scripts will be specific to the data source.
  3. aggregate the normalized subentries, updating the main entry with the result (see #18)

The import script should also produce a change report each time it is run.

Input

dwhieb commented 3 years ago

There will be a separate import script for each database instead.