clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Incremental importing #34

Open twagoo opened 7 years ago

twagoo commented 7 years ago

Once the harvester can produce a delta (after a fully or partially incremental harvest), use this information to update only the changed (deleted or modified) subset of records in the VLO, add the new ones, and ignore the rest. In most cases this should drastically reduce the total import time and the number of operations on the Solr index.

TODO: work out the details, which will depend on the exact specification of the harvester's behaviour in this regard
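
A rough sketch of the idea, to make the intended behaviour concrete. Everything here is an assumption for illustration: the `Change` categories, the shape of the delta (a map from record id to change type), and the `InMemoryIndex` stand-in are all hypothetical and do not reflect the harvester's actual output or the importer's real Solr code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IncrementalImportSketch {

    /** Hypothetical change types a harvester delta might report per record. */
    enum Change { NEW, MODIFIED, DELETED, UNCHANGED }

    /** Hypothetical stand-in for the index; a real importer would talk to Solr here. */
    static class InMemoryIndex {
        final Map<String, String> docs = new LinkedHashMap<>();
        int operations = 0; // counts write operations only

        void addOrReplace(String id, String content) {
            docs.put(id, content);
            operations++;
        }

        void delete(String id) {
            docs.remove(id);
            operations++;
        }
    }

    /**
     * Applies a delta to the index: new and modified records are (re)indexed,
     * deleted records are removed, unchanged records cause no index operation.
     */
    static void applyDelta(Map<String, Change> delta,
                           Map<String, String> harvested,
                           InMemoryIndex index) {
        for (Map.Entry<String, Change> entry : delta.entrySet()) {
            String id = entry.getKey();
            switch (entry.getValue()) {
                case NEW:
                case MODIFIED:
                    index.addOrReplace(id, harvested.get(id));
                    break;
                case DELETED:
                    index.delete(id);
                    break;
                default:
                    // UNCHANGED: leave the existing document untouched
                    break;
            }
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.docs.put("a", "old-a");
        index.docs.put("b", "old-b");
        index.docs.put("c", "old-c");

        Map<String, Change> delta = new LinkedHashMap<>();
        delta.put("a", Change.MODIFIED);
        delta.put("b", Change.DELETED);
        delta.put("c", Change.UNCHANGED);
        delta.put("d", Change.NEW);

        Map<String, String> harvested = new LinkedHashMap<>();
        harvested.put("a", "new-a");
        harvested.put("d", "new-d");

        applyDelta(delta, harvested, index);
        System.out.println(index.docs.keySet() + " after " + index.operations + " operations");
    }
}
```

With four records in the delta, only three index operations are issued; the unchanged record is skipped, which is where the saving over a full re-import would come from.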

twagoo commented 7 years ago

Note: there's some synergy with #29 and #50 in that the 'orchestration' of the importer tasks probably requires some substantial refactoring for these tasks.