gbif / checklistbank

GBIF Checklist Bank

Allow quick incremental additions to backbone #16

Open mdoering opened 7 years ago

mdoering commented 7 years ago

As it takes a lot of effort to rebuild the backbone, reprocess all of our occurrences (the bulk of the work) and rebuild the solr indices, it would be good to allow quick, small additions to the backbone.

Adding just a few names directly into postgres would not necessarily require occurrences to be reprocessed, and the clb solr index could be updated very quickly for that one name too.
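If the new name is inserted directly into postgres, the matching clb solr document could be written in the same step instead of rebuilding the whole index. A minimal sketch with SolrJ is below; the core name and the field names are assumptions for illustration, not the actual checklistbank index schema:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexSingleName {
  public static void main(String[] args) throws Exception {
    // core name and field names are placeholders, not the real clb schema
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/checklistbank").build();

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("key", 123456789);                      // usage key of the name just inserted into postgres
    doc.addField("scientific_name", "Aus bus Doe, 2017"); // made-up example name
    doc.addField("rank", "SPECIES");

    solr.add(doc);     // index only this one usage
    solr.commit();
    solr.close();
  }
}
```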

As our backbone data model requires a reference back to some other name usage, we could offer adding small indexed checklists to the backbone by:

We could then optionally also consider:

timrobertson100 commented 7 years ago

Sounds like a great improvement to me.

I'd recommend we add the ability to reprocess all data matching any filter. For example, we could reprocess all occurrences in a genus or a family when new species are added.

mdoering commented 7 years ago

That is a good idea. We could do a simple download with any filter, just looking for occurrence ids, and then start feeding these into the processing queues.
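A rough sketch of that feeding step, assuming the occurrence ids come out of a download as one id per line and the processing queues live on a RabbitMQ broker; the exchange and routing key names are placeholders, not the real GBIF queue names:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class FeedOccurrenceIds {
  public static void main(String[] args) throws Exception {
    // occurrence ids from the download, one id per line (assumed file name)
    List<String> ids = Files.readAllLines(Paths.get("occurrence_ids.txt"));

    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost("localhost"); // assumed broker location
    Connection connection = factory.newConnection();
    Channel channel = connection.createChannel();

    for (String id : ids) {
      // exchange "occurrence" and routing key "occurrence.interpret" are placeholders
      channel.basicPublish("occurrence", "occurrence.interpret", null,
          id.getBytes(StandardCharsets.UTF_8));
    }
    connection.close();
  }
}
```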

timrobertson100 commented 7 years ago

No download needed -> a SOLR paging query? These are unlikely to yield large quantities of data, given that the most common occurrence records (i.e. the large quantities, such as sparrows etc.) will already be well aligned.
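A sketch of such a paging query using SolrJ cursor marks, which keeps paging cheap even if a filter does match many records; the solr URL and the field names are assumptions, not the real occurrence index schema:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class ReprocessByQuery {
  public static void main(String[] args) throws Exception {
    // solr URL and field names ("key", "taxon_key") are placeholders
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/occurrence").build();

    SolrQuery query = new SolrQuery("taxon_key:12345");
    query.setRows(1000);
    query.setFields("key");
    query.setSort(SolrQuery.SortClause.asc("key")); // cursor paging requires a unique sort field

    String cursor = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
      query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = solr.query(query);
      for (SolrDocument doc : rsp.getResults()) {
        // hand each matched occurrence key to the processing queue,
        // e.g. via the publisher sketched above
        System.out.println(doc.getFieldValue("key"));
      }
      String next = rsp.getNextCursorMark();
      done = cursor.equals(next); // cursor stops changing once all results are read
      cursor = next;
    }
    solr.close();
  }
}
```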

mdoering commented 7 years ago

I was just thinking of reusing the download filters and their implementation. But hitting solr directly with a known solr query probably works well too - there shouldn't be too many different kinds of filters.

timrobertson100 commented 7 years ago

Could be as simple as `./reprocess-by-query.sh --query taxonKey=12345 --maxRecPerSec=25`
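Inside such a script, the --maxRecPerSec flag could simply throttle the loop that pushes matched occurrence keys back onto the processing queue. A minimal sketch using Guava's RateLimiter; publish() here is a hypothetical stand-in for the actual queue publisher:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.List;

public class ThrottledReprocessor {
  private final RateLimiter limiter;

  // maxRecPerSec would come from the --maxRecPerSec flag, e.g. 25
  public ThrottledReprocessor(double maxRecPerSec) {
    this.limiter = RateLimiter.create(maxRecPerSec);
  }

  public void reprocess(List<String> occurrenceKeys) {
    for (String key : occurrenceKeys) {
      limiter.acquire(); // blocks until a permit is free, capping the message rate
      publish(key);
    }
  }

  // hypothetical stand-in for the real queue publisher (e.g. the RabbitMQ sketch above)
  private void publish(String key) {
    System.out.println("queueing " + key);
  }
}
```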