gbif / checklistbank

GBIF Checklist Bank

Allow quick incremental additions to backbone #16

Open mdoering opened 7 years ago

mdoering commented 7 years ago

As it takes a lot of effort to rebuild the backbone, reprocess all of our occurrences (the bulk of the work) and rebuild the solr indices, it would be good to allow quick, small additions to the backbone.

Adding just a few names directly into postgres would not necessarily require occurrences to be reprocessed, and the clb solr index could be updated very quickly for that one name too.
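If the new name is inserted directly into postgres, the matching clb solr document could be written in the same step instead of rebuilding the whole index. A minimal sketch with SolrJ is below; the core name and the field names are assumptions for illustration, not the actual checklistbank index schema:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexSingleName {
  public static void main(String[] args) throws Exception {
    // core name and field names are placeholders, not the real clb schema
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/checklistbank").build();

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("key", 123456789);                      // usage key of the name just inserted into postgres
    doc.addField("scientific_name", "Aus bus Doe, 2017"); // made-up example name
    doc.addField("rank", "SPECIES");

    solr.add(doc);     // index only this one usage
    solr.commit();
    solr.close();
  }
}
```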

As our backbone data model requires a reference back to some other name usage, we could offer adding small indexed checklists to the backbone by:

We could then optionally also consider:

timrobertson100 commented 7 years ago

Sounds like a great improvement to me.

I'd recommend we add the ability to reprocess all data matching any filter. For example, we could reprocess all occurrences in a genus or a family when new species are added.

mdoering commented 7 years ago

That is a good idea. We could do a simple download with any filter, just looking for occurrence ids, and then start feeding these into the processing queues.
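A rough sketch of that feeding step, assuming the occurrence ids come out of a download as one id per line and the processing queues live on a RabbitMQ broker; the exchange and routing key names are placeholders, not the real GBIF queue names:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class FeedOccurrenceIds {
  public static void main(String[] args) throws Exception {
    // occurrence ids from the download, one id per line (assumed file name)
    List<String> ids = Files.readAllLines(Paths.get("occurrence_ids.txt"));

    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost("localhost"); // assumed broker location
    Connection connection = factory.newConnection();
    Channel channel = connection.createChannel();

    for (String id : ids) {
      // exchange "occurrence" and routing key "occurrence.interpret" are placeholders
      channel.basicPublish("occurrence", "occurrence.interpret", null,
          id.getBytes(StandardCharsets.UTF_8));
    }
    connection.close();
  }
}
```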

timrobertson100 commented 7 years ago

No download needed -> a SOLR paging query? These are unlikely to yield large quantities of data, given that the most common occurrence records (i.e. the large quantities, such as sparrows etc.) will already be well aligned.
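A sketch of such a paging query using SolrJ cursor marks, which keeps paging cheap even if a filter does match many records; the solr URL and the field names are assumptions, not the real occurrence index schema:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class ReprocessByQuery {
  public static void main(String[] args) throws Exception {
    // solr URL and field names ("key", "taxon_key") are placeholders
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/occurrence").build();

    SolrQuery query = new SolrQuery("taxon_key:12345");
    query.setRows(1000);
    query.setFields("key");
    query.setSort(SolrQuery.SortClause.asc("key")); // cursor paging requires a unique sort field

    String cursor = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
      query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = solr.query(query);
      for (SolrDocument doc : rsp.getResults()) {
        // hand each matched occurrence key to the processing queue,
        // e.g. via the publisher sketched above
        System.out.println(doc.getFieldValue("key"));
      }
      String next = rsp.getNextCursorMark();
      done = cursor.equals(next); // cursor stops changing once all results are read
      cursor = next;
    }
    solr.close();
  }
}
```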

mdoering commented 7 years ago

I was just thinking of reusing the download filters and their implementation. But hitting solr directly with a known solr query probably works well too - there shouldn't be too many different kinds of filters.

timrobertson100 commented 7 years ago

Could be as simple as `./reprocess-by-query.sh --query taxonKey=12345 --maxRecPerSec=25`
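Inside such a script, the --maxRecPerSec flag could simply throttle the loop that pushes matched occurrence keys back onto the processing queue. A minimal sketch using Guava's RateLimiter; publish() here is a hypothetical stand-in for the actual queue publisher:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.List;

public class ThrottledReprocessor {
  private final RateLimiter limiter;

  // maxRecPerSec would come from the --maxRecPerSec flag, e.g. 25
  public ThrottledReprocessor(double maxRecPerSec) {
    this.limiter = RateLimiter.create(maxRecPerSec);
  }

  public void reprocess(List<String> occurrenceKeys) {
    for (String key : occurrenceKeys) {
      limiter.acquire(); // blocks until a permit is free, capping the message rate
      publish(key);
    }
  }

  // hypothetical stand-in for the real queue publisher (e.g. the RabbitMQ sketch above)
  private void publish(String key) {
    System.out.println("queueing " + key);
  }
}
```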