mdoering opened 7 years ago
Sounds like a great improvement to me.
I'd recommend we add the ability to reprocess all data with any filter. We could, for example, reprocess all occurrences in a genus or a family to pick up new species.
That is a good idea. We could do a simple download with any filter, fetching only the occurrence ids, and then feed these into the processing queues.
No download needed; a SOLR paging query should suffice. Such queries are unlikely to yield large quantities of data, since the most common occurrence records (i.e. the high-volume ones such as sparrows etc.) will already be well aligned.
I was just thinking of reusing the download filters and their implementation. But hitting solr directly with a known solr query would probably work well too; there shouldn't be too many different kinds of filters.
Could be as simple as `./reprocess-by-query.sh --query taxonKey=12345 --maxRecPerSec=25`
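The loop behind such a script could be quite small: page occurrence keys out of solr with a cursor, throttle to `maxRecPerSec`, and push each key onto the processing queue. A minimal sketch, with a stubbed page fetch standing in for the real Solr `cursorMark` request (all names here are hypothetical, not an existing GBIF API):

```python
import time
from collections import deque

def fetch_page(query, cursor):
    """Stub for one Solr cursorMark page request. A real implementation
    would GET /solr/occurrence/select with fl=key, sort=key asc and
    cursorMark=cursor, returning (keys, nextCursorMark)."""
    pages = {
        "*": ([101, 102, 103], "AoE1"),
        "AoE1": ([104, 105], "AoE1"),  # repeated cursor marks the last page
    }
    return pages[cursor]

def reprocess_by_query(query, max_rec_per_sec=25, enqueue=None):
    """Feed all occurrence keys matching `query` into the processing
    queue, throttled to max_rec_per_sec keys per second."""
    delay = 1.0 / max_rec_per_sec
    cursor, fed = "*", 0
    while True:
        keys, next_cursor = fetch_page(query, cursor)
        for key in keys:
            enqueue(key)        # e.g. publish a reprocess message
            fed += 1
            time.sleep(delay)   # crude rate limit
        if next_cursor == cursor:  # Solr signals the end by repeating the cursor
            return fed
        cursor = next_cursor

queue = deque()
n = reprocess_by_query("taxonKey:12345", max_rec_per_sec=1000, enqueue=queue.append)
```

Cursor paging avoids the deep-paging cost of `start`/`rows` offsets, which matters if a family-level filter ever does match many records.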
As it takes a lot of effort to rebuild the backbone, reprocess all of our occurrences (the bulk of the work) and rebuild the solr indices, it would be good to allow for quick, small additions to the backbone.
Adding just a few names directly into postgres would not necessarily require occurrences to be reprocessed, and the clb solr index could be updated very quickly for that one name too.
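Updating the clb index for a single name would amount to posting one small update document to Solr. A sketch of building that payload (the field names are illustrative, not the real checklistbank schema; sending it would be a POST to the index's `/update` handler):

```python
import json

def single_name_update(usage_key, scientific_name, parent_key):
    """Build the JSON body for adding one name-usage document to the
    clb Solr index. parent_key links the new name back to an existing
    usage, as the backbone data model requires."""
    doc = {
        "key": usage_key,
        "scientific_name": scientific_name,
        "parent_key": parent_key,
    }
    # POST this body to /solr/checklistbank/update?commit=true
    return json.dumps({"add": {"doc": doc}})

payload = single_name_update(99991, "Aus novus", 12345)
```

Because it touches a single document, the commit is cheap compared to a full index rebuild.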
As our backbone data model requires a reference back to some other name usage, we could offer adding small indexed checklists to the backbone by:
We could then optionally also consider: