morsecodist closed this 4 years ago.
Also, I'm wondering what the plan is for updating the existing prod and staging DBs.
We can just run the import commands at any time after shipping this.
Description
This PR fixes issues that caused missing records in Elasticsearch. It also makes our Elasticsearch indexing asynchronous.
Issues
These issues occurred because we add records to Elasticsearch with [elasticsearch-model](https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model), which relies on Active Record callbacks. Any write that circumvents the Active Record callback flow (such as raw SQL or a bulk insert) never reaches Elasticsearch, so the index ends up missing data.
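To make this failure mode concrete, here is a minimal stdlib-only Ruby simulation. It is not elasticsearch-model itself. `FakeIndex` and `FakeModel` are hypothetical stand-ins: `save` mimics the callback path that elasticsearch-model hooks into, `raw_sql_update` mimics a write that skips callbacks, and `reindex_all` mimics a full re-index that repairs the gaps.

```ruby
# Stdlib-only simulation (hypothetical classes, no Rails or elasticsearch-model)
# of why writes that skip Active Record callbacks leave the search index stale.

# Stand-in for the Elasticsearch index: id => indexed document.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def index(id, doc)
    @docs[id] = doc.dup
  end
end

# Stand-in for a model whose save path runs an after_save-style callback.
class FakeModel
  attr_reader :table

  def initialize(index)
    @index = index
    @table = {} # id => row, standing in for the database table
  end

  # Callback path: write the row, then index it.
  def save(id, row)
    @table[id] = row
    after_save(id)
  end

  # Bypass path (raw SQL, bulk insert): the row changes but no callback runs.
  def raw_sql_update(id, row)
    @table[id] = row
  end

  # Full re-index: push every row to the index, repairing any gaps.
  def reindex_all
    @table.each_key { |id| after_save(id) }
  end

  private

  def after_save(id)
    @index.index(id, @table[id])
  end
end

index = FakeIndex.new
lineages = FakeModel.new(index)
lineages.save(1, { name: "Homo" })
lineages.raw_sql_update(2, { name: "Pan" })
puts index.docs.key?(2) # prints "false": the raw write never reached the index
lineages.reindex_all
puts index.docs.key?(2) # prints "true"
```

A periodic full re-index trades speed for simplicity: it does not need to know which rows the bypassing write touched.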
Taxon Lineages
These are only updated via the `update_lineage_db` task. About a year ago this task was modified to use raw SQL, which bypasses the callbacks, so we have been missing updates since then. To fix this, I added a step that re-indexes after the task runs. It is a little slow, but we can index the whole table in under 30 minutes, and this task only runs once every 28 days, so I don't think it's a big deal.
Metadata Bulk Import
For bulk metadata importing we use https://github.com/zdennis/activerecord-import, which circumvents the Active Record callback flow. Skipping the callbacks is deliberate: running them for every record would mean a blocking, synchronous Elasticsearch update per row, which would make the feature very slow. I added a variant of this import method that also indexes the imported records in Elasticsearch. On its own, that variant would still block on synchronous index updates; the async change described below removes that cost.
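The "import that also indexes" idea can be sketched as: insert the rows in one batch without per-row callbacks, then send just the imported records to the index in one batched request. This is a hypothetical stdlib-only Ruby simulation; `FakeTable`, `FakeBulkIndex`, and `import_and_index` are illustrative names, not the PR's actual method or activerecord-import's API.

```ruby
# Hypothetical sketch: bulk-insert rows without per-row callbacks, then index
# only the imported records in a single batch. FakeTable and FakeBulkIndex
# stand in for the database table and Elasticsearch.

class FakeTable
  attr_reader :rows

  def initialize
    @rows = {}
    @next_id = 1
  end

  # Like a bulk import: one batched insert, no callbacks; returns the new ids.
  def bulk_insert(records)
    records.map do |record|
      id = @next_id
      @next_id += 1
      @rows[id] = record
      id
    end
  end
end

class FakeBulkIndex
  attr_reader :docs, :requests

  def initialize
    @docs = {}
    @requests = 0
  end

  # One request carrying many documents, rather than one request per row.
  def bulk_index(pairs)
    @requests += 1
    pairs.each { |id, doc| @docs[id] = doc }
  end
end

# Hypothetical helper: insert in bulk, then index exactly the new rows.
def import_and_index(table, index, records)
  ids = table.bulk_insert(records)
  index.bulk_index(ids.map { |id| [id, table.rows[id]] })
  ids
end

table = FakeTable.new
index = FakeBulkIndex.new
ids = import_and_index(table, index, [{ sample: "s1" }, { sample: "s2" }, { sample: "s3" }])
puts ids.inspect    # prints "[1, 2, 3]"
puts index.requests # prints "1"
```

Batching keeps the number of index requests constant per import, but the call is still synchronous here, which is the cost the async change targets.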
Async Elasticsearch
Currently, a write is not considered complete until the Elasticsearch index has been updated. This slows down every database interaction with these tables. It also causes database operations to fail when the index update fails, even though that update is not needed right away and could simply have been retried. The elasticsearch-model docs recommend asynchronous updates. The new async callbacks also have logging and alerting, so we can debug any future gaps in Elasticsearch and know when to trigger a re-index.
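A minimal sketch of the async pattern, assuming a simple in-process queue: the write path only enqueues an index job, and a background worker performs it, retrying transient failures and logging jobs that still fail. `Queue` and `Thread` stand in for whatever job system the app actually uses (the description does not name one); `FlakyIndex` and `AsyncIndexer` are hypothetical.

```ruby
require "logger"

# Hypothetical stand-in for Elasticsearch that fails a set number of times.
class FlakyIndex
  attr_reader :docs

  def initialize(failures_before_success)
    @failures_left = failures_before_success
    @docs = {}
  end

  def index(id, doc)
    if @failures_left > 0
      @failures_left -= 1
      raise IOError, "index temporarily unavailable"
    end
    @docs[id] = doc
  end
end

# Hypothetical async indexer: enqueue is cheap, the worker does the slow part.
class AsyncIndexer
  attr_reader :failed_ids

  def initialize(index, max_attempts: 3, logger: Logger.new($stderr))
    @index = index
    @max_attempts = max_attempts
    @logger = logger
    @jobs = Queue.new
    @failed_ids = []
    @worker = Thread.new { run }
  end

  # Called from the model callback: returns immediately, so the DB write
  # does not wait on (or fail because of) Elasticsearch.
  def enqueue(id, doc)
    @jobs << [id, doc]
  end

  # Stop accepting work and wait for queued jobs to finish.
  def shutdown
    @jobs << :stop
    @worker.join
  end

  private

  def run
    while (job = @jobs.pop) != :stop
      id, doc = job
      attempts = 0
      begin
        attempts += 1
        @index.index(id, doc)
      rescue IOError => e
        retry if attempts < @max_attempts
        # Give up, but record and log it so a re-index can be triggered later.
        @logger.error("indexing #{id} failed after #{attempts} attempts: #{e.message}")
        @failed_ids << id
      end
    end
  end
end

index = FlakyIndex.new(2)
indexer = AsyncIndexer.new(index, logger: Logger.new(nil))
indexer.enqueue(1, { name: "sample" })
indexer.shutdown
puts index.docs.key?(1) # prints "true": succeeded on the third attempt
```

Retries absorb transient index outages; anything left in `failed_ids` (or in the logs) marks records a later re-index should cover.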
Tests
Tested all of the relevant models, the bulk imports, and the rake task.