Closed mgrbyte closed 4 years ago
Multi-DB setup sounds good to me! It's quite a neat PR that you've put together.
I'm just curious: if we continue to build Datomic databases from scratch for each release, is it still preferable to separate the homology data from the rest?
I think so. This approach (a separate db) was suggested by @khowe, but it is something that we've discussed before...
Currently, a subset of the data stored in the final/build ACeDB database is computed by running various EnsEMBL pipelines (Compara, BLASTP, etc.). Where possible, moving that computed data into a separate store makes sense, as adding it back into the main database "ties our hands" in the long term to making a build database.
It can be argued that Datomic isn't the right database platform for storing such computed data, as all schema items have no useful history (and should be marked with :db/noHistory in the schema for the new db if not already).
I believe the choice of using Datomic here (instead of some other DB) is one of convenience, familiarity and expediency.
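For illustration, marking an attribute with :db/noHistory in a schema might look like the sketch below. The attribute name `:homology/score`, its type and its doc string are hypothetical and not taken from this PR; only the `:db/noHistory true` entry is the point. This assumes a Datomic version that accepts attribute definitions without an explicit `:db/id` and `:db.install/_attribute`.

```clojure
;; Sketch only: :homology/score is a made-up attribute, not part of the real schema.
[{:db/ident       :homology/score
  :db/valueType   :db.type/double
  :db/cardinality :db.cardinality/one
  ;; Computed data has no useful history, so ask Datomic not to retain it.
  :db/noHistory   true
  :db/doc         "Hypothetical computed score for a homology match."}]
```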
@mgrbyte Thanks a lot for explaining the reasoning.
@a8wright @sibyl229 Just to note that I'm currently doing another run of the main migration (WS273) with this code, just to check that these changes won't break subsequent migration runs. Once done, I'll merge this. Thanks!
Implements #81 (No rush to merge this - prefer we test the db first; can make any amendments to this PR)
Converts homology from ACeDB to a separate database, for Motif and Protein classes only. This is accomplished by creating "stub" entities for all motif and protein objects in the new database, then converting the motif and homology "locatable" entities associated with each, using a variation of the original code in `locatable_import.clj`.
This requires adding 5 new commands to the migration process (but they could be run in parallel, or at any time that the source ACeDB database is available).
By default, this new datomic database is stored in the same DynamoDB table.
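As a rough sketch of what "the same DynamoDB table" means in practice: Datomic DynamoDB connection URIs take the form `datomic:ddb://<aws-region>/<dynamodb-table>/<db-name>`, so two databases can share a table and differ only in the final segment. The region, table name and database names below are placeholders, not the values used by this project.

```clojure
;; Placeholder values only: region, table and db names are assumptions.
{:main-db-uri     "datomic:ddb://us-east-1/example-table/example-main-db"
 :homology-db-uri "datomic:ddb://us-east-1/example-table/example-homology-db"}
```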
Below are some examples of using the new database.