LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Export of selected table entries and associated files to a different database / spyglass install #1129

Open lfrank opened 1 month ago

lfrank commented 1 month ago

It seems possible (perhaps likely?) that different groups will have their own databases, but would like to be able to import a set of analyses / results from another group. This could be something like issue #861 but with a provision to transfer entries to a different database.

There are multiple complexities here, but if this were possible it could be really useful.

CBroz1 commented 1 month ago

Some questions come to mind regarding data integrity ...

  1. What if there are naming collisions in entries?
    • Case 1: LabA has 'subject1' and tries to load LabB's 'subject1', a different subject.
    • Case 2: LabA and LabB both have data from the same 'subject1', run with 'ParamsA', but this paramset was defined differently in each case.
    • How should a load handle conflicts? It could...
      • simply reject a load with overlapping names
      • assume a collision refers to the same entity (e.g., assume default paramsets have not been changed)
      • append some value to the loaded entry (e.g., 'subject1_imported{DATE}')
      • pairwise compare every collision, including data stored as blobs, which would be time intensive
  2. What if there are differences in table definitions?
    • Case 3: LabA has kept up with table alters (e.g., adding new fields), but LabB never ran these alters when updating Spyglass.
    • Case 4: LabA and LabB do not share the exact same definition of a downstream custom table.
    • How should a load handle these cases? It could...
      • reject the load
      • rename the custom tables (e.g., 'CustomTableImported{DATE}')
      • attempt to suggest changes to the imported file or alter existing tables
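To make the Case 2 trade-off concrete, here is a minimal sketch (not part of Spyglass; the `param_name` key and entry shape are assumptions) of combining two of the options above: compare colliding entries by a content hash, skip them when identical, and rename them otherwise.

```python
import hashlib
import json


def content_hash(entry: dict) -> str:
    """Hash an entry's contents so two paramsets sharing a primary key
    can be compared without field-by-field inspection. Blob-like values
    are rendered to a canonical string (here: repr) before hashing."""
    canonical = json.dumps(
        {k: repr(v) for k, v in sorted(entry.items())},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def resolve_collision(local: dict, incoming: dict, date: str):
    """Return None if the colliding entries are identical (safe to skip
    the import), otherwise return the incoming entry under a renamed key
    (the 'subject1_imported{DATE}' strategy)."""
    if content_hash(local) == content_hash(incoming):
        return None  # same entity; nothing to import
    renamed = dict(incoming)
    # Assumes the primary key field is named 'param_name' (hypothetical).
    renamed["param_name"] = f"{incoming['param_name']}_imported{date}"
    return renamed
```

Even this simple version shows the cost: every blob in a colliding entry must be fetched and canonicalized before the hashes can be compared.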

Any monitoring of the ingestion process to resolve collisions is going to be a major lift of parsing SQL error messages, which DataJoint is better equipped for than Spyglass (maybe worth a feature request to them?). A skilled user could manage these decisions working with SQL directly, but I'm not confident in our ability to do it programmatically in Python. A full-featured approach might be an effort on par with expanding DataJoint by 30% to handle all possible error codes and revert on failure.
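To illustrate the scale of that lift, a sketch of the triage an ingestion loop would need, keyed on real MySQL error numbers (the mapping and decision strings are illustrative assumptions, not a Spyglass or DataJoint API):

```python
# MySQL server error codes an ingestion loop would have to distinguish.
ER_DUP_ENTRY = 1062          # primary-key collision
ER_NO_REFERENCED_ROW = 1452  # missing upstream (foreign-key) entry
ER_BAD_FIELD_ERROR = 1054    # unknown column: table definitions drifted


def classify_insert_error(errno: int) -> str:
    """Map a MySQL error number to a coarse ingestion decision.
    Real recovery (rename, re-order inserts, alter tables, roll back)
    would branch from here, and this covers only three of the many
    codes the server can emit."""
    return {
        ER_DUP_ENTRY: "collision: compare or rename the entry",
        ER_NO_REFERENCED_ROW: "defer: insert upstream entries first",
        ER_BAD_FIELD_ERROR: "schema drift: table definitions differ",
    }.get(errno, "unknown: abort and roll back the load")
```

Each branch here hides a full sub-problem (the collision options above, dependency ordering, schema reconciliation), which is why handling all of them programmatically looks like a DataJoint-scale effort.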

An alternate approach might look more like a 'replication tool' that exports a spec of paramsets to run and then applies them to a different database. This would require rerunning all computations, but it would allow DataJoint and/or the end user to handle collisions one by one.
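A minimal sketch of what that replication spec could look like (the JSON shape, `run_step` callable, and field names are all hypothetical): export only the recipe, not the results, and replay each step on the target so collisions surface one at a time at insert/populate time.

```python
import json


def export_spec(analyses: list) -> str:
    """Serialize the paramsets and pipeline steps to re-run,
    rather than the computed results themselves."""
    return json.dumps({"spec_version": 1, "analyses": analyses}, indent=2)


def apply_spec(spec_json: str, run_step) -> list:
    """Replay each step on the target database via a caller-supplied
    run_step callable (e.g., one that inserts a paramset and calls
    populate). Errors from any single step can be handled individually
    instead of parsing failures from a bulk load."""
    spec = json.loads(spec_json)
    return [run_step(step) for step in spec["analyses"]]
```

The design choice is that the target database's own integrity checks do the collision detection, entry by entry, instead of the exporter trying to anticipate every conflict up front.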

lfrank commented 1 month ago

Great points, and indeed the replication tool might be by far the best way to approach this given all the challenges. Let's discuss when you're back in town.