DigitalSlideArchive / HistomicsTK

A Python toolkit for pathology image analysis algorithms.
https://digitalslidearchive.github.io/HistomicsTK/
Apache License 2.0

Script for database migration #1081

Open kwsp opened 9 months ago

kwsp commented 9 months ago

Currently there isn't a documented method to easily migrate one deployment to another (as far as I'm aware). I recently ran into a problem where I had to tear down a deployment and move to another, but I already had a lot of manual labels stored in it. I can import all the WSIs again, but I didn't know how to migrate the labels stored in MongoDB.

The solution I found was to combine the simple local backup function provided by histomicstk, dump_annotations_locally, as shown in this example (https://digitalslidearchive.github.io/HistomicsTK/examples/annotation_database_backup_and_sql_parser.html), with some custom code that essentially reverses the dump: it reads the dumped JSON annotations and carefully POSTs them back to the DSA. This works as long as the directory structure of the uploaded WSIs in the DSA is the same.
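A minimal sketch of the "reverse the dump" step, under stated assumptions rather than as the actual code used. It assumes each dump file holds a JSON list of annotation documents for one slide, with the drawable content under an "annotation" key (the layout returned by the DSA annotation endpoints), and that the client is a girder_client.GirderClient or anything with a compatible post method. The function name restore_annotations and the file layout are hypothetical.

```python
import json
from pathlib import Path


def restore_annotations(client, item_id, dump_path):
    """POST every annotation stored in one dump file onto one item.

    client    -- object with a girder_client-style post(path, parameters=, json=)
    item_id   -- id of the already re-imported WSI item on the new deployment
    dump_path -- JSON file with the annotations dumped for that slide
                 (assumed layout: a list of annotation documents)
    """
    docs = json.loads(Path(dump_path).read_text())
    if isinstance(docs, dict):  # tolerate a file holding a single document
        docs = [docs]

    created = []
    for doc in docs:
        # Dumped documents carry server-side fields (_id, itemId, ...);
        # only the inner "annotation" body is sent to the new deployment.
        body = doc.get('annotation', doc)
        created.append(
            client.post('annotation', parameters={'itemId': item_id}, json=body)
        )
    return created
```

With a real deployment the client would be created along the lines of girder_client.GirderClient(apiUrl=...) followed by authenticate(apiKey=...), and the item id looked up from the slide's path on the new server (for example with resourceLookup), which is why the directory structure has to match.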

Do the maintainers think this is worthy of contributing back to HistomicsTK, along with an example for simple migration of deployment?

manthey commented 9 months ago

There is an unofficial script that we've used for this: https://github.com/DigitalSlideArchive/large-image-utilities/blob/main/copy_annotations.py. It copies images, annotations, folders, collections, etc. between two different deployments, which is functionally what you describe. It doesn't have any tests and probably doesn't work in some cases where it should (such as copying from a collection to a user), but maybe it is time to clean up this script and write tests for it.

kwsp commented 9 months ago

Testing that script is a good idea. One issue I see with that script is that both instances must be running for it to work. It might be useful to be able to store a local dump of the DSA that could be restored through the API.

manthey commented 9 months ago

If all your assetstore paths are the same, you can use mongodump and mongorestore: https://github.com/DigitalSlideArchive/digital_slide_archive/blob/6405bd53d7d048634aee53a175c8be02eb560258/devops/dsa/README.rst#database-backup
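As a rough illustration of the mongodump / mongorestore route, here is a small sketch that builds and runs the two commands from Python. The database name girder, the host localhost:27017, and the helper names are assumptions for illustration; the linked README is the authoritative source for the actual DSA commands, which may run inside the Mongo container instead.

```python
import subprocess


def dump_command(archive, db='girder', host='localhost:27017'):
    """Build a mongodump command writing one gzipped archive file.

    The database name 'girder' and the host are assumed defaults.
    """
    return ['mongodump', f'--host={host}', f'--db={db}',
            f'--archive={archive}', '--gzip']


def restore_command(archive, host='localhost:27017', drop=False):
    """Build the matching mongorestore command for that archive."""
    cmd = ['mongorestore', f'--host={host}', f'--archive={archive}', '--gzip']
    if drop:
        # --drop removes each existing collection before restoring it.
        cmd.append('--drop')
    return cmd


def run(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

For example, run(dump_command('dsa.archive.gz')) on the old deployment and run(restore_command('dsa.archive.gz', drop=True)) on the new one. As noted above, this only gives a working deployment if the assetstore paths are the same on both sides.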