RENCI / fuse-immcellfie

Main API, tickets, and project board to be used by dashboards supporting ImmCellFIE project
MIT License

fuse-analysis: Mounted volume data gets orphaned on restart #40

Open krobasky opened 2 years ago

krobasky commented 2 years ago

It looks like the code currently saves all run parameters and results to files on a volume mounted by the container, while the metadata is stored in tx-persistent. When the containers are restarted, the metadata is lost and is not reconstructed from the mounted directory. So we need to either clean up the mounted volume on exit (not advised) or reconstruct the metadata from the mounted volume on restart. I wonder if we should save the metadata on the mounted file system instead of in Mongo, since we rarely, if ever, edit the data.
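A minimal sketch of the second option (store the metadata on the mounted volume and rebuild from it on restart) might look like the following. It assumes, hypothetically, that each download directory gets its own `metadata.json` holding the download ID, email, and run parameters; that file name and layout are illustrative, not the current fuse-analysis format. At startup the service scans the volume and rebuilds the email-to-download-ID index instead of relying on tx-persistent.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical per-download metadata file, written next to the results on the volume.
METADATA_FILE = "metadata.json"


def save_metadata(data_dir: Path, download_id: str, email: str, params: dict) -> Path:
    """Write a download's metadata into its own directory on the mounted volume."""
    run_dir = data_dir / download_id
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {"download_id": download_id, "email": email, "parameters": params}
    path = run_dir / METADATA_FILE
    path.write_text(json.dumps(record, indent=2))
    return path


def rebuild_index(data_dir: Path) -> dict[str, list[str]]:
    """Scan the volume at startup and map each email to its download IDs."""
    index: dict[str, list[str]] = defaultdict(list)
    for meta_path in sorted(data_dir.glob(f"*/{METADATA_FILE}")):
        try:
            record = json.loads(meta_path.read_text())
            email, download_id = record["email"], record["download_id"]
        except (OSError, json.JSONDecodeError, KeyError, TypeError):
            # Skip unreadable or incomplete metadata instead of failing startup.
            continue
        index[email].append(download_id)
    return dict(index)
```

At container start, the service could call `rebuild_index(Path("/data"))` and either answer the per-email query from that index directly or use it to re-populate tx-persistent. Because the metadata lives next to the results, the two can no longer drift apart across restarts.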

To reproduce the problem:

  1. Create an ImmuneSpace group ID with a dataset known to have expression data.
  2. Download the data using the API and your own email address.
  3. Check that the data exist under the 'data' directory from which the 'api' container is deployed.
  4. Restart the fuse-analysis containers from the same location: docker-compose -f docker-compose.yml up --build -V -d
  5. Query for all the download IDs associated with your email.
  6. Look at the data directory.
     Expected: the query returns one download ID per object under the /data directory.
     Observed: the download ID list is empty, even though download directories exist under /data.
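The expected/observed comparison in step 6 can be automated with a small sketch like this one. It assumes, hypothetically, that every download is stored in a directory named after its download ID under /data, and that `known_ids` holds the IDs returned by the query in step 5; any directory on disk that is not in that set is orphaned.

```python
from pathlib import Path


def find_orphaned_downloads(data_dir: Path, known_ids: set[str]) -> list[str]:
    """Return download directories on the volume that the metadata store does not know about."""
    # Only directories count as download objects; stray files are ignored.
    on_disk = {p.name for p in data_dir.iterdir() if p.is_dir()}
    return sorted(on_disk - known_ids)
```

With the bug described above, calling this after the restart with the (empty) query result would list every download directory as orphaned.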
krobasky commented 2 years ago

Perhaps the solution is to keep a ledger of deletion requests and write a script that reads the ledger and periodically cleans up the unlinked data objects. Remember that 'delete' means something different from 'deprecate': if data is modified at the service provider, it can be deprecated so that future analyses won't use it, but it must remain in the local cache to support legacy analyses that used the old version. If data is redacted or corrupted, however, it must be deleted entirely. This ticket concerns the latter.
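A minimal sketch of the ledger idea, assuming a hypothetical append-only JSON Lines ledger file and the same one-directory-per-object layout under the data directory. The function names and the ledger format are illustrative only. 'delete' entries remove the cached object; 'deprecate' entries are recorded but leave the data in place for legacy analyses.

```python
import json
import shutil
from pathlib import Path

# Hypothetical ledger format: one JSON object per line, e.g.
# {"object_id": "dl-1", "action": "delete", "reason": "redacted"}
VALID_ACTIONS = ("delete", "deprecate")


def record_request(ledger: Path, object_id: str, action: str, reason: str = "") -> None:
    """Append a deletion or deprecation request to the ledger."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    entry = {"object_id": object_id, "action": action, "reason": reason}
    with ledger.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")


def cleanup(ledger: Path, data_dir: Path) -> list[str]:
    """Delete objects requested for deletion; deprecated objects stay cached."""
    removed: list[str] = []
    if not ledger.exists():
        return removed
    root = data_dir.resolve()
    for line in ledger.read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("action") != "delete":
            continue  # Deprecated data remains available to legacy analyses.
        target = data_dir / entry["object_id"]
        # Refuse IDs that would resolve outside the data directory (e.g. "../x").
        if target.resolve().parent != root:
            continue
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(entry["object_id"])
    return removed
```

Because `cleanup` skips objects that are already gone, it can be rerun safely on a schedule (for example, from cron) without tracking which ledger entries were already processed.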