galaxyproject / galaxy


History archival process at TACC #17101

Open mvdbeek opened 11 months ago

mvdbeek commented 11 months ago

We've hit our storage quota at TACC and need to reduce usage. Here are some ideas collected earlier: https://docs.google.com/document/d/1VAmSjT8B2F0WK6L47BB09GS-pSq5FhkLJ29pgJI6JH0/edit?pli=1

I'd propose that we add a script to ephemeris that uses Galaxy's API to export histories to an admin-only file source. We can then run a cron job against that file source to tar up the archives (ideally in 300 GB chunks) and push them to tape via scp.
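The proposal doesn't say how exported archives get grouped into ~300 GB tarballs. Here is a minimal sketch of one option, first-fit-decreasing packing, assuming the size of each exported archive is known before tarring. `plan_chunks` and `Chunk` are hypothetical names and are not part of ephemeris or Galaxy. The actual tar and scp steps are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Target size per tarball, taken from the "300 GB chunks are ideal" note.
CHUNK_LIMIT_BYTES = 300 * 10**9


@dataclass
class Chunk:
    files: list[str] = field(default_factory=list)
    size: int = 0


def plan_chunks(archives: dict[str, int], limit: int = CHUNK_LIMIT_BYTES) -> list[Chunk]:
    """Group archives ({name: size_in_bytes}) into chunks of at most `limit` bytes.

    First-fit decreasing: the largest archives are placed first, each into the
    first existing chunk that still has room. An archive larger than `limit`
    ends up alone in its own chunk, because this planner does not split files.
    """
    chunks: list[Chunk] = []
    # Sort by size descending, then by name so the plan is deterministic.
    for name, size in sorted(archives.items(), key=lambda kv: (-kv[1], kv[0])):
        for chunk in chunks:
            if chunk.size + size <= limit:
                chunk.files.append(name)
                chunk.size += size
                break
        else:
            # No existing chunk has room, so start a new one.
            chunks.append(Chunk([name], size))
    return chunks


if __name__ == "__main__":
    example = {
        "h1.tar.gz": 200 * 10**9,
        "h2.tar.gz": 150 * 10**9,
        "h3.tar.gz": 90 * 10**9,
        "h4.tar.gz": 50 * 10**9,
    }
    for i, chunk in enumerate(plan_chunks(example)):
        print(i, chunk.files, chunk.size)
```

For the example above, `h1` and `h3` share one chunk (290 GB) and `h2` and `h4` share another (200 GB). First-fit decreasing is a simple heuristic, not an optimal packing, but for tape batches "close to 300 GB" is usually good enough.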

A second, optional step could be a file source that transparently lets users re-import the archives from tape. For that we'd have to map the tar files back to their component archives and re-associate those with the history archives. This can be done after the first step; in the meantime, admins can restore archives from tape manually (tape retrieval also works via scp).
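Mapping tar files back to their component archives requires a record of which history archive went into which tarball. Here is a minimal sketch, assuming that record is kept as a plain `{tar_name: [archive, ...]}` dict when the chunks are written. `build_manifest` and `tarballs_to_fetch` are hypothetical helpers, and nothing here talks to Galaxy or the tape system.

```python
from __future__ import annotations


def build_manifest(chunks: dict[str, list[str]]) -> dict[str, str]:
    """Invert {tar_name: [archive, ...]} into {archive: tar_name}.

    Raises ValueError if the same archive was packed into two tarballs,
    since a restore would then be ambiguous.
    """
    manifest: dict[str, str] = {}
    for tar_name, archives in chunks.items():
        for archive in archives:
            if archive in manifest:
                raise ValueError(
                    f"{archive!r} appears in both {manifest[archive]!r} and {tar_name!r}"
                )
            manifest[archive] = tar_name
    return manifest


def tarballs_to_fetch(manifest: dict[str, str], wanted: list[str]) -> dict[str, list[str]]:
    """Group requested archives by the tarball that holds them.

    Each tarball then only needs to be retrieved from tape once, even if
    several requested archives live in it. Raises KeyError for an archive
    that was never packed.
    """
    plan: dict[str, list[str]] = {}
    for archive in wanted:
        plan.setdefault(manifest[archive], []).append(archive)
    return plan


if __name__ == "__main__":
    manifest = build_manifest(
        {"chunk-000.tar": ["h1.tar.gz", "h3.tar.gz"], "chunk-001.tar": ["h2.tar.gz"]}
    )
    print(tarballs_to_fetch(manifest, ["h3.tar.gz", "h2.tar.gz", "h1.tar.gz"]))
```

Persisting the manifest (for example as JSON next to the tarballs) would also help with the interim manual process: an admin restoring a few histories could look up and fetch only the tarballs that contain them.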

mvdbeek commented 11 months ago

@dannon is working on the export script and mentioned this plan at yesterday's backend meeting.