Open TomGlanzman opened 7 years ago
At today's Twinkles meeting discussion on this topic, the following ideas were put forward:
Note that we have three datasets that would occupy ~75 TB of storage: phosim-DC1, imsim-DC1-dithered and imsim-DC1-undithered. imsim-DC1-undithered will be somewhat smaller than the other two, since it has only 150k sensor-visits vs ~190k each for phosim-DC1 and imsim-DC1-dithered.
We will talk about this tomorrow, but how much space do we have, and where can we keep at least the partial outputs for approximately the next year?
Here is the status of this issue:
/global/projecta/projectdirs/lsst/production/DC1/DM/DC1-imsim-dithered
This should be accessible to anyone in the lsst group.
If the HPSS backup is now finished, can we document where it is (i.e. how to retrieve it) and close this issue?
DC1 PhoSim generated 5 TB of simulated images.
The DC1 DM pipeline read those images and generated 75 TB of output data.
This is a big problem.
The PhoSim data were stored in DESC "project space", which now contains 65 TB of an 81 TB quota. The DM pipeline data were stored in global scratch space. Technically, scratch space is limited to 20 TB per user, so we are way over quota (quotas do not seem to be enforced at present). In addition, scratch directories are purged after 12 weeks -- unless the data are accessed, which resets the clock. Purging has not been done up until now, but NERSC recently warned that it is about to start.
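To get a feel for how much of the scratch data would be exposed once purging starts, something like the following rough sketch could be run over the pipeline directory. It assumes the purge decision depends on each file's last-access time (atime), which matches the description above but has not been checked against NERSC's actual purge implementation. The function names and the 12-week constant are illustrative only.

```python
import os
import time

# 12 weeks in seconds, taken from the purge policy described above.
PURGE_AGE_SECONDS = 12 * 7 * 24 * 3600


def purge_candidates(root, now=None, max_age=PURGE_AGE_SECONDS):
    """Yield (path, size_bytes) for files under root not accessed within max_age.

    Assumption: the purge clock is the file's atime. lstat() is used so that
    inspecting a file does not itself count as an access.
    """
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - st.st_atime > max_age:
                yield path, st.st_size


def summarize(root, now=None):
    """Return (file_count, total_bytes) of purge candidates under root."""
    count = 0
    total = 0
    for _path, size in purge_candidates(root, now=now):
        count += 1
        total += size
    return count, total
```

Calling `summarize()` on the DM pipeline directory would return the number of files and total bytes that have gone unread for more than 12 weeks. Walking tens of TB of small files can take a long time, so it may be worth running per subdirectory.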
Here is a look at the DM pipeline directory, along with a summary of space consumed:
tony_j@cori17:/global/cscratch1/sd/descdm/DC1/DC1-imsim-dithered> du -hs *
What to do with these data? One suggestion was to move them to HPSS (tape). But doing so does not look all that attractive. Such a move would:
Another suggestion is to prune the extraneous/unneeded/unwanted files. Can we agree on what to prune and how to do it?
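As a starting point for that discussion, a dry-run report of how much space each class of file occupies might help. The sketch below deletes nothing; it only tallies file counts and bytes per filename pattern. The patterns passed in are placeholders, not a proposal for what to prune, and `plan_prune` is a hypothetical helper name.

```python
import fnmatch
import os


def plan_prune(root, patterns):
    """Return {pattern: (file_count, total_bytes)} for files under root.

    Dry run only: nothing is removed. Each file is counted once, under the
    first pattern in `patterns` that its name matches.
    """
    report = {p: (0, 0) for p in patterns}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            for pattern in patterns:
                if fnmatch.fnmatch(name, pattern):
                    path = os.path.join(dirpath, name)
                    try:
                        size = os.lstat(path).st_size
                    except OSError:
                        break  # unreadable or vanished file; skip it
                    count, total = report[pattern]
                    report[pattern] = (count + 1, total + size)
                    break  # do not double-count under later patterns
    return report
```

For example, `plan_prune(root, ["*.log", "*.tmp"])` (hypothetical patterns) would show how much would be reclaimed by dropping those file types, which could then be compared against the 75 TB total before anyone removes anything.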