Archive DCFP's NCI ALCG data to tape at CSIRO

Thomas-Moore-Creative commented 4 years ago

Initial use case

Archive Decadal Climate Forecasting Project (DCFP) NCI Australasian Leadership Computing Grants (ALCG) raw netCDF to tape at CSIRO

The CSIRO DCFP is currently working under a recent NCI ALCG merit allocation (https://research.csiro.au/dfp/dcfp-awarded-key-computation-by-the-nci/). Current storage limitations mean that, as the effort proceeds, large collections of files will need to be archived continually back to tape at CSIRO. If data transfer is not fast enough, work at NCI will stop for lack of storage.

The current task is to move 8 x 11TB collections of tar files, where each 11TB collection is a directory containing:

96 x 112GB tar files
1 x 40GB tar file
two small directories with log files related to the parallel tarring process

112GB is close to the recommended 100GB file size for the CSIRO tape system. It is also the size of one model "member", so each tar file preserves the essential structure of a model run.
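As a quick sanity check on these numbers (a bash sketch, using decimal units and ignoring the small log directories): one collection works out to 96 x 112 + 40 = 10,792 GB, about 10.8 TB, so all eight come to roughly 86 TB. At the ~20 MB/s rsync rate reported below, moving that serially takes about 50 days.

```bash
#!/usr/bin/env bash
# Sizes and serial transfer times for the collections described above.
# Decimal units throughout (1 TB = 1000 GB, 1 GB = 1000 MB).

# GB in one collection: 96 tars of 112 GB plus one of 40 GB
# (the small log directories are ignored).
collection_gb() {
  echo $(( 96 * 112 + 40 ))
}

# Days to move SIZE_GB at RATE_MBPS megabytes per second, to one decimal place.
transfer_days() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 86400 }'
}

one=$(collection_gb)
total=$(( 8 * one ))
echo "one collection: ${one} GB; all eight: ${total} GB"
echo "serial at 20 MB/s: $(transfer_days "$total" 20) days"
```

Plugging a higher aggregate rate into `transfer_days` gives the best case for parallel transfers, assuming the network and the /datastore front-end disk can keep up.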

Example `rsync` command used:

ruby:/datastore/d/dcfp/CAFE/forecasts/f6> rsync -avPSW <user>@gadi-dm.nci.org.au:/scratch/v14/<user>/tar_tmp/f6.WIP.c5-d60-pX-f6-20181101.20200820_174610 /datastore/d/dcfp/CAFE/forecasts/f6/.

NB: more recent advice from NCI recommends:

rsync -avPS -e "ssh -T -c arcfour -o Compression=no -x" <username>@gadi-dm.nci.org.au:</path/to/source> <dest>

Issues:

slow rsync transfer speeds, ordinarily under 20MB/s, mean that 90TB of data would take over 50 days to transfer in serial.
running 8 simultaneous rsyncs is possible, but the "front-end" spinning disk on CSIRO /datastore is only 15TB, far less than the ~90TB to be moved
using screen to run rsync commands invariably results in regular broken-pipe disconnects, and a potential problem arises when the rsync is restarted after some of the previously transferred files have already been moved to tape: rsync may start the transfer over again for these files!!!
a core issue is CONFIDENCE that an rsync has COMPLETED SUCCESSFULLY and that the data has been dmput to tape on /datastore
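One possible way to get that CONFIDENCE, offered as an untested sketch rather than anything proven on /datastore: produce a SHA-256 manifest next to the tar files at NCI (an extra step not discussed in this thread, and one that costs a full read of each 112GB file), copy it across with the data, and on the CSIRO side only dmput the files once they all verify. The manifest name, the function name and the `ARCHIVE_CMD` override are hypothetical. The override exists so the logic can be tried with `echo` before pointing it at dmput.

```bash
#!/usr/bin/env bash
# Sketch: verify a transferred collection against a SHA-256 manifest and only
# then hand its files to the archiver. Assumes (hypothetically) that the
# manifest was created at NCI next to the tars, e.g.
#   cd <collection> && sha256sum *.tar > MANIFEST.sha256
# and copied across with them. ARCHIVE_CMD defaults to dmput, as used in this
# thread; the function and manifest names are illustrative only.
verify_then_archive() {
  local dir=$1
  local manifest=${2:-MANIFEST.sha256}
  local archive_cmd=${ARCHIVE_CMD:-dmput}
  local files=()
  (
    cd "$dir" || exit 1
    # sha256sum -c exits non-zero if any listed file is missing or differs.
    if ! sha256sum --quiet -c "$manifest"; then
      echo "verification FAILED in $dir; nothing archived" >&2
      exit 1
    fi
    # Each manifest line is "<64 hex chars><2 separator chars><file name>",
    # so the file name starts at column 67.
    mapfile -t files < <(cut -c 67- "$manifest")
    echo "verified ${#files[@]} files in $dir" >&2
    # Left unquoted on purpose so ARCHIVE_CMD may carry options.
    $archive_cmd "${files[@]}"
  )
}
```

Run on a collection directory as soon as its rsync finishes, e.g. `verify_then_archive /datastore/d/dcfp/CAFE/forecasts/f6/<collection>`. Because the check happens before migration, it reads the files from disk rather than recalling them from tape.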

Solutions:

Ond has already suggested something like the following as an example:

ls *.tar.xz | parallel --lb -j10 "until rsync -ailP --log-file=rsync.log /scratch1/temp/{} pearcey:/scratch1/ ; do echo rsync failed - resyncing {}; sleep 1; done"

but I still need to get my head around how this would work.
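How Ond's one-liner works, as far as I can read it: `ls *.tar.xz` lists the local files, GNU parallel runs up to 10 jobs at a time (`-j10`) with line-buffered output (`--lb`), and each job is a shell `until` loop that reruns rsync for that one file until rsync exits with status 0. A dropped connection therefore only restarts the affected file. Note that the example pushes from a local disk to pearcey. From NCI to CSIRO, the transfer has to be pulled from the CSIRO side. Below is a sketch of the same loop with a cap on attempts, so that a file that keeps failing doesn't spin forever. `retry_until_ok` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch of the retry idea in Ond's one-liner, with a cap on attempts.
# retry_until_ok is a hypothetical helper, not part of rsync or parallel.
# Usage: retry_until_ok MAX_TRIES command [args...]
retry_until_ok() {
  local max_tries=$1
  shift
  local attempt=1
  # Keep rerunning the command until it exits with status 0.
  until "$@"; do
    if (( attempt >= max_tries )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying: $*" >&2
    attempt=$(( attempt + 1 ))
    sleep 1
  done
}
# Make the function visible to the bash shells that GNU parallel starts.
export -f retry_until_ok
```

Pulling from the CSIRO side (run from bash; `<collection>` is a placeholder) might then look like `ssh <user>@gadi-dm.nci.org.au 'cd /scratch/v14/<user>/tar_tmp/<collection> && ls *.tar' | parallel --lb -j8 retry_until_ok 20 rsync -avPS <user>@gadi-dm.nci.org.au:/scratch/v14/<user>/tar_tmp/<collection>/{} /datastore/d/dcfp/CAFE/forecasts/f6/<collection>/`.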

hot007 commented 4 years ago

Suggestions of things to try (noting that if you solve the network issue, the /datastore disk quota and the migration to tape will then be a huge bottleneck):

parallel rsyncs (Ond's suggestion). A cheeky addition would be to also parallelise over data mover nodes, so you're spreading the encryption overhead around.
bbcp for parallel data transfers. For example, this on Raijin gave me a total transfer speed over the 4 streams of 200MB/s: `time bbcp -z -Z 10050:10100 -P 4 -r -V -w 1024m -s 20 -S 'ssh -x -a -oFallBackToRsh=no %I -l %U %H /apps/bbcp/15.02.03.01.1/bin/bbcp' ${NCI_USER}@${RAIJIN}:/path/to/remote/data/ /path/to/local/data/`. bbcp is available as a module on Gadi, but it looks to be a slightly different version.
Globus, e.g. globus-url-copy if the full Globus system isn't yet available (e.g. this was tried from Pawsey for me a couple of years ago: `globus-url-copy -tcp-bs 16M -bs 16M -p 4 -vb -r sshftp://${PAWSEY_USER}@${PAWSEY_DATA}/path/to/data/ file:///path/to/local/destination/`)
rclone works well for parallel cloud transfers but probably not relevant here
is dcp an option now?
aria2c?
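A sketch of the "parallelise over data mover nodes" idea. It assumes individual data mover hosts can be addressed by name. The thread only mentions the gadi-dm.nci.org.au alias, so the names in `DM_HOSTS` are placeholders. The script assigns files to hosts round-robin and lets GNU parallel run one rsync per file against its assigned host.

```bash
#!/usr/bin/env bash
# Sketch: spread files over several data mover hosts round-robin, so each
# rsync/ssh stream (and its encryption overhead) lands on a different node.
# DM_HOSTS holds placeholder names only; substitute real ones.
DM_HOSTS=(dm-node-a dm-node-b dm-node-c)

# Print one "host file" pair per file, cycling through DM_HOSTS.
assign_hosts() {
  local i=0 f
  for f in "$@"; do
    echo "${DM_HOSTS[i % ${#DM_HOSTS[@]}]} ${f}"
    i=$(( i + 1 ))
  done
}
```

For example: `assign_hosts <file1>.tar <file2>.tar | parallel --colsep ' ' -j8 rsync -avPS <user>@{1}:/scratch/v14/<user>/tar_tmp/<collection>/{2} /datastore/d/dcfp/CAFE/forecasts/f6/<collection>/`. The space-separated output assumes file names without spaces, which holds for tar files named like these.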

hot007 commented 4 years ago

Also I'd suggest putting a dmput somewhere in your transfer script to try to get it off disk and onto tape ASAP, and be nice to other tape store users, probably?

Thomas-Moore-Creative commented 4 years ago


@hot007: I assume above you are talking about running bbcp from the NCI side? If so, how have you triggered a dmput over on Ruby on the CSIRO side once a specific copy has finished?

hot007 commented 4 years ago

Well, you have to trigger it from the CSIRO side, as you have to pull to CSIRO rather than push from NCI. So I suppose you could break your copies up and append `; dmput *` to the end of each. Alternatively, I suppose you could have a cron job that checks for new data and dmputs it, which you then disable once the transfers are done (so you can in due course pull the data back to disk again!).
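A sketch of the cron-job variant, assuming bash 4.4+ and GNU find. Each run hands to dmput any `*.tar` file under the landing directory that hasn't been modified for `QUIET_MINUTES` (default 60, an arbitrary choice). Hidden files are skipped because rsync writes in-progress data to a hidden temporary name and renames it on completion. One caveat: with `-P` (which includes `--partial`), an interrupted transfer leaves its partial file under the final name. Once that file goes quiet, the sweep would migrate it too, so a completeness check before archiving is still worthwhile. The function and variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a cron-able sweep: hand "finished" tar files to the archiver.
# Assumption: a file untouched for QUIET_MINUTES is treated as finished.
# ARCHIVE_CMD defaults to dmput; QUIET_MINUTES and the names are illustrative.
sweep_to_tape() {
  local landing=$1
  local quiet_minutes=${QUIET_MINUTES:-60}
  local archive_cmd=${ARCHIVE_CMD:-dmput}
  local files=()
  # Skip dot-files (rsync's in-progress temporaries); NUL-separated for safety.
  mapfile -d '' -t files < <(find "$landing" -type f -name '*.tar' ! -name '.*' \
    -mmin +"$quiet_minutes" -print0)
  # Nothing old enough yet: do nothing.
  if (( ${#files[@]} == 0 )); then
    return 0
  fi
  # Left unquoted on purpose so ARCHIVE_CMD may carry options.
  $archive_cmd "${files[@]}"
}
```

A crontab entry could then look like `*/30 * * * * bash -c 'source /path/to/sweep.sh && sweep_to_tape /datastore/d/dcfp/CAFE/forecasts/f6'` (the script path is a placeholder). Remove the entry once the transfers are done.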

Thomas-Moore-Creative commented 4 years ago


OK! I misread your bbcp code above: you are running this command from the CSIRO side?

I assume this also means one can't use the power of `#PBS -q copyq` (outlined here: https://opus.nci.org.au/display/Help/bbcp) to move data from Gadi to CSIRO machines? But then I don't get why the docs would show a `#PBS -q copyq` job running `bbcp -z -P 2 -s 16 -w 4m -S "bbcp" -T "ssh -x -a -oFallBackToRsh=no %I -l %U %H /some/other/place/bin/bbcp" somefiles remoteuser@remotehost.edu:someplace/` if you couldn't "push from NCI"?

But maybe I just need moar coffeeee?

hot007 commented 4 years ago

Well, copyq just gives you a 10-hour job limit on your access to gadi-dm ;-) But yes, that code was run FROM Pearcey to (then) raijin-dm. In general you can push from NCI, but you can't push to CSIRO: our machines aren't visible from NCI, so, as with rsync, the transfer has to be initiated from our side. That's why in our case we have to specify the path to NCI's bbcp but not ours.
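To make the direction concrete, here is a sketch of the pull as a bash function run on the CSIRO side. `NCI_BBCP`, `STREAMS`, `DRY_RUN` and the function name are assumptions for illustration. The flags are those from the Raijin command above, except that the progress interval is set to 10 seconds. Per the bbcp documentation, `-s` sets the number of parallel streams, `-P` is the progress-report interval in seconds, and `-S` is the command used to start bbcp on the source host. That last option is why only NCI's bbcp path has to be given when pulling.

```bash
#!/usr/bin/env bash
# Sketch: pull with bbcp from the CSIRO side. NCI_BBCP is a placeholder path;
# look up the real one on Gadi (e.g. "module show bbcp"). DRY_RUN=1 prints
# the command instead of running it.
bbcp_pull() {
  local user_host=$1 src=$2 dest=$3
  local streams=${STREAMS:-4}
  local nci_bbcp=${NCI_BBCP:-/path/to/nci/bbcp}
  local cmd=(bbcp)
  # Reverse connection and port range, copied from the Raijin example
  # (presumably there for the CSIRO firewall; check before changing).
  cmd+=(-z -Z 10050:10100)
  # Recurse into directories; print progress every 10 seconds.
  cmd+=(-r -P 10)
  # Number of parallel network streams and window size.
  cmd+=(-s "$streams" -w 1024m)
  # Command that starts bbcp on the SOURCE (NCI) host.
  cmd+=(-S "ssh -x -a -oFallBackToRsh=no %I -l %U %H ${nci_bbcp}")
  cmd+=("${user_host}:${src}" "$dest")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example: `DRY_RUN=1 STREAMS=8 bbcp_pull <user>@gadi-dm.nci.org.au /scratch/v14/<user>/tar_tmp/<collection>/ /datastore/d/dcfp/CAFE/forecasts/f6/` prints the command for inspection before anything is sent.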

Thomas-Moore-Creative commented 4 years ago

See the developing solution here https://github.com/Thomas-Moore-Creative/CSIRO-NCI-data-best-practice/blob/master/Solution_Archive_DCFP_NCI_ALCG_data_to_CSIRO_tape.md
