Open nsevilla opened 6 days ago
@welucas2 tested transferring a small file from NERSC to CSD3 through Globus Connect Personal.
Copying Jim's comment on Slack here: Since we already have the visit-level processing done at NERSC, there's probably no reason to redo it at CSD3 if we can just copy those data products directly into a repo. That will save us 200 TB of space since we won't need to copy over the raw data. The visit-level outputs are ~280 TB. Those data are in the repo at NERSC, alongside the raw data and some other data products from downstream processing, so we will need to export the visit-level data to a staging area. From there we can transfer to CSD3, and then import into a repo that we've set up at CSD3. We'll also need to export calibration products to transfer along with reference catalogs and those will also need to be ingested. We can do the export and staging at NERSC, and we can also help with the ingest at the CSD3 side.
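The export/transfer/ingest shape described above might look roughly like the sketch below. This is only a sketch: the butler subcommand names, argument order, collection names, and staging/repo paths are assumptions (check butler --help for the pipelines version in use), not a record of what was actually run.

REPO=/global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo
STAGE=/path/to/staging/area    # hypothetical staging area at NERSC

# 1. At NERSC: export the calibration products (and similarly the visit-level
#    collections and reference catalogs) from the repo into the staging area.
#    Exact butler invocation assumed.
butler export-calibs "$REPO" "$STAGE/calibs" <calib-collection>

# 2. Transfer the staging area to CSD3 with Globus (endpoint setup below).

# 3. At CSD3: import/ingest the exported products into the repo set up there.
#    Exact butler invocation assumed.
butler import /path/to/csd3/repo "$STAGE/calibs"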
Copying Dave's comment: On the destination, I'd assume it'd be more efficient if it was on the Lustre filesystem used by the CSD3 compute nodes, as opposed to the filesystem where the database will sit. Something like /rds/project/rds-rPTGgs6He74/DESC/data on CSD3. Paths like /home/$USER/rds/rds-iris-ip005 resolve to /rds/project/rds-rPTGgs6He74, therefore we refer to this as "the" RDS for ip005, which is the lsst project. i.e.:
[ir-mcka1@login-p-4 rds-iris-ip005]$ pwd
/home/ir-mcka1/rds/rds-iris-ip005
[ir-mcka1@login-p-4 rds-iris-ip005]$ pwd -P
/rds/project/rds-rPTGgs6He74
Using Globus Connect Personal to set up an endpoint on CSD3. Though we can all share an installation, I don't believe we can share configs, so anyone who wants to transfer to/from CSD3 will need to set up their own endpoint, which they bring up and down as needed.
Download and extract Globus Connect Personal in the target install directory:
wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
tar -xvf globusconnectpersonal-latest.tgz
Configure Globus:
cd globusconnectpersonal-3.2.5/
./globusconnectpersonal -setup --no-gui
The latter command generates the following output:
Globus Connect Personal needs you to log in to continue the setup process.
We will display a login URL. Copy it into any browser and log in to get a
single-use code. Return to this command with the code to continue setup.
Login here:
-----
https://auth.globus.org/v2/oauth2/authorize?client_id=[... snip]
-----
Enter the auth code:
This will wait at the prompt for the auth code. Copy and paste the very long login URL into a browser; that will take you to the Globus website to log in (you'll need to come here anyway to do the transfer). Once logged in, you'll be asked to give a label for the CSD3 setup (for example, I'm using CSD3-DESC-WL), then move on. The next page provides the auth code, which you should copy and paste back into the waiting prompt at the terminal, which will then ask you to give the endpoint a name (I'm using the same name as the label).
It's probably a good idea to make sure this collection can only see into the Roman data directories on CSD3, so create the file ~/.globusonline/lta/config-paths
and edit it to contain the line:
/rds/project/rds-rPTGgs6He74/desc/roman-data,0,1
which means that the collection will allow access to this directory only, and with read/write permissions.
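If the directory doesn't already exist, something like the following will create the file (note this overwrites any existing config-paths, so check first; the three comma-separated fields are path, sharing flag, and read/write flag):

mkdir -p ~/.globusonline/lta
echo "/rds/project/rds-rPTGgs6He74/desc/roman-data,0,1" > ~/.globusonline/lta/config-paths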
Bring the collection up to run in the background with:
./globusconnectpersonal -start &
Check its status with:
./globusconnectpersonal -status
You should at this point be able to find your new CSD3 collection if you search on https://app.globus.org for the name you provided in step 2 above. You should now be able to transfer between CSD3 and the NERSC DTN -- if you haven't yet linked Globus with your NERSC identity, it will ask you to do so now.
Once finished, bring the collection down with:
./globusconnectpersonal -stop
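If you prefer to drive transfers from the command line rather than the web app, the Globus CLI can submit them against the same collections. A sketch, where the UUIDs, the example directory name, and the label are placeholders:

pip install --user globus-cli    # if the CLI isn't already available
globus login

# find the UUIDs of the NERSC DTN collection and your new CSD3 collection
globus endpoint search "NERSC"
globus endpoint search "CSD3-DESC-WL"

# recursive transfer of one example directory from NERSC to CSD3
globus transfer --recursive --label "roman-desc-sims step2a" \
    <nersc-collection-uuid>:/global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo/u/descdm/step2a_000_w_2024_22 \
    <csd3-collection-uuid>:/rds/project/rds-rPTGgs6He74/desc/roman-data/step2a_000_w_2024_22

# check on progress
globus task list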
The visit-level outputs to transfer to CSD3, comprising the step1 and step2 DRP data products, are in
/global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo/u/descdm
at NERSC. The glob patterns for the folders and their total numbers are
step1_*_w_2024_22 60 folders
step2a_*_w_2024_22 60
step2b*w_2024_22 2
step2d_*_w_2024_22 60
step2e_w_2024_22 1
This should be ~300 TB in total.
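As a quick sanity check before setting up the transfers, the folder counts and total size can be confirmed at NERSC with something like this (the du is optional and slow on CFS):

cd /global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo/u/descdm
for pat in 'step1_*_w_2024_22' 'step2a_*_w_2024_22' 'step2b*w_2024_22' 'step2d_*_w_2024_22' 'step2e_w_2024_22'; do
    printf '%-22s %s folders\n' "$pat" "$(ls -d $pat 2>/dev/null | wc -l)"
done
du -shc step1_*_w_2024_22 step2*w_2024_22    # total should come out around ~300 TB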
We are seeing somewhat sluggish data transfer rates (@welucas2 reports 140 Mbps). Dave McKay says this could be typical of a login node. @markgbeckett to inquire about the possibility of using the SKA DTN, which could potentially increase this to 2x1 Gbps.
CSD3 seem confident we can use the SKA network connection and will add William to the access list for this. For politeness, George to advise the UK SKA leadership team of our plan.
The UK SKA Regional Centre PI (Rob Beswick) confirms we can use the SKA network connection for the data transfer.
Summary of the past day.
Status of transfers:
- step2* directories have been transferred, as these are comparatively small.
- step2a* directories contain 86,694 files in 104,946 subdirectories for a total of 1.85 TB.
- step2b* directories contain 368 files in 380 directories for a total of 31.5 GB.
- step2d* directories contain 101,128 files in 122,377 subdirectories for a total of 1.08 TB.
- The step2e_w_2024_22 directory contains 9 files in 11 subdirectories for a total of 495 MB.
- step1* directories are 4.8 TB each and contain in total 23,109,192 files in 359,229 subdirectories. These have not yet been transferred.
Actions taken:
- Transferred the step2a* and step2d* directories. Internal Globus logs show moment-to-moment speeds of 900 Mbps.
- For the step1* directories: after discussion, we decided it was worthwhile seeing how quickly single-file archives of the individual step1* directories could be transferred, to see whether the issue is the per-file overhead. A .tar archive of step1_000_w_2024_22 was created overnight (a sketch of the archiving commands appears after this list).
  time returns that creating this archive took the following:
      real    295m19.359s
      user    2m25.542s
      sys     101m18.339s
  The difference between real and user + sys (roughly 295 minutes of wall-clock time versus about 104 minutes of CPU time) indicates that I/O took most of the time.
- For the remaining step1* directories: as the numbers above indicate that archiving just one takes roughly 5 hours, four instances of the script are being run in parallel, each over a quarter of the directories, to hopefully complete the process sooner. The archives are being staged in /global/cfs/cdirs/lsst/production/roman-desc-sims/step1_data_staging so they can be reused for later transfers elsewhere.
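A sketch of the kind of archiving commands involved. The exact tar invocation, archive naming, and the even split into four groups of 15 (there are 60 step1_* directories) are assumptions made for illustration; only the staging path comes from the comments above.

cd /global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo/u/descdm
STAGE=/global/cfs/cdirs/lsst/production/roman-desc-sims/step1_data_staging

# timing the archiving of a single directory, as reported above
time tar -cf "$STAGE/step1_000_w_2024_22.tar" step1_000_w_2024_22

# four instances in parallel, each archiving a quarter of the 60 directories
dirs=( step1_*_w_2024_22 )
for i in 0 1 2 3; do
    (
        for d in "${dirs[@]:$((i * 15)):15}"; do
            [ -e "$STAGE/$d.tar" ] || tar -cf "$STAGE/$d.tar" "$d"
        done
    ) &
done
wait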
@jchiang87 says it is approximately 300 TB, location TBC. The destination at CSD3 is also TBC.