LSSTDESC / csd3_uk_tasks

A repo to keep track of any activities relating to setting up and running LSST DESC jobs at CSD3

Transfer Roman DESC sims from NERSC to CSD3 #3

Closed: nsevilla closed this issue 2 weeks ago

nsevilla commented 3 weeks ago

@jchiang87 says it is approximately 300 TB; the source location is TBC. The destination at CSD3 is also TBC.

nsevilla commented 3 weeks ago

@welucas2 tested using Globus Connect Personal to transfer a small file from NERSC to CSD3.

heather999 commented 3 weeks ago

Copying Jim's comment on Slack here:

Since we already have the visit-level processing done at NERSC, there's probably no reason to redo it at CSD3 if we can just copy those data products directly into a repo. That will save us 200 TB of space since we won't need to copy over the raw data. The visit-level outputs are ~280 TB.

Those data are in the repo at NERSC, alongside the raw data and some other data products from downstream processing, so we will need to export the visit-level data to a staging area. From there we can transfer to CSD3, and then import into a repo that we've set up at CSD3. We'll also need to export calibration products to transfer along with reference catalogs, and those will also need to be ingested. We can do the export and staging at NERSC, and we can also help with the ingest at the CSD3 side.
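For context, a rough sketch of what the CSD3-side repo creation and import could look like with the LSST Science Pipelines butler CLI. The paths here are placeholders and the import options are from memory, so they should be checked against `butler import --help`:

    # at CSD3, once the staged export has been transferred (paths are placeholders)
    DEST_REPO=/rds/project/rds-rPTGgs6He74/DESC/data/repo
    STAGED=/rds/project/rds-rPTGgs6He74/desc/roman-data/staging

    # create an empty repo, then import the datasets described by the
    # export file that the NERSC-side export step wrote alongside the files
    butler create "$DEST_REPO"
    butler import "$DEST_REPO" "$STAGED" --export-file "$STAGED"/export.yaml --transfer copy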

nsevilla commented 3 weeks ago

Copying Dave's comment:

On the destination, I'd assume it'd be more efficient if it was on the Lustre filesystem used by the CSD3 compute nodes, as opposed to the filesystem where the database will sit. Something like /rds/project/rds-rPTGgs6He74/DESC/data on CSD3. Paths like /home/$USER/rds/rds-iris-ip005 resolve to /rds/project/rds-rPTGgs6He74, therefore we refer to this as "the" RDS for ip005, which is the lsst project. For example:

    [ir-mcka1@login-p-4 rds-iris-ip005]$ pwd
    /home/ir-mcka1/rds/rds-iris-ip005
    [ir-mcka1@login-p-4 rds-iris-ip005]$ pwd -P
    /rds/project/rds-rPTGgs6He74

welucas2 commented 3 weeks ago

Using Globus Connect Personal to set up an endpoint on CSD3. Though we can all share an installation, I don't believe we can share configs, so anyone who wants to transfer to/from CSD3 will need to set up their own endpoint, which they bring up and down as they need.

  1. Download and extract Globus Connect Personal in the target install directory:

     wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
     tar -xvf globusconnectpersonal-latest.tgz
  2. Configure Globus:

     cd globusconnectpersonal-3.2.5/
     ./globusconnectpersonal -setup --no-gui

    The latter command generates the following output:

     Globus Connect Personal needs you to log in to continue the setup process.
    
     We will display a login URL. Copy it into any browser and log in to get a
     single-use code. Return to this command with the code to continue setup.
    
     Login here:
     -----
     https://auth.globus.org/v2/oauth2/authorize?client_id=[... snip]
     -----
     Enter the auth code:

    This will wait at the prompt for the auth code. Copy and paste the very long login URL into a browser; it will take you to the Globus website to log in (you'll need the Globus web app later to run the transfer anyway). Once logged in, you'll be asked to give a label for the CSD3 setup (for example, I'm using CSD3-DESC-WL), then move on. The next page provides the auth code, which you should copy and paste back into the waiting prompt at the terminal. The terminal will then ask you to give the endpoint a name (I'm using the same name as the label).

  3. It's probably a good idea to make sure this collection can only see into the Roman data directories on CSD3, so create the file ~/.globusonline/lta/config-paths and edit it to contain the line:

     /rds/project/rds-rPTGgs6He74/desc/roman-data,0,1

    which means that the collection will allow access to this directory only, with read/write permissions. (Each line has the form path,sharing,writable; setting the final field to 0 instead would make the path read-only.)

  4. Bring the collection up to run in the background with:

     ./globusconnectpersonal -start &

    Check its status with:

     ./globusconnectpersonal -status

    You should at this point be able to find your new CSD3 collection if you search on https://app.globus.org for the name you provided in step 2 above. You should now be able to transfer between CSD3 and the NERSC DTN; if you haven't yet linked Globus with your NERSC identity, it will ask you to do so now. (Transfers can also be submitted from the command line; see the sketch after this list.)

  5. Once finished, bring the collection down with:

     ./globusconnectpersonal -stop
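As an alternative to the web app, transfers can also be submitted with the Globus CLI (pip install globus-cli). A minimal sketch, with placeholder endpoint UUIDs and an assumed staging path:

    # one-time login to Globus from the command line
    globus login

    # placeholder UUIDs -- look these up with `globus endpoint search <name>`
    NERSC_EP=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
    CSD3_EP=ffffffff-0000-1111-2222-333333333333

    # recursively transfer a staged directory from NERSC to CSD3
    globus transfer --recursive \
        "$NERSC_EP:/global/cfs/cdirs/lsst/production/roman-desc-sims/staging" \
        "$CSD3_EP:/rds/project/rds-rPTGgs6He74/desc/roman-data"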

jchiang87 commented 3 weeks ago

The visit-level outputs to transfer to CSD3, comprising the step1 and step2 DRP data products, are in

/global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo/u/descdm

at NERSC. The glob patterns for the folders and their total numbers are

step1_*_w_2024_22          60 folders
step2a_*_w_2024_22         60
step2b*w_2024_22            2
step2d_*_w_2024_22         60
step2e_w_2024_22            1

This should be ~300 TB in total.
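A quick way to confirm the on-disk total before staging, run from that directory at NERSC (du -sch prints per-pattern subtotals plus a grand total in human-readable units):

    cd /global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/repo/u/descdm
    du -sch step1_*_w_2024_22 step2a_*_w_2024_22 step2b*w_2024_22 \
            step2d_*_w_2024_22 step2e_w_2024_22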

nsevilla commented 3 weeks ago

We are seeing somewhat sluggish data transfer rates (@welucas2 reports 140 Mbps). Dave McKay says this could be typical of a login node. @markgbeckett to inquire about the possibility of using the SKA DTN, which could potentially increase this to 2x1 Gbps.
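For scale, a back-of-the-envelope estimate (assuming the full ~300 TB, i.e. 2.4x10^15 bits, at sustained rates): roughly 200 days at 140 Mbps versus about two weeks at 2 Gbps.

    # days to move 2.4e15 bits at sustained 140 Mbps vs 2 Gbps
    echo "scale=1; 2.4*10^15 / (140*10^6) / 86400" | bc   # ~198 days
    echo "scale=1; 2.4*10^15 / (2*10^9) / 86400" | bc     # ~14 days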

markgbeckett commented 3 weeks ago

The CSD3 team seems confident we can use the SKA network connection and will add William to the access list for this. As a courtesy, George will advise the UK SKA leadership team of our plan.

markgbeckett commented 3 weeks ago

UK SKA Regional Centre PI (Rob Beswick) confirms we can use the SKA network connection for the data transfer.

welucas2 commented 3 weeks ago

Summary of the past day.

Status of transfers:

Actions taken:

welucas2 commented 2 weeks ago

Status of transfers:

I'm running some md5sums on the tar archives now for peace of mind that they were transferred correctly from NERSC, but otherwise that's hopefully us done here.
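For anyone repeating this, a typical pattern (filenames illustrative) is to write a checksum manifest at the NERSC end and verify it at CSD3 after the transfer:

    # at NERSC, in the staging directory holding the tar archives
    md5sum *.tar > manifest.md5

    # at CSD3, after transferring the tars and the manifest
    cd /rds/project/rds-rPTGgs6He74/desc/roman-data
    md5sum -c manifest.md5   # prints OK for each file that matches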

jchiang87 commented 2 weeks ago

There are a couple of smaller items remaining to transfer that are needed for setting up the repo: the calibration products and the reference catalogs.

I need to export the calibs, and I need to find the right set of refcats to use. I'll prepare both of those this weekend and will update this post with locations.

I've tarred up the reference catalogs into

/global/cfs/cdirs/lsst/production/gen3/roman-desc-sims/shared/refcats/uw_stars_20240529_tp_aug_2021_downselect.tar

and I've exported the calibs from /repo/roman-desc-sims and tarred them as well:

/global/cfs/cdirs/lsst/production/roman-desc-sims/calibs/roman-desc-sims_calibs.tar

I'd suggest leaving these tarred up until we sort out the final location for these files.
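For reference, the calib export can be done with the butler CLI's export-calibs subcommand. A minimal sketch; the collection name and staging directory are placeholders, and options should be checked against `butler export-calibs --help`:

    # export certified calibs from the source repo into a staging directory
    # ("Roman/calib" is a placeholder collection name)
    butler export-calibs /repo/roman-desc-sims ./calibs_export Roman/calib

    # tar the staged directory for transfer
    tar -cf roman-desc-sims_calibs.tar -C ./calibs_export .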

welucas2 commented 2 weeks ago

Transfers for uw_stars_20240529_tp_aug_2021_downselect.tar and roman-desc-sims_calibs.tar to CSD3 have been completed. The tars are in /rds/project/rds-rPTGgs6He74/desc/roman-data.

nsevilla commented 2 weeks ago

Sounds like a successful resolution, thanks @welucas2!

jchiang87 commented 4 days ago

I've renamed the directory to /rds/project/rds-rPTGgs6He74/desc/roman-rubin-data to avoid the inference that these are Roman simulations.