DUNE / data-mgmt-ops


New convention for which RSEs JustIN may use for any given workflow #672

Open StevenCTimm opened 1 month ago

StevenCTimm commented 1 month ago

Currently JustIN looks at the decommissioned field to see whether it is true or false, and at the enable_read and enable_write fields to see whether the RSE can be used as an input source or an output destination respectively. It then looks at the internal JustIN enable/disable flags only at run time, but goes ahead and creates all the run-specific, RSE-specific datasets for that RSE at once, whether or not it ever actually writes anything there.
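As a rough illustration of the current behaviour described above, here is a minimal Python sketch. The field names decommissioned, enable_read and enable_write come from the description; the `RseSettings` class, the two helper functions, and the way the fields are combined are assumptions made for illustration, not justIN's actual code.

```python
from dataclasses import dataclass


@dataclass
class RseSettings:
    """Hypothetical container for the three fields mentioned above."""
    name: str
    decommissioned: bool = False
    enable_read: bool = True
    enable_write: bool = True


def usable_as_input(rse: RseSettings) -> bool:
    # Assumption: a decommissioned RSE is never used, whatever enable_read says.
    return not rse.decommissioned and rse.enable_read


def usable_as_output(rse: RseSettings) -> bool:
    # Assumption: the same rule applies to writes, using enable_write.
    return not rse.decommissioned and rse.enable_write


# "EXAMPLE_RSE" is a placeholder name, not a real RSE.
example = RseSettings("EXAMPLE_RSE", enable_write=False)
print(usable_as_input(example), usable_as_output(example))
```

With only these three fields there is no way to express "usable only from one site" or "never usable by jobs", which is what the categories below are meant to cover.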

We need at least three other categories:

1) the local-only RSE, namely T3_US_NERSC. This must be available for rucio upload from NERSC itself but not from elsewhere; likewise, it must be available for streaming from NERSC itself but not from elsewhere. WAN reads and writes via rucio rules are allowed.

2) the testing RSE: an RSE that is not enabled for production yet, or may never be. JustIN ought not to touch this RSE. Examples: FNAL_DCACHE_STAGING, FNAL_DCACHE_TEST.

3) the read-only RSE: an RSE to which we are no longer writing new data but which still holds some datasets that we may have to read.
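The three categories above could be sketched as follows. This is only one possible encoding: the `RseCategory` enum, the `job_may_use` function, and the site labels are hypothetical, and the sketch covers direct access from jobs only, not WAN transfers via rucio rules.

```python
from enum import Enum


class RseCategory(Enum):
    """Hypothetical labels for the categories proposed above."""
    PRODUCTION = "production"   # ordinary RSE, no extra restriction
    LOCAL_ONLY = "local-only"   # e.g. T3_US_NERSC
    TESTING = "testing"         # e.g. FNAL_DCACHE_STAGING, FNAL_DCACHE_TEST
    READ_ONLY = "read-only"     # old data still readable, no new writes


def job_may_use(category: RseCategory, operation: str,
                job_site: str, rse_site: str) -> bool:
    """Return True if a job at job_site may do operation on the RSE."""
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation!r}")
    if category is RseCategory.TESTING:
        return False                      # jobs never touch a testing RSE
    if category is RseCategory.READ_ONLY:
        return operation == "read"        # reads allowed, writes refused
    if category is RseCategory.LOCAL_ONLY:
        return job_site == rse_site       # only jobs at the same site
    return True                           # PRODUCTION


# "NERSC" and "ELSEWHERE" are placeholder site labels.
print(job_may_use(RseCategory.LOCAL_ONLY, "write", "NERSC", "NERSC"))
print(job_may_use(RseCategory.LOCAL_ONLY, "read", "ELSEWHERE", "NERSC"))
```

Keeping the category as a single value per RSE, rather than several independent booleans, avoids contradictory combinations, though the existing enable_read and enable_write fields would still be needed for temporary changes.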

Andrew-McNab-UK commented 1 month ago

You also need to be able to temporarily disable an RSE for reading and for writing from within jobs, separately, in the Rucio settings. This allows for downtimes by blocking justIN from trying to use the RSE while still letting you test with Rucio to see whether it is fixed. Doing it inside Rucio also tells other people who are reading from RSEs outside of justIN, and potentially those writing to RSEs from outside too.

For justIN, you don't need category 1, as we'll handle it with the distance table justIN has: NERSC CPU and disk will just be disconnected from everywhere else and have no distance, so no matching from outside will happen.
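A minimal sketch of the distance-table idea, assuming the table can be modelled as a mapping from (site, RSE) pairs to a distance, with a missing pair meaning "not connected". The `reachable_rses` function, the site labels, and RSE_OTHER are hypothetical; justIN's real table and matching logic may look quite different.

```python
from typing import Dict, List, Tuple

# Hypothetical model: (site, rse) -> distance. A pair that is absent has no
# distance at all, so it can never be matched.
DistanceTable = Dict[Tuple[str, str], float]


def reachable_rses(site: str, distances: DistanceTable) -> List[str]:
    """Return the RSEs that have a distance from this site, nearest first."""
    pairs = [(d, rse) for (s, rse), d in distances.items() if s == site]
    return [rse for d, rse in sorted(pairs)]


# SITE_NERSC, SITE_OTHER and RSE_OTHER are placeholder names.
distances: DistanceTable = {
    ("SITE_NERSC", "T3_US_NERSC"): 0.0,   # NERSC CPU to NERSC disk only
    ("SITE_OTHER", "RSE_OTHER"): 0.0,     # no entry linking it to NERSC
}

print(reachable_rses("SITE_NERSC", distances))
print(reachable_rses("SITE_OTHER", distances))
```

Because the NERSC pair is the only entry that mentions either NERSC name, jobs elsewhere never see T3_US_NERSC and NERSC jobs never see any other RSE, without needing a separate local-only flag.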