DUNE / data-mgmt-ops


Document the stages to bring a data set back to FNAL #619

Closed StevenCTimm closed 1 week ago

StevenCTimm commented 2 weeks ago

Since the deployment of Justin 1.01, output datasets must be explicitly brought back to Fermilab, or wherever else they are supposed to go. The files are held on disk at the remote RSEs by RSE- and stage/pattern-based rule sets. The full dataset is still created; you just have to make a rule for it.

1) Detect the dataset. For now this has to be done by polling Justin, looking at the output patterns of each workflow, and making sure the workflow in question is done. A workflow often produces more than one output dataset; VD workflows usually have four.
2) Make the rule. One rule per dataset will suffice as long as the dataset contains 10,000 files or fewer (which, pending a new Justin feature, will usually be true going forward).
3) If the dataset has more than 10,000 files, we will probably have to convert the overarching dataset to a Container, add the per-RSE datasets to that Container, and then move the rse/stage/pattern datasets individually.
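
The decision in steps 2 and 3 could be sketched roughly as below. This is only a planning helper under stated assumptions: the `Dataset` class, the `plan_rules` function, the action labels, and the dataset and destination names are all hypothetical, and no Justin or Rucio calls are made. It just turns file counts into an ordered list of actions that an operator would then carry out by hand or with the real tools.

```python
from dataclasses import dataclass

# Threshold quoted in the issue: one rule per dataset is enough up to this size.
MAX_FILES_PER_RULE = 10_000


@dataclass(frozen=True)
class Dataset:
    """Hypothetical description of one workflow output dataset."""
    name: str
    n_files: int
    # Names of the per-RSE/stage/pattern datasets behind the full dataset.
    sub_datasets: tuple = ()


def plan_rules(datasets, destination):
    """Return an ordered list of (action, target, detail) tuples.

    Small datasets get a single rule. Larger ones are planned as a
    container conversion, with each sub-dataset attached to the container
    and then moved with its own rule.
    """
    plan = []
    for ds in datasets:
        if ds.n_files <= MAX_FILES_PER_RULE:
            plan.append(("rule", ds.name, destination))
            continue
        plan.append(("convert_to_container", ds.name, None))
        for sub in ds.sub_datasets:
            plan.append(("attach", sub, ds.name))
            plan.append(("rule", sub, destination))
    return plan


if __name__ == "__main__":
    # Placeholder names, not real DUNE datasets or RSEs.
    outputs = [
        Dataset("scope:small-output", 1_200),
        Dataset("scope:big-output", 25_000,
                ("scope:big-output-rse-a", "scope:big-output-rse-b")),
    ]
    for step in plan_rules(outputs, "DESTINATION_RSE"):
        print(step)
```

Keeping the plan as plain data makes it easy to review before any rule is actually created, which seems useful while the container approach in step 3 is still untested.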

Need to test out the above as soon as we have a good output, probably from keepup.

StevenCTimm commented 1 week ago

Done in https://github.com/DUNE/data-mgmt-ops/wiki/JustIN-workflow%E2%80%90%E2%80%90bringing-data-to-Fermilab. The first rule, for workflow 2382, has been made.