dmwm / CRABServer


fill properly tm_dbs_blockname and tm_block_complete in RUCIO_Transfers #7524

Closed belforte closed 1 year ago

belforte commented 1 year ago

The updateRucioInfo API is now available in REST: https://github.com/dmwm/CRABServer/blob/2cbd932bd64724dcbcca915f21665af1d9b5a08f/src/python/CRABInterface/RESTFileTransfers.py#L185-L198

usage example in https://github.com/belforte/utils/blob/master/notebooks/TestTransfersdb.ipynb

And from https://github.com/dmwm/CRABServer/wiki/ASO-via-Rucio the plan is:

belforte commented 1 year ago

Findings from looking at code in https://github.com/dmwm/CRABServer/blob/master/scripts/task_process/RUCIO_Transfers.py

belforte commented 1 year ago

bugs found

belforte commented 1 year ago

steps to implement the needed functionality

belforte commented 1 year ago

after discussion with @dciangot , some code reformatting is needed to implement internal bookkeeping as now sketched in the wiki

that overrides all above concepts/questions

belforte commented 1 year ago

work in progress on this is in https://github.com/belforte/CRABServer/tree/fill-block-info-from-RT-fix-7524

belforte commented 1 year ago

here's a contribution from Wa: https://github.com/belforte/CRABServer/compare/fill-block-info-from-RT-fix-7524...novicecpp:CRABServer:rucio_transfer_fill_info_to_db_for_publisher_upstream_belforte

novicecpp commented 1 year ago

Items Stefano and Wa discussed on May 10:

novicecpp commented 1 year ago

Next to do list for RUCIO_Transfers.py

Some tasks are moved to https://github.com/dmwm/CRABServer/issues/7632#issue-1707380092

belforte commented 1 year ago

I cannot find the place in the lists above, but IIRC the action on me was to explain how and when RUCIO_Transfers.py stops running. This is controlled by these lines: https://github.com/dmwm/CRABServer/blob/18af83f510e5ff8579ad30a5073cc97af9c060a9/scripts/task_process/task_proc_wrapper.sh#L36-L51 combined with the while True loop at the bottom of the script.

In other words: the task_process will execute RUCIO_Transfers.py every 5 minutes until all the PostJobs have completed. If we want to keep waiting and testing, the PostJob has to wait.
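The behaviour described above can be sketched in Python as follows. This is only an illustration of the control flow, not the actual wrapper (which is a shell script): the names run_until_done, step and all_postjobs_done are hypothetical, and the exact order of the check and the sleep in the real script may differ.

```python
import time
from typing import Callable


def run_until_done(step: Callable[[], None],
                   all_postjobs_done: Callable[[], bool],
                   interval_s: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run step() every interval_s seconds until all_postjobs_done() is true.

    step              -- hypothetical stand-in for one run of RUCIO_Transfers.py
    all_postjobs_done -- hypothetical check that every PostJob has completed
    interval_s        -- pause between runs (300 s = 5 minutes, as in the thread)
    sleep             -- injectable so the loop can be exercised without waiting

    Returns the number of times step() was executed.
    """
    iterations = 0
    while True:
        # Stop as soon as no PostJob is left waiting.
        if all_postjobs_done():
            break
        step()
        iterations += 1
        sleep(interval_s)
    return iterations
```

The point of the sketch is that the loop has no stop condition of its own: it ends only when the PostJobs end, so the PostJob timeout is what bounds how long transfers keep being retried.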

How long we wait is set by a classAd in the job: https://github.com/dmwm/CRABServer/blob/18af83f510e5ff8579ad30a5073cc97af9c060a9/src/python/TaskWorker/Actions/PostJob.py#L2543-L2545

The value of ASOTimeout is set for the task by https://github.com/dmwm/CRABServer/blob/18af83f510e5ff8579ad30a5073cc97af9c060a9/src/python/TaskWorker/Actions/DagmanCreator.py#L471-L475

using values from the TaskWorker configuration. We currently have a 7-day timeout in PostJob when using Rucio, as per https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/-/blob/master/code/templates/crabtaskworker/taskworker/TaskWorkerConfig.py.erb#L84-87
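A minimal sketch of what such a timeout check amounts to, assuming the timeout is expressed in seconds (an assumption, the unit is not stated here). The function aso_timed_out and the constant names are hypothetical and are not taken from PostJob.py.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed unit: seconds. 7 days is the value quoted in this thread for Rucio.
RUCIO_ASO_TIMEOUT_S = 7 * SECONDS_PER_DAY


def aso_timed_out(start_time: float, now: float,
                  aso_timeout_s: int = RUCIO_ASO_TIMEOUT_S) -> bool:
    """Return True once more than aso_timeout_s seconds have elapsed.

    start_time -- when the PostJob started waiting (epoch seconds)
    now        -- current time (epoch seconds)
    """
    return (now - start_time) > aso_timeout_s
```

With these assumptions, a PostJob that started waiting at t=0 would still be waiting after 6 days and would give up just after the 7-day mark.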

belforte commented 1 year ago

One more thing:

novicecpp commented 1 year ago

Thanks Stefano. I will write this somewhere in our docs.

One question: how do you want to proceed on my stale PR #7587? Do you want to review it by yourself, or review it together in a Zoom chat (one Actions class at a time, 3 Actions classes in total)?

I prefer to have someone cross-check my code before merging it (it will be merged into the feature branch rucio_transfers in the dmwm/CRABServer repo, but in the end it will get merged to master). Then I can close this issue and move on to https://github.com/dmwm/CRABServer/issues/7632 .

novicecpp commented 1 year ago

This issue is solved via this commit (it just works; another patch is incoming). Tracking other Rucio ASO issues in #7632 and closing this one.