Open haozturk opened 2 hours ago
The solution here is to get the "always deep" patch, apply it, and get rid of the table and the jobs that produce it.
Okay, we can try it out next week?
I don't know if we can just delete the tables after this. If Rucio tries to update these tables, the updates would crash, and I don't know how it handles such exceptions. That's why I suggested a cron job that wipes these tables regularly.
I did it now.
The job COLL_REPL_UPDATED_JOB_CMS runs COLL_REPLICAS_UPDATE_ALL.
That job was stopped and disabled. Rucio was patched to always use --deep (very simple patch).
I did not delete the table. From what I know, Rucio itself only reads this table and does not update it.
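For anyone following along, here is a minimal sketch of why forcing the deep path makes the summary table unnecessary for this lookup. It is not the actual patch and not Rucio's code: the two-table schema, the column names, the `always_deep` switch and the toy `list_dataset_replicas` function are all assumptions made up for illustration, run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real Rucio tables differ.
SCHEMA = """
CREATE TABLE replicas (dataset TEXT, file TEXT, rse TEXT, bytes INTEGER);
CREATE TABLE collection_replicas (
    dataset TEXT, rse TEXT, available_files INTEGER, available_bytes INTEGER
);
"""


def list_dataset_replicas(conn, dataset, deep=False, always_deep=True):
    """Toy lookup returning (rse, file count, bytes) rows for one dataset."""
    if deep or always_deep:
        # Deep path: aggregate the file replicas directly, so the result
        # does not depend on the summary table being up to date.
        sql = ("SELECT rse, COUNT(*), SUM(bytes) FROM replicas "
               "WHERE dataset = ? GROUP BY rse ORDER BY rse")
    else:
        # Shallow path: trust the pre-aggregated summary table.
        sql = ("SELECT rse, available_files, available_bytes "
               "FROM collection_replicas WHERE dataset = ? ORDER BY rse")
    return conn.execute(sql, (dataset,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO replicas VALUES (?, ?, ?, ?)",
    [("ds1", "f1", "T1_A", 10), ("ds1", "f2", "T1_A", 20), ("ds1", "f1", "T2_B", 10)],
)
# The summary table is left empty on purpose, as if its update job were disabled.

deep_rows = list_dataset_replicas(conn, "ds1")
shallow_rows = list_dataset_replicas(conn, "ds1", always_deep=False)
print(deep_rows)
print(shallow_rows)
```

In this toy setup the forced deep lookup still returns per-RSE totals, while the shallow lookup returns nothing once the summary table is no longer maintained. The cost of the deep path on a production-size replicas table is not modelled here.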
Bug Description
The problem is described in this ticket by Panos: https://its.cern.ch/jira/browse/CMSDM-210. I'm creating this issue so that we can include it in our Q4 planning and sort out a solution.
Reproduction Steps
No response
Expected Behavior
No response
Possible Solution
Firstly, do we really need this table and the `COLLECTION_REPLICAS` table? Is `list-dataset-replicas` using this table at the moment, or the replicas table? If the former, I remember ATLAS mentioning at the Rucio workshop that they're running a patch which makes this method use the expensive `--deep` flag by default, and they didn't observe any problem. If that's the case, I think we can consider this option too.
If we eventually decide that we need this table long term, then we need to come up with a way to handle it. I heard Yuyi has done some work to partition it, which was not deployed in production [1].
If we'll eventually get rid of it, then we need a procedure to handle this table until then. If we make `--deep` the default, I reckon we can create a SQL procedure which regularly wipes out the `UPDATED_COL_REP` and `COLLECTION_REPLICAS` tables. Otherwise, we should run another procedure that wipes out the `UPDATED_COL_REP` table and refills the `COLLECTION_REPLICAS` table from the replicas table.
If Rucio eventually decides to drop this table, then we would get rid of this problem completely.
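A rough sketch of the two cleanup variants described above, written as a Python function against an in-memory SQLite database rather than as a stored procedure on the production database. The schema is a hypothetical simplification (the real tables have more columns and different keys), and the `wipe_collection_tables` name and its `refill` switch are assumptions for illustration only.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real Rucio tables differ.
SCHEMA = """
CREATE TABLE replicas (dataset TEXT, file TEXT, rse TEXT, bytes INTEGER);
CREATE TABLE collection_replicas (
    dataset TEXT, rse TEXT, available_files INTEGER, available_bytes INTEGER
);
CREATE TABLE updated_col_rep (dataset TEXT, rse TEXT);
"""


def wipe_collection_tables(conn, refill):
    """Empty both tables; optionally rebuild the summary from replicas."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM updated_col_rep")
        conn.execute("DELETE FROM collection_replicas")
        if refill:
            # Variant for when --deep is NOT the default: recompute the
            # per-dataset, per-RSE totals from the file replicas.
            conn.execute(
                "INSERT INTO collection_replicas "
                "SELECT dataset, rse, COUNT(*), SUM(bytes) "
                "FROM replicas GROUP BY dataset, rse"
            )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO replicas VALUES (?, ?, ?, ?)",
    [("ds1", "f1", "T1_A", 10), ("ds1", "f2", "T1_A", 20), ("ds1", "f1", "T2_B", 10)],
)
# Simulate a backlog of pending updates and a stale summary row.
conn.executemany("INSERT INTO updated_col_rep VALUES (?, ?)", [("ds1", "T1_A")] * 5)
conn.execute("INSERT INTO collection_replicas VALUES ('ds1', 'T1_A', 1, 10)")

summary_sql = "SELECT * FROM collection_replicas ORDER BY dataset, rse"

wipe_collection_tables(conn, refill=True)
backlog = conn.execute("SELECT COUNT(*) FROM updated_col_rep").fetchone()[0]
refilled = conn.execute(summary_sql).fetchall()

wipe_collection_tables(conn, refill=False)
wiped = conn.execute(summary_sql).fetchall()
print(backlog, refilled, wiped)
```

Running both statements in a single transaction is a choice made for this sketch so that readers never see a half-wiped state. Whether that is affordable on the real tables, given their size, would need to be checked separately.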
@ericvaandering FYI
[1] https://github.com/yuyiguo/rucio/pull/7/files#diff-6db4929cf5c1d099d8d38edb8fc68e9a4cb70a3fa466b61c238b6f54f6eeefc9
Related Issues
https://github.com/dmwm/CMSRucio/issues/257