belforte opened 2 months ago
Rahul, currently each rule that we create has a comment like this:
Comment: Recall 605 GBytes for user: sleontsi dataset: /Charmonium/Run2016E-21Feb2020_UL2016_HIPM-v1/MINIAOD
The format is fixed because we use it for our monitoring.
I am open to changing it to (or adding it as) rule metadata.
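Because the comment format is fixed, it can be parsed back into structured fields, which is also a way to see what rule metadata would need to carry. A minimal sketch, assuming every tape-recall comment follows exactly the pattern shown above (integer size in GBytes, user names without spaces); the names `parse_recall_comment` and `RecallComment` are hypothetical and not part of CRAB.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the fixed comment format, e.g.
# "Recall 605 GBytes for user: sleontsi dataset: /Charmonium/.../MINIAOD"
# Assumption: integer size in GBytes, user and dataset contain no spaces.
COMMENT_RE = re.compile(
    r"^Recall (?P<size>\d+) GBytes for user: (?P<user>\S+) dataset: (?P<dataset>\S+)$"
)


class RecallComment(NamedTuple):
    size_gb: int
    user: str
    dataset: str


def parse_recall_comment(comment: str) -> Optional[RecallComment]:
    """Return the parsed fields, or None if the comment does not match."""
    match = COMMENT_RE.match(comment.strip())
    if match is None:
        return None
    return RecallComment(int(match["size"]), match["user"], match["dataset"])


if __name__ == "__main__":
    print(parse_recall_comment(
        "Recall 605 GBytes for user: sleontsi "
        "dataset: /Charmonium/Run2016E-21Feb2020_UL2016_HIPM-v1/MINIAOD"
    ))
```

Storing the same three fields as rule metadata instead would remove the need for the regex, at the cost of changing what the monitoring reads.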
I think that what we need is a small database of how much each user already has in recall, in order to decide whether to accept a request or not.
Can you summarize how you do this for automatic recall?
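One possible shape for such a small database, sketched with Python's built-in `sqlite3` and an in-memory table. The table layout, the function names and the `done` flag (set once a recall rule is satisfied) are assumptions for illustration, not an agreed design.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table with one row per tape-recall rule.

    Hypothetical schema: done is 0 while the recall is in progress, 1 once satisfied.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS recalls (
               rule_id TEXT PRIMARY KEY,
               username TEXT NOT NULL,
               size_gb INTEGER NOT NULL,
               done INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_recall(conn: sqlite3.Connection, rule_id: str, username: str, size_gb: int) -> None:
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO recalls (rule_id, username, size_gb) VALUES (?, ?, ?)",
            (rule_id, username, size_gb),
        )


def mark_done(conn: sqlite3.Connection, rule_id: str) -> None:
    with conn:
        conn.execute("UPDATE recalls SET done = 1 WHERE rule_id = ?", (rule_id,))


def gb_in_recall(conn: sqlite3.Connection, username: str) -> int:
    """Total size of this user's recalls that are still in progress."""
    row = conn.execute(
        "SELECT COALESCE(SUM(size_gb), 0) FROM recalls WHERE username = ? AND done = 0",
        (username,),
    ).fetchone()
    return row[0]


def accept_request(conn: sqlite3.Connection, username: str, request_gb: int, limit_gb: int) -> bool:
    """Accept only if the new request keeps the user within limit_gb."""
    return gb_in_recall(conn, username) + request_gb <= limit_gb
```

The decision then reduces to one aggregate query per request; whether that query stays fast enough as the table grows is the question raised further down the thread.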
Hello Stefano, I saw your mail. Did they just repeat the jobs that triggered the recalls? Does this need to be addressed urgently? I wanted to do some tests on my own to pitch a concrete proposal before saying anything, hence the delay.
It's simply that I had not done anything until now. I think he will be careful, but we do not want to have to watch things continuously.
On hold, waiting for a proposal from @dynamic-entropy on how to track usage by user. As a reminder, the limits currently applied to each request are here: https://github.com/dmwm/CRABServer/blob/5ca7cae6d4ebe374f945eb46a7768a64c2b24fa2/src/python/TaskWorker/Actions/DBSDataDiscovery.py#L474
Currently, these data tiers:
['AOD', 'AODSIM', 'MINIAOD', 'MINIAODSIM', 'NANOAOD', 'NANOAODSIM']
are recalled no matter how big.
Beyond that, the old plan was to have a global quota cap on the crab_tape_recall account. A couple of years ago Nick told me that this cap was there but, for technical reasons, was not implemented in Rucio. I do not know any details on that.
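For reference, a per-request check of the kind described above might look like the sketch below. It is not the code at DBSDataDiscovery.py#L474: only the tier list comes from this thread, while the size threshold and the function name `request_allowed` are hypothetical placeholders (the real limits are set in the TaskWorker configuration).

```python
# Data tiers that, per this thread, are recalled regardless of size.
UNLIMITED_TIERS = {"AOD", "AODSIM", "MINIAOD", "MINIAODSIM", "NANOAOD", "NANOAODSIM"}

# Hypothetical cap (in GB) for every other tier; not the real configured value.
MAX_RECALL_GB_OTHER_TIERS = 100


def request_allowed(dataset: str, size_gb: float) -> bool:
    """Per-request check: size is ignored for the tiers listed above."""
    tier = dataset.rstrip("/").split("/")[-1]  # last path component is the data tier
    if tier in UNLIMITED_TIERS:
        return True
    return size_gb <= MAX_RECALL_GB_OTHER_TIERS
```

Because this check looks at one request at a time, nothing stops a user from submitting many large requests in a row, which is the gap a per-user limit is meant to close.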
Stefano and Rahul had a chat. Proposal:
Use account=username, activity="Analysis Tape Recall" and askApproval=True.
The crab_tape_recall account can create rules in this activity and they will be automatically approved. This avoids charging the rule to the user's Rucio quota on the selected RSE.
We prefer to do the check in CRAB code because Rucio does not (yet) have a way to tell the user what their usage is; it can only answer yes or no.
If computing the total size for a user takes too long, we will reconsider.
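The proposal maps onto keyword arguments of the Rucio client's `add_replication_rule`, the same call CRAB already uses. A minimal sketch, assuming the final activity name is "Analysis TapeRecall" (the thread spells it both with and without the space) and `copies=1`; the helper names `proposed_rule_kwargs` and `create_rule` are hypothetical, and `client` is expected to be a `rucio.client.Client` or any object exposing the same method.

```python
from typing import Any, Dict, List

# Assumption: final activity name; the thread also writes "Analysis Tape Recall".
TAPE_RECALL_ACTIVITY = "Analysis TapeRecall"


def proposed_rule_kwargs(username: str, did: Dict[str, str], rse_expression: str,
                         comment: str, copies: int = 1) -> Dict[str, Any]:
    """Keyword arguments for add_replication_rule following the proposal."""
    return {
        "dids": [did],
        "copies": copies,
        "rse_expression": rse_expression,
        "account": username,               # per the proposal: account=username
        "activity": TAPE_RECALL_ACTIVITY,
        "ask_approval": True,              # approval meant to be automatic for this activity
        "comment": comment,                # keeps the fixed format used by monitoring
    }


def create_rule(client: Any, username: str, did: Dict[str, str],
                rse_expression: str, comment: str) -> List[str]:
    """Create the rule; add_replication_rule returns a list of rule IDs."""
    return client.add_replication_rule(
        **proposed_rule_kwargs(username, did, rse_expression, comment))
```

Keeping the argument construction in its own function makes it testable without a Rucio server, and keeps the interim values (crab_tape_recall, Analysis Input, AskForApproval=False) to a one-place change.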
Implementation tasks in https://github.com/dmwm/CRABServer/blob/master/src/python/TaskWorker/Actions/RucioActions.py:
maxTierToBlockRecallSize? i.e. in https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/-/blob/master/code/templates/crabtaskworker/taskworker/TaskWorkerConfig.py.erb#L107-108
activity as arguments to createOrReuseRucioRule. The account can be selected inside there, since self.username is available.
@dynamic-entropy a first implementation is ready in https://github.com/belforte/CRABServer/tree/add-user-policy-to-tape-recall-8354
Currently it uses the crab_tape_recall account and the Analysis Input activity with AskForApproval=False. I am waiting for the Analysis Tape Recall activity to be accepted by Rucio with automatic approval, so that I can verify that nothing is broken.
If you care to have a look, the relevant code changes are below. Any comment is welcome.
On hold, waiting for at least the new activity to be allowed on the Rucio side. @dynamic-entropy I think that's safe enough that you can put it in production without me testing on integration, right?
Then we can put it in production and test, still using the crab_tape_recall account and askforapproval=False, and switch those once the code is tested and the permission part in Rucio is also in place.
Hello Stefano,
I have created a PR to add the activity to the schema, as well as a check for the crab_tape_recall account, after which the rule shall be created.
The approve part will need yet another PR, but you can now create rules without askApproval.
https://github.com/dmwm/CMSRucio/pull/802
Thank you Rahul. Kindly let me know when it is deployed so I can try it and verify that I compute the used quota correctly.
Sure, I have added commits for auto-approve too. I will let you know when it's deployed.
Cheers
Hello @belforte, it's in production now. Apologies for the delay. Kindly test whether everything is okay and let me know if more things need plumbing. Cheers
Many thanks @dynamic-entropy. I am awfully busy at the moment with the VOMS-to-IAM transition and removing deprecated HTCondor calls from CRAB, but this is at the top of the list after that. Hopefully sometime next week.
Hi @dynamic-entropy, I finally got to test and have stumbled on a permission issue with this code: https://github.com/belforte/CRABServer/blob/1b400d1d3a739024960c2c0980e349b53d66d28f/src/python/TaskWorker/Actions/RucioActions.py#L79-L84
ruleIds = self.rucioClient.add_replication_rule( # N.B. returns a list
dids=[did], copies=copies, rse_expression=rseExpression,
grouping=grouping, weight=weight, lifetime=lifetime,
account=account, activity=activity,
comment=comment,
ask_approval=askApproval, asynchronous=True)
It appears that no matter what I put as account, Rucio tries to create the rule using the account of the current client, and I get:
Rucio exception creating rule: Access to the requested resource denied.
Details: Account crab_tape_recall can not add replication rule
I have askApproval=True and activity="Analysis TapeRecall".
Any idea? Your PR has the same crab_tape_recall username, so it does not look like a simple typo.
Maybe the scope matters?
This is a test where I use this DID: {'scope': 'user.crab_tape_recall', 'name': '/TapeRecall/pippo1/USER'}
Hello @belforte
I am unable to reproduce this.
rucio add-rule user.crab_tape_recall:/TapeRecall/pippo1/USER 1 T2_RC_MOCK --lifetime 86400 --activity "Analysis TapeRecall" --ask-approval --account rchauhan
d135f23c65764f9f9bf0b5a3600ecae3
https://cms-rucio-webui.cern.ch/rule?rule_id=d135f23c65764f9f9bf0b5a3600ecae3#locks
Can you please provide me with the values of those variables in the test?
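To compare the two tests like for like, the CLI command above can be written as the equivalent `add_replication_rule` keyword arguments and diffed against the values printed from the failing CRAB call. A sketch; the helper `did_from_string` is hypothetical and assumes the DID is written as scope:name with the first colon as separator, which holds for this example. Note that the account issuing the call (the one `rucio whoami` reports) is not among these arguments; it comes from the client's credentials.

```python
from typing import Any, Dict


def did_from_string(did: str) -> Dict[str, str]:
    """Split 'scope:name' on the first colon into a Rucio DID dict."""
    scope, name = did.split(":", 1)
    return {"scope": scope, "name": name}


# Equivalent of:
# rucio add-rule user.crab_tape_recall:/TapeRecall/pippo1/USER 1 T2_RC_MOCK \
#   --lifetime 86400 --activity "Analysis TapeRecall" --ask-approval --account rchauhan
cli_equivalent: Dict[str, Any] = {
    "dids": [did_from_string("user.crab_tape_recall:/TapeRecall/pippo1/USER")],
    "copies": 1,
    "rse_expression": "T2_RC_MOCK",
    "lifetime": 86400,
    "activity": "Analysis TapeRecall",
    "ask_approval": True,
    "account": "rchauhan",
}
```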
Thanks Rahul. I am on holiday this week, but will do more tests (with more details) ASAP.
From which Rucio account did you issue the rucio add-rule?
rucio whoami | grep account
account : crab_tape_recall
Just created another rule: https://cms-rucio-webui.cern.ch/rule?rule_id=7e35f6942243492c890633034dcd47ec
Last week two users managed to submit tape recall requests for a total of about 7 PB.
We can't deliver that "timely" nor can they process all that data. Clearly they do not know what they are doing.
We want to put a limit on how much each individual user can request at any given time, i.e. until all of her/his previous requests are satisfied.
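A minimal sketch of such a limit, assuming the per-user size is read back from the fixed rule comment and that a rule counts as satisfied once its Rucio state is OK. The rule dictionaries here use hypothetical `state` and `comment` keys (the exact key names returned by the Rucio client should be checked), and the default limit and function names are placeholders for illustration.

```python
import re
from typing import Iterable, Mapping

# Assumption: same fixed comment format CRAB uses for tape-recall rules.
COMMENT_RE = re.compile(r"^Recall (\d+) GBytes for user: (\S+) dataset: ")

# Hypothetical per-user cap on data still waiting to come back from tape.
MAX_PENDING_GB_PER_USER = 100_000  # i.e. 100 TB, placeholder value


def pending_gb(rules: Iterable[Mapping[str, str]], username: str) -> int:
    """Sum sizes of this user's rules that are not yet satisfied."""
    total = 0
    for rule in rules:
        if rule.get("state") == "OK":
            continue  # already satisfied, no longer counts against the user
        match = COMMENT_RE.match(rule.get("comment", ""))
        if match and match.group(2) == username:
            total += int(match.group(1))
    return total


def may_submit(rules: Iterable[Mapping[str, str]], username: str, request_gb: int,
               limit_gb: int = MAX_PENDING_GB_PER_USER) -> bool:
    """Allow a new request only if pending + requested stays within the limit."""
    return pending_gb(rules, username) + request_gb <= limit_gb
```

Counting only unsatisfied rules means a user's allowance frees up as recalls complete, matching "until all of her/his previous requests are satisfied".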
We had an initial discussion in DM Ops MatterMost, moving here to converge on a concrete proposal and implement it.
@dynamic-entropy