dmwm / CRABServer

add a per-user policy to tape recall #8354

Open belforte opened 2 months ago

belforte commented 2 months ago

Last week two users managed to submit tape recall requests totalling about 7 PB.

We can't deliver that "timely" nor can they process all that data. Clearly they do not know what they are doing.

We want to put a limit on how much each individual user can request at any given time, i.e. until all of her/his previous requests are satisfied.

We had an initial discussion in DM Ops MatterMost, moving here to converge on a concrete proposal and implement it.

@dynamic-entropy

belforte commented 2 months ago

Rahul, currently each rule that we create has a comment like this

Comment:                    Recall 605 GBytes for user: sleontsi dataset: /Charmonium/Run2016E-21Feb2020_UL2016_HIPM-v1/MINIAOD

The format is fixed because we use it for our monitoring.

I am open to moving this information to (or adding it as) rule metadata.

I think that what we need is a small database of how much a given user has already in recall in order to make a decision whether to accept a request or not.
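
As an illustration of that bookkeeping, here is a minimal sketch that parses rule comments in the fixed format quoted above and sums the recalled GBytes per user. The regular expression, the function names, and the idea of feeding it a plain list of comment strings are assumptions made for this example; this is not the code used in CRAB.

```python
import re
from collections import defaultdict

# Matches the fixed comment format quoted above, e.g.
# "Recall 605 GBytes for user: sleontsi dataset: /A/B/MINIAOD"
# (pattern and names are hypothetical, written for this sketch only)
COMMENT_RE = re.compile(
    r"Recall\s+(?P<gbytes>\d+)\s+GBytes\s+for\s+user:\s+(?P<user>\S+)"
    r"\s+dataset:\s+(?P<dataset>\S+)"
)


def parse_recall_comment(comment):
    """Return (user, gbytes, dataset), or None if the comment does not match."""
    match = COMMENT_RE.search(comment)
    if match is None:
        return None
    return match.group("user"), int(match.group("gbytes")), match.group("dataset")


def recall_usage_by_user(comments):
    """Sum the recalled GBytes per user over an iterable of rule comments."""
    usage = defaultdict(int)
    for comment in comments:
        parsed = parse_recall_comment(comment)
        if parsed is None:
            continue  # skip rules that do not carry a recall comment
        user, gbytes, _dataset = parsed
        usage[user] += gbytes
    return dict(usage)
```

In practice the comments would come from the rules owned by the recall account; how those are listed, and whether only unsatisfied rules are counted, is left open here.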

Can you summarize how you do this for automatic recall ?

dynamic-entropy commented 1 month ago

Hello Stefano, I saw your mail. Did they just repeat the jobs that triggered the recalls? Does this need to be handled urgently? I wanted to do some tests on my own to pitch a concrete proposal before saying anything, hence the delay.

belforte commented 1 month ago

Simply I had not done anything until now. I think he will be careful. But we do not want to continuously watch things.

belforte commented 1 month ago

On Hold, waiting for a proposal from @dynamic-entropy on how to track usage by user. As a reminder currently limits to each request are applied here https://github.com/dmwm/CRABServer/blob/5ca7cae6d4ebe374f945eb46a7768a64c2b24fa2/src/python/TaskWorker/Actions/DBSDataDiscovery.py#L474

And currently those are in English:

Beyond that the old plan was to have a global quota cap on crab_tape_recall account. Couple of years ago Nick told me that this cap was there but for technical reasons was not implemented in Rucio. I do not know any details on that.

belforte commented 1 month ago

Stefano and Rahul had a chat. Proposal:

we prefer to do the check in CRAB code because Rucio does not have a way (yet) to communicate to the user what the usage is etc. Only a yes/no.

If computing the total size for a user takes too long, we will reconsider
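
A minimal sketch of what such a check in CRAB code could look like, returning both the decision and a message that tells the user what the usage is (the part that a plain yes/no from Rucio would not provide). The function name, its arguments, and the limit value in the usage note are hypothetical and chosen only for illustration.

```python
def check_recall_request(user, pending_gb, requested_gb, limit_gb):
    """Decide whether a new tape recall request fits the per-user limit.

    pending_gb   : GBytes the user already has in unsatisfied recall requests
    requested_gb : GBytes of the new request
    limit_gb     : per-user cap (hypothetical; the real value is a policy choice)

    Returns (accepted, message) so the caller can report usage to the user.
    """
    total_gb = pending_gb + requested_gb
    if total_gb <= limit_gb:
        message = (
            f"Recall accepted for {user}: {total_gb} GBytes in recall "
            f"out of a limit of {limit_gb} GBytes."
        )
        return True, message
    message = (
        f"Recall refused for {user}: {pending_gb} GBytes already in recall, "
        f"requesting {requested_gb} more would exceed the limit of "
        f"{limit_gb} GBytes. Wait for previous requests to be satisfied."
    )
    return False, message
```

For example, `check_recall_request("sleontsi", 900, 605, 1000)` would refuse the request and explain why, while the same request with `pending_gb=0` would be accepted.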

belforte commented 1 month ago

implementation tasks in https://github.com/dmwm/CRABServer/blob/master/src/python/TaskWorker/Actions/RucioActions.py

belforte commented 1 month ago

@dynamic-entropy a first implementation is ready in https://github.com/belforte/CRABServer/tree/add-user-policy-to-tape-recall-8354

Currently it uses the crab_tape_recall account and the Analysis Input activity with AskForApproval=False, so that I can verify that nothing is broken while waiting for the Analysis Tape Recall activity to be accepted by Rucio with automatic approval.

if you care to have a look, relevant code changes are below. Any comment is always welcome

https://github.com/belforte/CRABServer/blob/72c66a157e1263637108afa0ac5fbfcf5467cf06/src/python/RucioUtils.py#L98-L120

https://github.com/belforte/CRABServer/blob/72c66a157e1263637108afa0ac5fbfcf5467cf06/src/python/TaskWorker/Actions/RucioActions.py#L62-L85

https://github.com/belforte/CRABServer/blob/72c66a157e1263637108afa0ac5fbfcf5467cf06/src/python/TaskWorker/Actions/DBSDataDiscovery.py#L529-L550

belforte commented 1 month ago

On hold, waiting for at least the new activity to be allowed on the Rucio side. @dynamic-entropy I think that's safe enough that you can put it in production w/o me testing on integration, right?

Then we can put the CRAB code in production and test while still using account crab_tape_recall and askforapproval=False, and switch those once the code is tested and the permission part in Rucio is also in place.

dynamic-entropy commented 1 month ago

Hello Stefano, I have created a PR to add the activity to the schema as well as a check for the crab_tape_recall account, after which the rule shall be created. The approve part will need yet another PR. But now you can create rules without askApproval. https://github.com/dmwm/CMSRucio/pull/802

belforte commented 1 month ago

thank you Rahul. Kindly let me know when it is deployed so I can try it and verify that I correctly compute the used quota

dynamic-entropy commented 1 month ago

Sure, I have added commits for auto-approve too. I will let you know when it's deployed. Cheers

dynamic-entropy commented 1 week ago

Hello @belforte, it's in production now. Apologies for the delay. Kindly test if everything is okay and let me know if more things need plumbing. Cheers

belforte commented 1 week ago

Many thanks @dynamic-entropy. I am awfully busy atm with the VOMS-to-IAM transition and removing deprecated HTCondor calls from CRAB. But this is top of the list after that. Hopefully sometime next week.

belforte commented 3 days ago

Hi @dynamic-entropy I finally got to test and have stumbled on a permission issue with this code https://github.com/belforte/CRABServer/blob/1b400d1d3a739024960c2c0980e349b53d66d28f/src/python/TaskWorker/Actions/RucioActions.py#L79-L84

            ruleIds = self.rucioClient.add_replication_rule(  # N.B. returns a list
                dids=[did], copies=copies, rse_expression=rseExpression,
                grouping=grouping, weight=weight, lifetime=lifetime,
                account=account, activity=activity,
                comment=comment,
                ask_approval=askApproval, asynchronous=True)

It appears that no matter what I put as account, Rucio will try to create the rule using the account of the current client, and I get

Rucio exception creating rule: Access to the requested resource denied.
Details: Account crab_tape_recall can not add replication rule

I have askApproval=True and activity="Analysis TapeRecall"

Any idea? Your PR has the same crab_tape_recall username, so it does not look like a simple typo.

belforte commented 3 days ago

Maybe scope matters? This is a test where I have this did: {'scope': 'user.crab_tape_recall', 'name': '/TapeRecall/pippo1/USER'}

dynamic-entropy commented 7 hours ago

Hello @belforte

I am unable to reproduce this.

rucio add-rule user.crab_tape_recall:/TapeRecall/pippo1/USER 1 T2_RC_MOCK --lifetime 86400 --activity "Analysis TapeRecall" --ask-approval --account rchauhan 

d135f23c65764f9f9bf0b5a3600ecae3

https://cms-rucio-webui.cern.ch/rule?rule_id=d135f23c65764f9f9bf0b5a3600ecae3#locks

Can you please provide me with the values of those variables in the test?

belforte commented 1 hour ago

Thanks Rahul. I am on holiday this week, but will do more tests (with more details) ASAP

belforte commented 1 hour ago

From which Rucio account did you issue the rucio add-rule?

dynamic-entropy commented 1 hour ago

rucio whoami | grep account
account    : crab_tape_recall

dynamic-entropy commented 1 hour ago

Just created another rule: https://cms-rucio-webui.cern.ch/rule?rule_id=7e35f6942243492c890633034dcd47ec