port to Rucio tapeRecall machinery

belforte commented 4 years ago

will want to pick code examples from https://github.com/dmwm/CMSRucio/blob/master/DMOps/StageDatasetForUser.py as per following mail from Eric

On 23/10/2020 16:03, Eric W Vaandering wrote:

Stefano, the code I’m using to do this is here:

https://github.com/dmwm/CMSRucio/blob/master/DMOps/StageDatasetForUser.py

I believe you wanted to see that so you could incorporate something similar into CRAB.

I use the Rucio client directly, while CRAB may use the WMCore/Services class, so you may
need to make sure all the relevant calls can be made. I’ve cc’ed Alan as well.

Cheers!

(For those not on the other thread, we’re doing the first 30 TB of recalls for users).

Eric

> On Sep 17, 2020, at 5:00 AM, Stefano Belforte <stefano.belforte@cern.ch> wrote:
>
> as in the past, I believe that the things to worry about are:
> - how to deal with outlandish requests (all of B parking RAW or similar)
> - how to pick recall-to sites
> I would rather not have CRAB deal with those.
> The list of sites may surely be the "as for production", simply let's make sure
> that we do not try to have the same list written in two independent places.
> But the "what to recall" is a different beast because production output
> goes through some intelligent screening before being approved.
>
> As to implementation... well.. I have no idea of what's required.
>
> On 15/09/2020 18:54, Eric W Vaandering wrote:
>> CP, you’re making differences where they don’t really exist. The only thing that is different from what you are saying and my proposal is the quota portion which we could do away with if that’s what Ops wants to do. Otherwise it’s no different. The data is guaranteed to hang around for some period of time so the person can analyze it, after that it gets kept around if it is popular or allowed to roll off if it is not.
>> The manual request and approval I mention at the top is a temporary fallback in case the CRAB team is not able to implement this workflow before we have to shut off PhEDEx.
>>> On Sep 15, 2020, at 11:49 AM, Christoph Paus <paus@mit.edu> wrote:
>>>
>>> Hi Eric and all;
>>>
>>> as pointed out in the meeting this does not seem like a good solution to me. The data that people are recalling should be treated no different from data that has been produced by production, let's say 3 months ago (some cool off). I do not see a reason to involve a manual approval step or separate storage rules, because there is none. Also you are fragmenting the overall available pool of resources doing so.
>>>
>>> Going to an approval model is operationally expensive and will lead to issues. We had such models before.
>>>
>>> So, I would propose to let crab generate a rule equivalent to the one generated when a new dataset gets created. Very simple, no problem.
>>>
>>> Cheers, Christoph
>>>
>>> On Tue, Sep 15, 2020 at 11:55 AM Eric W Vaandering <ewv@fnal.gov> wrote:
>>>
>>>    Here is what I wrote some time ago about doing tape recall in CRAB. This came up again today so I want to revive the proposal. Per our discussion today, before this is automated it is possible for users to do it manually by making a rule with “ask approval” enabled which will allow a site admin to decide to host the dataset.
>>>
>>>    I had a discussion with Stefano yesterday about this trying to replicate what happens in Dynamo with Rucio. What I suggested was:
>>>
>>>     1. We make an account for CRAB tape recall and give it something like 200 TB total quota across say 20 known good analysis sites.
>>>     2. User recall request rules are made with this account and something like a 90 day lifetime
>>>     3. I think there may have to be a separate request per block which is a slight difference.
>>>     4. Requests will fail if the total space at all sites is over that amount
>>>     5. Data that becomes popular after being recalled will be kept on disk by the popularity system (as now), otherwise it is allowed to expire.
>>>
>>>    This actually seems more straightforward than I thought it would be. What do you think?
>>>
>>>
>>>
>>>    Cheers,
>>>
>>>    Eric
>>>

belforte commented 4 years ago

@ericvaandering stupid question. A Rucio rule ID is ultimately a string, correct? I ask because we'll want to store it in the DB and check periodically, so we can release the user task when the data is on disk (as was done for Dynamo). But the DB column used for storing the Dynamo requestId takes an integer, so I suspect we can't mix.

cmsdmwmbot commented 4 years ago

It’s a hex-string. But I doubt that helps. e6a8a421c59f455c81369a153e9488cf is one being staged now.

belforte commented 4 years ago

could convert hex to base10 int, but I suspect it will be too large ! will think of something.. at worst, we have the dataset name in the DB and can query by that. Or maybe our friendly DBA can remove old data from the column and change the data type to string.
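
To put a number on the "too large" suspicion: a 32-character hex string encodes 128 bits, so as a decimal integer it can need 39 digits, one more than a NUMBER(38) column (the type mentioned later in this thread) can hold exactly. A quick check in Python, using the rule ID quoted above; the variable names are only illustrative:

```python
# Rule ID quoted earlier in this thread.
rule_id = "e6a8a421c59f455c81369a153e9488cf"

as_int = int(rule_id, 16)      # interpret the hex string as an integer
digits = len(str(as_int))      # number of decimal digits needed
max_number_38 = 10**38 - 1     # largest integer with 38 decimal digits

print(digits)                  # decimal digits needed for this rule ID
print(as_int > max_number_38)  # True means it does not fit in NUMBER(38)
```

So the conversion is possible in Python, but the result would not fit the existing integer column, which supports the idea of switching to a string type.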

belforte commented 3 years ago

@ericvaandering @nsmith- I am starting to work on automatic tape recall via Rucio, following the example in StageDatasetForUser.py (https://github.com/dmwm/CMSRucio/blob/master/DMOps/StageDatasetForUser.py). If you see rule requests from belforte or from crab_server (I will stick to ask_approval=True), ignore them. We will worry about policy once the code is working. I do not expect to be able to authenticate as user crab_tape_recall atm. If I find that I can't progress w/o actual requests being approved, I'll let you know.

ericvaandering commented 3 years ago

Thing is we don’t approve rules, site managers do. So maybe use a friendly site?

belforte commented 3 years ago

I hadn't noticed that a site is needed. No problem for testing, I hope I have a few friends, but when doing for real I do not think CRAB is in a good position to pick destination site(s).

ericvaandering commented 3 years ago

Well, when doing it for real there will be no approval. The current thing does a scatter across all good sites and I suggest keeping that.

I assumed you were using approvals to make sure no data was recalled. In fact, a rule with approval requires just one RSE as otherwise it makes no sense. One person could approve a rule which wrote to another person’s site.

belforte commented 3 years ago

thanks @ericvaandering indeed I figured that out when using your RSE_EXPRESSION and getting:

Error: Provided replication rule is considered invalid. Details: Ask approval is not allowed for rules with multiple RSEs

so I tried RSE_EXPRESSION = 'T3_IT_Trieste' and various variations taking inspirations from https://rucio.readthedocs.io/en/latest/rse_expressions.html but none worked. What shall I put there ?

Well... I can probably have fun writing rules targeted at T3_IT_Trieste, but what's the point? Yes, I need some way to check all steps w/o actually triggering tape retrieval. I need to see what a realistic rule looks like in order to figure out what to report to the user, how to track it in the automatic machinery, and what exceptions look like. We need to decide whether all requests are accepted or there should be limits (in Rucio? in CRAB?). Maybe use a Rucio test instance to post fake requests? First decision point: we used to pass a list of blocks to Dynamo. Do you still want that, or a full dataset name?

belforte commented 3 years ago

Maybe I need some special quota when testing with Trieste ? I have some disk quota there, yet:

Error: There are not enough target RSEs to fulfil the request at this time.
Details: Target RSE set not sufficient for number of copies. (1 copies requested, RSE set size 0)

ericvaandering commented 3 years ago

OK, lots of small issues.

The account which we are using to script this does not need quota and that should be the account you use in the future.

Approval will not be needed in the end. Maybe you want to test with it now.

RSE expression will be as in the code above, spreading the data across all Tier2s equitably.

You may need to include rse= in your expression:

-bash-4.2$ rucio list-rses --expression rse=T3_IT_Trieste
T3_IT_Trieste

belforte commented 3 years ago

thanks Eric, I sorted out the RSE expression thing, but I still fail to make the rule with T3_IT_Trieste, don't know why. As you said... many small things. In the end I do not have a clear way to test and learn by trying, and not a detailed enough specification to commit code to production and turn it on. I do not know how to progress...

belforte commented 3 years ago

OK, this kind of works. I had to change some things from your example, e.g. set weight=None. I use Alan's WMCore wrapping, sort of out of politeness, but since all it does is convert a list of names into a list of dids... basically it saves one line. I am quite tempted to stop using the WMCore wrapper everywhere in CRAB, but that's a different topic.

# createReplicationRule(self, names, rseExpression, scope='cms', copies=1, **kwargs):
rules = rucioClient.createReplicationRule(blocks, 'T3_IT_Trieste', scope='cms', copies=1,
                                          weight=None, lifetime=DAYS, account='belforte',
                                          activity='Analysis Input',
                                          comment='Staged from tape for %s' % username,
                                          ask_approval=True, asynchronous=True)

Which created 4 rules, one per block! I can see the rationale for 4 rules instead of one, so the question for @ericvaandering and @nsmith- is:

should CRAB create a new container with only the needed blocks and ask for that?
or should it ask for the original dataset?

The idea is that if someone wants a couple of RAW files for a detector study, we could avoid recalling a full, very large RAW dataset, and at the same time encourage people to ask only for what they need. In this way we could e.g. put a TB limit on each recall request, whereas if we only go by full dataset it gets difficult to say no. Which is why with Yutaro we converged on passing a list of blocks: https://github.com/dmwm/CRABServer/wiki/Automatic-stageout-of-tape-data
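
For comparison, here is a minimal sketch of the same request made with the native Rucio client method `add_replication_rule` instead of the WMCore wrapper. It assumes `client` is an already authenticated `rucio.client.Client` (setup not shown), that the blocks live in the `cms` scope, and that the rule lifetime is given in seconds. `blocks_to_dids`, `make_recall_rules` and `SECONDS_PER_DAY` are illustrative names, and the values mirror the test call above, not any agreed policy.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # rule lifetimes are assumed to be in seconds


def blocks_to_dids(blocks, scope="cms"):
    """Turn block names into the DID dictionaries the Rucio client expects."""
    return [{"scope": scope, "name": block} for block in blocks]


def make_recall_rules(client, blocks, username, rse_expression,
                      lifetime_seconds, account=None):
    """Request one disk copy of each block and return the list of rule IDs.

    `client` is assumed to be an authenticated rucio.client.Client.
    """
    return client.add_replication_rule(
        dids=blocks_to_dids(blocks),
        copies=1,
        rse_expression=rse_expression,
        weight=None,
        lifetime=lifetime_seconds,
        account=account,
        activity="Analysis Input",
        comment="Staged from tape for %s" % username,
        ask_approval=True,   # testing only: approval requires a single-RSE expression
        asynchronous=True,
    )
```

Because the DIDs are passed as a list of blocks, one rule per block comes back, which matches the four rules seen above.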

belforte commented 3 years ago

side note. I will store the rule ID in the CRAB DB so that we can automatically check completion and then release submissions. Given that it is a string like 1a31bb9828a34657a34d72258c6e5173, I will store a VARCHAR, not an INTEGER, but I still need to set a maximum length. The one above is 33 chars. What more should I be ready to accept? Can the rule ID be 400 chars?

nsmith- commented 3 years ago

I think it's OK to create a rule per block when the whole dataset is not desired. If the whole dataset is to be recalled, maybe an optimization is to make just one rule on the container DID. The rule ID is always a 32-char hex string, including the one you pasted:

>>> len("1a31bb9828a34657a34d72258c6e5173")
32
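
Given a fixed 32-character format, a cheap sanity check before writing to the DB could look like the sketch below. `is_rule_id` is an illustrative helper, not part of Rucio or CRAB, and the lowercase-hex assumption is based only on the IDs quoted in this thread.

```python
import re

RULE_ID_RE = re.compile(r"[0-9a-f]{32}")


def is_rule_id(value):
    """True if value is exactly 32 lowercase hex characters."""
    # fullmatch also rejects a trailing newline, which a bare `$` would tolerate
    return RULE_ID_RE.fullmatch(value) is not None


print(is_rule_id("1a31bb9828a34657a34d72258c6e5173"))    # True
print(is_rule_id("1a31bb9828a34657a34d72258c6e5173\n"))  # False
```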

belforte commented 3 years ago

thanks @nsmith- I will try to make a container with those blocks, so there's only one rule to track. thanks for fixing my counting a LF as part of the string !

echo 1a31bb9828a34657a34d72258c6e5173|wc
      1       1      33
vs.
echo -n "1a31bb9828a34657a34d72258c6e5173"|wc
      0       1      32

nsmith- commented 3 years ago

hm, I would prefer not to create new container DIDs but just make multiple rules. This is how WMCore does it, is that possible?

belforte commented 3 years ago

creating is possible. I have to find some way to track them in association to a given task. e.g. CRAB could print:

dear user, a data recall request was created for you, monitor it via
rucio list-rule 1a31bb9828a34657a34d72258c6e5173  (or whatever)
and submit again once data are on disk

but if I print a list of 40 hex strings.. few people will be happy !

Is one rule per block really an advantage ? Is that so that WMA can check as single blocks are available and start processing them ? That's too much to ask of CRAB.

belforte commented 3 years ago

I guess I can turn the existing DB column from NUMBER(38) to CLOB, rather than VARCHAR(32), and then store the list of rules. As Oracle says

 A CLOB (character large object) value can be up to 2,147,483,647 characters long.

But I am not going to try to manage them individually.

nsmith- commented 3 years ago

My concern is that making a DID (under the user scope? definitely not cms scope) for each CRAB job may get taxing, and the DID is thrown away afterwards. The name stays unique even after deletion, so you have to make up a new name each time.

belforte commented 3 years ago

it is one per task, not per job. Names are cheap, as FKW used to say, and maybe I make them in the crab_server account scope, so I do not need to create another Rucio client with user credentials and it gets easy to track how much we stage. What else could be a problem? Overlapping requests from multiple users? Conflicts with existing rules? Rule duplication? (I see Alan has code to deal with duplicated rules... dunno why.) What could go wrong? Single rule, multiple rules... OK, a few more lines of code, but I can do everything.

nsmith- commented 3 years ago

OK, under the crab account scope seems reasonable. Then we leave the container DID around? I think there is no harm in that, plus I think we need to check carefully what happens if we do delete the container DID (does it cascade? that would be very bad in this context!). What would the container DID name be? It could be nice to embed the task name. Then a user could even run rucio list-rules crab_server:/Input/task_name/USER or whatever, instead of keeping track of a hex rule ID.

belforte commented 3 years ago

we need to start exploring with containers anyhow... may even be fun. Are there naming constraints ? If we need names which are "DBS compatible" I have to check what CRAB and DBS allow now. And likely need to replace ':' in current task names. If this DID will never make it to DBS (and it shouldn't) your name is great.

nsmith- commented 3 years ago

Rucio constraint is https://github.com/ericvaandering/rucio/blob/cms_nano2/lib/rucio/common/schema/cms.py#L67 copied here: r'/[a-zA-Z0-9\-_]{1,99}/[a-zA-Z0-9\.\-_]{1,199}/[A-Z\-]{1,50}' I don't think this should ever make it to DBS.
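
A minimal sketch of building a container name that embeds the task name and checking it against the pattern quoted above. It assumes the whole name must match the pattern, replaces ':' with '.' (one possible choice), and uses a made-up prefix and task name for illustration; `recall_container_name` is a hypothetical helper.

```python
import re

# Pattern quoted above from the CMS Rucio schema.
CONTAINER_RE = re.compile(
    r"/[a-zA-Z0-9\-_]{1,99}/[a-zA-Z0-9\.\-_]{1,199}/[A-Z\-]{1,50}")


def recall_container_name(task_name, prefix, tier="USER"):
    """Build /<prefix>/<task name>/<tier>, replacing ':' which the pattern forbids."""
    name = "/%s/%s/%s" % (prefix, task_name.replace(":", "."), tier)
    if CONTAINER_RE.fullmatch(name) is None:
        raise ValueError("not a valid container name: %s" % name)
    return name


# Hypothetical task name, shown only to exercise the ':' replacement.
print(recall_container_name("201120_131722:belforte_crab_20201120_141717",
                            prefix="TapeRecall"))
```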

belforte commented 3 years ago

looks suspiciously similar to WMCore/Lexicon.py :-)

PRIMARY_DS = {'re': '^[a-zA-Z][a-zA-Z0-9\-_]*$', 'maxLength': 99}
PROCESSED_DS = {'re': '[a-zA-Z0-9\.\-_]+', 'maxLength': 199}
TIER = {'re': '[A-Z\-_]+', 'maxLength': 99}

other than allowing DIDs to start with entertaining strings like /_____-----__/ (an oversight?). And no colon ':' allowed (no idea if there's a technical reason, does it break SQL? or was it just a whim? I wasn't part of that). Well... two dots, one dot... anything goes. Thanks.

ericvaandering commented 3 years ago

Container names follow the lexicon for CMS datasets. No accident there. Colon is probably not possible since it’s used in rucio as a delimiter between scope and name.

And we will have to make a scope for CRAB. I’d suggest “crab”. If you encode the task name in the container, that’d be helpful, probably.

Deleting a container, should we ever need to, should have no impact.

belforte commented 3 years ago

Sounds like a plan. Onward!

belforte commented 3 years ago

can't test with scope 'crab' due to https://github.com/ericvaandering/rucio/blob/0e7df0d1f489302fe011dcef28120db83ff2b2ad/lib/rucio/common/schema/cms.py#L59

Details: Problem validating did : u'crab' does not match '^(cms)|(user\\.[a-z0-9-_]{1,20})$'

But I think that I can use scope='user.crab_server' which is the account used by CRAB TaskWorkers. So far am sticking to user.belforte and rseexpression=T3_IT_Trieste as playground.
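
The scope check can be reproduced locally with the pattern from the error message (printed there with an escaped backslash). This is only a sketch: applying the pattern with `re.search` is an assumption about how the schema validation works, and `scope_is_valid` is an illustrative helper.

```python
import re

# Pattern taken from the error message above.
SCOPE_PATTERN = r"^(cms)|(user\.[a-z0-9-_]{1,20})$"


def scope_is_valid(scope):
    """Check a scope name against the CMS scope pattern."""
    return re.search(SCOPE_PATTERN, scope) is not None


for scope in ("crab", "cms", "user.belforte", "user.crab_server"):
    print(scope, scope_is_valid(scope))
```

Under these assumptions 'crab' is rejected while 'user.crab_server' and 'user.belforte' pass, consistent with the error above.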

belforte commented 3 years ago

hmm Eric, Nick, what's the python equivalent of

rucio attach  user.belforte:/TapeRecall/201120_131722.belforte_crab_20201120_141717/USER cms:/MuonEG/Run2016B-v1/RAW#86bc5e3e-1519-11e6-a3f4-001e67ac06a0

? because that works (as per the twiki) and if I try to attach it again I obtain a sensible error

2020-12-07 15:54:34,195 ERROR   Data identifier already added to the destination content.
Details: [u'(cx_Oracle.IntegrityError) ORA-00001: unique constraint (CMS_RUCIO_PROD.CONTENTS_PK) violated']

But when I try from python (and I tried attach_dids, add_datasets_to_container, add_containers_to_container... all of them call attach_dids eventually [1]) I always get (both with the existing block/did or with a new one):

RucioException: An unknown exception occurred.
Details: [u'(cx_Oracle.IntegrityError) ORA-02290: check constraint (CMS_RUCIO_PROD.CONTENTS_CHILD_TYPE_NN) violated']

and could not find a way to replicate the CLI success. I can list container status fine from python, just to show that I have some idea of what I am doing.

[image: Screenshot from 2020-12-07 15-59-04]

[1] example [image: Screenshot from 2020-12-07 16-02-55]

Existing examples from Alan's WMCore seems to indicate that attach_dids should work.

ericvaandering commented 3 years ago

Your last example won’t work because you are not attaching a container to a container but a block (rucio dataset) to a container.

I would expect that this would work just fine with the same set of parameters:

https://rucio.readthedocs.io/en/latest/api/did.html#rucio.client.didclient.DIDClient.attach_dids

I’d always suggest naming the parameters on the call for clarity and to get away from ordering issues.

Also, in the 2nd example, scope is ‘cms’ not ‘cms:’

Eric
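
A minimal sketch of a Python equivalent of the `rucio attach` command, with every parameter named and the scope split off without its colon. It assumes `client` is an authenticated `rucio.client.Client`; `parse_did` and `attach_blocks` are illustrative helpers, not existing CRAB or WMCore functions.

```python
def parse_did(did):
    """Split 'scope:name' into the dict form used by the Rucio client.

    Only the first ':' separates scope from name, and the scope itself
    carries no trailing colon ('cms', not 'cms:').
    """
    scope, name = did.split(":", 1)
    return {"scope": scope, "name": name}


def attach_blocks(client, container, blocks):
    """Attach blocks (Rucio datasets) to a container via attach_dids."""
    target = parse_did(container)
    client.attach_dids(
        scope=target["scope"],
        name=target["name"],
        dids=[parse_did(block) for block in blocks],
    )
```

Called with the container and block from the CLI example, this passes scope 'user.belforte' for the container and scope 'cms' for the block.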

belforte commented 3 years ago

that was it Eric !!!! Thanks !!!! Well spotted !!!!

Also, in the 2nd example, scope is ‘cms’ not ‘cms:’

everything else you wrote does not apply, including that, AFAICT, the native Rucio API has names only for some of the parameters, maybe to indicate that those must be present, so requiring proper ordering... Alan followed the same practice in the WMCore wrappers. But I am not sure that, at least in the limited CRAB use cases, we gain anything by using the WMCore wrapping, other than obfuscation of the original API. A few utility functions to massage output or prepare input according to your needs are fine, but a wrapping layer that exposes some, but not all, of the inner functionality... hmmm...

belforte commented 3 years ago

P.S. I do not find methods to delete containers.

nsmith- commented 3 years ago

This is the relevant CLI command: https://rucio.readthedocs.io/en/latest/man/rucio.html?highlight=erase#erase Looking at how it is implemented, it is with a rather cryptic metadata change:

client.set_metadata(scope=scope, name=name, key='lifetime', value=86400)

basically setting the DID lifetime to 1 day.
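
Wrapped as a helper, that could look like the sketch below. `expire_did` is an illustrative name, `client` is assumed to be an authenticated `rucio.client.Client`, and whether expiring a container affects its attached blocks is still an open question in this thread.

```python
ONE_DAY = 86400  # seconds, the value used in the call above


def expire_did(client, scope, name, lifetime=ONE_DAY):
    """Schedule a DID for removal by giving it a short lifetime (in seconds)."""
    client.set_metadata(scope=scope, name=name, key="lifetime", value=lifetime)
```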

belforte commented 3 years ago

thanks Nick. Now... what would be a good policy here? Is the container DID only useful until the rule requesting a disk copy is created (i.e. the rule becomes a rule for the individual datasets, aka blocks, or files)? Or do we need to keep the container around until we are ready to let the disk replicas be removed?

belforte commented 3 years ago

indeed a container has a lifetime. But the concept is not elaborated upon in the documentation. A lifetime for a rule is sort of clear. Now, if a rule affects multiple containers with multiple lifetimes, the actual behavior is not defined.

nsmith- commented 3 years ago

I suppose setting the DID lifetime would have to imply that a lifetime is set on any rules applied to it? Did we confirm that deleting a custom container DID does not cascade delete? For CRAB tasks that plan to analyze the whole CMS dataset, there's no need to create a new container, right? Will you just use the existing DID?

belforte commented 3 years ago

Hi @nsmith- for the scope of this, I expect that we can ignore DID lifetimes and live with the defaults; if by "summer" we have reasons to want to remove old containers created by CRAB (let's see how many and how harmful), we can dwell on the details. So far I see this mostly as a learning topic. What exactly does it mean that a DID has a finite lifetime? I was not planning to have things like "if full dataset .. else..", but simply uniform code which starts from a list of blocks, which better fits the code that was written for Dynamo. But of course everything is possible; simply, the more I change, the higher the chances of introducing bugs. I am looking for a way out of CRAB maintenance rather than job security :-) Since at times a new container is needed, let's make sure we know how to deal with it.

belforte commented 3 years ago

the recall request submission was introduced with https://github.com/dmwm/CRABServer/pull/6322 and is now being tested in https://github.com/dmwm/CRABServer/releases/tag/v3.210108 but still with preliminary test values for the Rucio account, scope, and destination RSE. Tasks will be put in SUBMITFAILED and users will have to monitor the rule progress and submit again when it is OK.

Once that is finalized and we have rule IDs in CRAB TASKDB to track the progress of, I'll work on automatic task resubmission.
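
The progress check behind an automatic resubmission could be sketched as below. It assumes `client` is an authenticated `rucio.client.Client`, that `get_replication_rule` returns a dictionary with a 'state' key, and that state 'OK' means the requested replicas are in place; `rule_states` and `recall_is_complete` are illustrative helpers.

```python
def rule_states(client, rule_ids):
    """Map each rule ID to the state Rucio reports for it."""
    return {rule_id: client.get_replication_rule(rule_id)["state"]
            for rule_id in rule_ids}


def recall_is_complete(client, rule_ids):
    """True when there is at least one rule and all rules are in state 'OK'."""
    states = rule_states(client, rule_ids)
    return bool(states) and all(state == "OK" for state in states.values())
```

A periodic job could call `recall_is_complete` with the rule IDs stored for a task and release the task once it returns True.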

belforte commented 3 years ago

I think that all work on the TW side has been done, besides the automatic resubmission, and is now in https://github.com/dmwm/CRABServer/releases/tag/v3.210108p2 deployed on my VM and on the DEV instance of CRABServer. But while it works on stefanovm, it does not work in the server yet due to authentication issues in Rucio. Now tracked in https://github.com/dmwm/CRABServer/issues/6332

We can proceed to put CRABServer REST v3.210108p2 in PreProd and Production. That does not depend on all the Rucio things; it is simply to store the ruleId as a 32-char string instead of a number (as was the case for Dynamo).

belforte commented 3 years ago

The request submission part is done with tag v3.210110. For the automatic task release I will open a new issue, since this one has got too long.

dmwm / CRABServer

port to Rucio tapeRecall machinery #6210