CMSCompOps / WmAgentScripts

CMS Workflow Team Scripts

email "phedex is acting up" #510

Closed vlimant closed 3 years ago

vlimant commented 4 years ago

For the record: this documents a mystery PhEDEx failure that has impacted unified in the past, and the protection against it that @nsmith- and @dr-stringfellow are wondering about.

subscribor tries to get all unassigned blocks

https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/subscribor.py#L56

and makes a subscription to DataOps (without an actual transfer), just so that each block belongs to DataOps and is counted against the DataOps quota. This is an effort to "fix" https://github.com/dmwm/WMCore/issues/5945 (in the wrong place, obviously).

getDatasetBlockAndSite has, on rare occasions, failed very badly (https://github.com/dmwm/PHEDEX/issues/1117), returning results completely irrelevant to the initial query to PhEDEx (@nataliaratnikova).

This check https://github.com/CMSCompOps/WmAgentScripts/commit/0522c2e17c6613e23df6079a576993e4ef1e288a and the email https://github.com/CMSCompOps/WmAgentScripts/commit/eb4687a408e33bc60935eb62c8f4f1aa591441b7 were put in place so that things don't go astray in unified and downstream.
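A minimal sketch of what such a protection could look like, for illustration only; it is not the code from the commits linked above. It assumes the response has already been reduced to a mapping from block name to a list of sites, and that block names have the form `<dataset>#<id>`. The function names `filter_blocks_for_dataset` and `response_is_sane` are hypothetical.

```python
def filter_blocks_for_dataset(dataset, sites_by_block):
    """Split a block -> sites mapping into relevant and irrelevant parts.

    Assumption: a block name looks like '<dataset>#<id>', so a block whose
    prefix differs from the queried dataset cannot belong to the answer.
    """
    relevant, irrelevant = {}, {}
    for block, sites in sites_by_block.items():
        prefix = block.split('#', 1)[0]
        target = relevant if prefix == dataset else irrelevant
        target[block] = sites
    return relevant, irrelevant


def response_is_sane(dataset, sites_by_block):
    """Return True only if every returned block belongs to the queried dataset."""
    _, irrelevant = filter_blocks_for_dataset(dataset, sites_by_block)
    return not irrelevant
```

A caller could skip the subscription step and send the notification email whenever `response_is_sane` returns False, rather than acting on blocks it never asked about.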

a band-aid on a band-aid, on a band-aid ...

dr-stringfellow commented 4 years ago

@amaltaro @vlimant @nsmith- @z4027163 For the past week or so, we have been seeing this email dozens of times a day. There is no development work going on in PhEDEx anymore, but something is now clearly causing this to happen very frequently.

vlimant commented 4 years ago

I have not seen any such emails recently myself. Anything bad that is happening is happening on the PhEDEx side, cf. https://github.com/dmwm/PHEDEX/issues/1117

nsmith- commented 4 years ago

Can you be more specific about the date on which the rate of such mails increased? Maybe we can correlate it with more load on datasvc, e.g. via rucio sync or otherwise.

dr-stringfellow commented 4 years ago

Sorry, couldn't find a better way to get you this information other than:

https://www.dropbox.com/sh/c90bs1830rm5vlg/AABzw7p2cNBc7Z050IIymSkXa?dl=0

@vlimant Yes, the bug is on the PhEDEx side, and it won't be fixed :(, but maybe other services need to know about this right now. If they see something strange, it might be because of a totally nonsensical PhEDEx response.

haozturk commented 3 years ago

PhEDEx is not used anymore, closing this issue.