vlimant closed this issue 3 years ago.
@amaltaro @vlimant @nsmith- @z4027163 For about a week now, we have been receiving this email dozens of times a day. No development work is going on in PhEDEx anymore, but something is clearly causing this to happen very frequently now.
I have not seen any such emails recently myself. Whatever is going wrong is happening on the PhEDEx side; cf. https://github.com/dmwm/PHEDEX/issues/1117
Can you be more specific about the date when the rate of these emails increased? Maybe we can correlate it with increased load on the datasvc, e.g. from the Rucio sync or something else.
Sorry, I couldn't find a better way to share this information than this:
https://www.dropbox.com/sh/c90bs1830rm5vlg/AABzw7p2cNBc7Z050IIymSkXa?dl=0
@vlimant Yes, the bug is on the PhEDEx side and it won't be fixed :(, but other services may need to know about it now: if they see something strange, it may be caused by the nonsensical PhEDEx responses.
PhEDEx is not used anymore; closing this issue.
For the record: this is a mysterious PhEDEx failure that impacted Unified in the past, and Unified has a protection against it that @nsmith- and @dr-stringfellow are wondering about.
subscribor
tries to get all unassigned blocks (https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/subscribor.py#L56)
and makes a subscription to DataOps (without an actual transfer) just so that each block belongs to, and is counted against, the DataOps quota. This is an effort to "fix" https://github.com/dmwm/WMCore/issues/5945 (in the wrong place, obviously).
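To make the subscribor idea concrete, here is a minimal sketch of the selection step only. It is not the code in subscribor.py: the input shape (a dict mapping block name to owning group, with `None` or `""` meaning unassigned), the function names, and the `"transfer": False` field are all hypothetical, chosen for illustration rather than taken from PhEDEx.

```python
from typing import Dict, List, Optional


def find_unassigned_blocks(block_groups: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of blocks that no group owns.

    Hypothetical input: {block_name: group_or_None}; None or "" means unassigned.
    """
    return sorted(name for name, group in block_groups.items() if not group)


def build_dataops_requests(blocks: List[str], group: str = "DataOps") -> List[dict]:
    """Build one hypothetical bookkeeping-only subscription request per block.

    "transfer": False is an illustrative flag meaning "assign ownership/quota,
    move no data"; it is not a real PhEDEx parameter.
    """
    return [{"block": b, "group": group, "transfer": False} for b in blocks]


if __name__ == "__main__":
    replicas = {
        "/A/B/RAW#1": None,       # unassigned
        "/A/B/RAW#2": "DataOps",  # already owned
        "/A/B/RAW#3": "",         # unassigned
    }
    for request in build_dataops_requests(find_unassigned_blocks(replicas)):
        print(request)
```

Only blocks #1 and #3 yield requests, so after such a pass every block would be attributed to some group's quota.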
getDatasetBlockAndSite
has, on rare occasions, failed very badly (https://github.com/dmwm/PHEDEX/issues/1117), returning results completely unrelated to the original PhEDEx query (@nataliaratnikova). This check https://github.com/CMSCompOps/WmAgentScripts/commit/0522c2e17c6613e23df6079a576993e4ef1e288a and the email alert https://github.com/CMSCompOps/WmAgentScripts/commit/eb4687a408e33bc60935eb62c8f4f1aa591441b7 were put in place so that things don't go astray in Unified and downstream.
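The kind of protection described for getDatasetBlockAndSite can be sketched as a sanity check on the response: every returned block should belong to the dataset that was queried. This is a minimal sketch, not the code in the linked commits. It assumes a parsed datasvc-style response of the form `{"phedex": {"block": [{"name": ...}, ...]}}` and relies on CMS block names having the form `<dataset>#<suffix>`. The function names and the `alert` callback (a stand-in for the email) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def split_blocks_by_dataset(dataset: str, response: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split block names into those belonging to `dataset` and strays.

    Assumes a parsed response shaped like {"phedex": {"block": [{"name": ...}]}}.
    The trailing '#' keeps '/A/B/RAW' from matching blocks of '/A/B/RAWX'.
    """
    prefix = dataset + "#"
    good: List[str] = []
    stray: List[str] = []
    for block in response.get("phedex", {}).get("block", []):
        name = block.get("name", "")
        if name.startswith(prefix):
            good.append(name)
        else:
            stray.append(name)
    return good, stray


def checked_blocks(dataset: str, response: Dict[str, Any],
                   alert: Callable[[str], None]) -> Optional[List[str]]:
    """Return the dataset's blocks, or None (after alerting) if any stray block appears.

    Rejecting the whole answer, rather than filtering out strays, keeps a
    nonsensical response from propagating downstream.
    """
    good, stray = split_blocks_by_dataset(dataset, response)
    if stray:
        alert(f"PhEDEx returned {len(stray)} block(s) not in {dataset}: {stray[:5]}")
        return None
    return good


if __name__ == "__main__":
    bad = {"phedex": {"block": [{"name": "/A/B/RAW#1"}, {"name": "/X/Y/AOD#9"}]}}
    messages: List[str] = []
    print(checked_blocks("/A/B/RAW", bad, messages.append))  # None
    print(messages)
```

In practice `alert` would send the email; a list's `append` is used here only to keep the example self-contained.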
a band-aid on a band-aid, on a band-aid ...