Open DAMason opened 10 years ago
Hi Dave,
I'll take a look.
Cheers, Tony.
Thanks -- will leave the requests alone for now -- though would be nice to clean them up at some point when you no longer need them for debugging...
FWIW we have another request like this:
Request #410835
The error I got this last time trying to approve:
Apologies, looks like we have an internal server error, details of which below. If the problem persists, please submit a bug report.
Error time=2014-03-25 03:57:07 UTC id=ed1af18271b345447c087fc949602b6b
This and the other two referenced here are kinda left hanging -- what should be done with them?
Thanks!
--Dave
Hi Dave,
sorry for the delay on this, I've had no time at all to look into it. I hope to get to it by the end of this week.
Cheers, Tony.
OK -- seems we have another one -- in fact there are now about 4 of these guys stacked up at FNAL. I just tried to approve the latest one again to give you a recent timestamp:
""" Apologies, looks like we have an internal server error, details of which below. If the problem persists, please submit a bug report.
Error time=2014-04-12 14:41:07 UTC id=ed1af18271b345447c087fc949602b6b
This is from request 412473 """
Apparently what's going on is that ops see the agent has no record of a subscription being made for some datasets, so they go and make the custodial subscription manually themselves. Currently the (FNAL) subscription requests I have in this state are the following:
407424 407431 410835 412473
Would be nice to at least know what can be done with them -- easiest is to just disapprove, but am leaving them around so that you might know what's going wonky here :)
Thanks!
Hi Dave,
so, these are all indeed duplicate requests:
cannot request replica transfer: /MuMinus_Pt-1to150_PositiveEndcap-gun/Fall13-POSTLS162_V1-v4/GEN-SIM already subscribed to T1_US_FNAL_MSS as move
cannot request replica transfer: /WprimeToENu_M_3800_Tune4C_13TeV_pythia8/Fall13-POSTLS162_V1-v1/GEN-SIM already subscribed to T1_US_FNAL_MSS as move
cannot request replica transfer: /QCD_Pt-120to170_MuEnrichedPt5_Tune4C_13TeV_pythia8/Fall13dr-tsg_PU20bx25_POSTLS162_V2-v1/AODSIM already subscribed to T1_US_FNAL_MSS as move
/TZJetsTo3LNuB_FCNC_zeta_zut_8TeV_madgraph/Summer12_DR53X-PU_S10_START53_V19-v1/AODSIM already subscribed to T1_US_FNAL_MSS with different custodiality
you should go ahead and disapprove them.
From my side, I need to examine the UpdateRequests API which is giving this error message. The API traps all errors and reports this generic error instead of the details, because it doesn't fully trust that the errors won't leak sensitive information. I can filter the useful error messages and just pass them on to the user.
So I've updated the title of this issue and will leave it open until it's fixed, hopefully in the first release after Easter.
Cheers, Tony.
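As a rough illustration of the filtering idea described above (pass known-harmless messages through, keep everything else generic), here is a minimal sketch in Python. It is not the PhEDEx implementation: the function name `user_facing_error`, the `SAFE_MESSAGE` pattern, and the shortened generic text are all assumptions made for this example, and the pattern is only modelled on the "already subscribed" messages quoted in this thread.

```python
import re

# Hypothetical allowlist, modelled on the messages quoted in this thread.
# Anything that does not match is treated as potentially sensitive.
SAFE_MESSAGE = re.compile(
    r"(?:cannot request replica transfer: )?"
    r"\S+ already subscribed to \S+ "
    r"(?:as \w+|with different custodiality)"
)

# Shortened stand-in for the generic error text (assumption).
GENERIC = "Apologies, looks like we have an internal server error."


def user_facing_error(raw: str, error_id: str) -> str:
    """Return only allowlisted lines of `raw`, else a generic message."""
    lines = [line.strip() for line in raw.splitlines()]
    safe = [line for line in lines if SAFE_MESSAGE.fullmatch(line)]
    if safe:
        # Pass the useful, known-harmless details on to the user.
        return "\n".join(safe)
    # Unknown error: report only the id so it can be looked up in the logs.
    return f"{GENERIC} Error id={error_id}"
```

With something like this, a duplicate-subscription failure would show the requestor the "already subscribed" line, while an unexpected database or stack-trace error would still produce only the generic text and the error id.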
Hi Tony,
Thanks -- yes, passing a more instructive error message on to the requestor would be the best thing here.
Thanks!
—Dave
Greetings,
When I try to approve transfer requests 407431 and 407424 I get an error like:
""" Apologies, looks like we have an internal server error, details of which below. If the problem persists, please submit a bug report.
Error time=2013-12-14 17:26:36 UTC id=306eb01962ae825b712c8ab74db0a4fe
"""
Other requests that have come before and after these were fine. These seem to have been manually created by Julian -- in the comments I see:
This subscription need to be manually created due to failures in WMAgent. They belong to the following workflows:
pdmvserv_EXO-Fall13-00106_00026_v0__131206_200618_2283
pdmvserv_EXO-Fall13-00120_00026_v0__131206_200622_2530
pdmvserv_EXO-Fall13-00130_00026_v0__131206_200822_6140
Julian later reported that after he made these requests the agent recovered and made the subscriptions itself. These then became duplicates.
Thanks,
--Dave