dmwm / PHEDEX

CMS data-placement suite

Submit changes button often failing #1097

Open vlimant opened 6 years ago

vlimant commented 6 years ago

When trying to approve anything over multiple sites, the "submit" often (like 99% of the time) returns an error. I think it is time to fix this.

nataliaratnikova commented 6 years ago

Could you provide information:

If this is a load-related error, we need to find out where the bottleneck is.

vlimant commented 6 years ago

you can try to create a transfer of any dataset to 15 sites, and disapprove all of them at once. This will fail, I guarantee it

vlimant commented 6 years ago

I just tried to approve all pending for https://cmsweb.cern.ch/phedex/prod/Request::View?request=1097871

and got

""" Apologies, looks like we have an internal server error, details of which below. If the problem persists, please submit a bug report.

Error time=2017-09-11 10:23:56 UTC id=4525e0d67b4c2004d6ce584fddf59b78 """

several times

nataliaratnikova commented 6 years ago

Hi, thanks for the timestamp, it was helpful. I don't have approval rights and would not be able to try myself.

Anyway, I tracked this down to this error in the server log:

2017-09-11 10:23:56 UTC: error: id=4525e0d67b4c2004d6ce584fddf59b78 Error evaluating client identity at /data/srv/beHG1707d/sw/slc7_amd64_gcc630/cms/PHEDEX-datasvc/2.3.24/perl_lib/PHEDEX/Web/API/UpdateRequest.pm line 116.
 at /data/srv/state/phedex/htdocs/WebSite/access25 line 4356.

Investigating further, I found this to be a sort of safety feature introduced 5 years ago: https://github.com/dmwm/PHEDEX/commit/80fee7f98154ead19c4c55a2edabe8151d04a208 which prevents (at the authentication level) the approval of more than 10 node requests at a time.

In principle this makes sense, because once approved, it is not possible to undo the deletions!

Since the request in question is fully approved now, I assume you eventually succeeded after trying a few times. If this is the case (please confirm), I see several options:

A) Leave this feature as a precaution, but document it and produce a meaningful error.
B) Increase the limit to whatever seems practical to you.
C) Disable the feature.
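To make option A concrete, here is a minimal sketch of a limit check that reports a descriptive error instead of a generic internal server error. It is not the PhEDEX code (which is Perl); the names `MAX_NODES_PER_UPDATE`, `TooManyNodesError`, and `check_node_limit` are hypothetical, and the limit of 10 is simply the figure mentioned above.

```python
# Hypothetical illustration only: none of these names exist in PhEDEX.
MAX_NODES_PER_UPDATE = 10  # assumed limit, taken from the figure quoted above


class TooManyNodesError(ValueError):
    """Raised when one update touches more nodes than the limit allows."""


def check_node_limit(nodes, limit=MAX_NODES_PER_UPDATE):
    """Return the sorted, de-duplicated node list, or raise a descriptive error."""
    unique = sorted(set(nodes))
    if len(unique) > limit:
        # Tell the user what was refused, why, and what to do next.
        raise TooManyNodesError(
            f"Refusing to update {len(unique)} nodes in one request: "
            f"the limit is {limit}. Split the update into smaller batches."
        )
    return unique
```

With a check like this, a caller could catch `TooManyNodesError` and show its message to the user, so the limit is visible rather than surfacing as an unexplained failure.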

vlimant commented 6 years ago

It would be great to understand why this was introduced. Experimentally, the limit is not at 10, but 2-3ish. This is very impractical.