Open belforte opened 2 years ago
the |tee /dev/stderr
was added in https://github.com/ddaina/CRABClient/commit/80688c5b2a433047b57bc9fda19a3ab38639dafc in order to solve
https://github.com/dmwm/CRABClient/issues/5067
It was a tricky point. And I am afraid we need to revisit. @mapellidario do you feel like helping ?
rats !! this happens also on tasks on production server using production client like https://cmsweb.cern.ch/crabserver/ui/task/220317_212302%3Amstojano_crab_K892_pT30 where I get after crab remake :
belforte@lxplus785/bug> crab --version
CRAB client v3.220210
belforte@lxplus785/bug> crab status crab_K892_pT30/
Fatal error trying to connect to https://cmsweb.cern.ch:8443/crabserver/prod/task?subresource=search&workflow=220317_212302%3Amstojano_crab_K892_pT30 using subresource=search&workflow=220317_212302%3Amstojano_crab_K892_pT30.
The server answered with an error.
belforte@lxplus785/bug>
increasing priority :-(
I am of course pretty sure that in the past we had "sane outputs from crab status" also when DBS was returning HTTP 502. So is this maybe really due to change in TW ? Surely changing TW is a much faster fix than deploying new cllient.
hmm... the cmsbot example above was from crab-dev-tw01 where message is truncated to 7500 char. Also the truncate message is in any case added to the message, would not override the reporting of HTTP 502 at the beginning which is the problem here. I am puzzled.
if I remove the |tee /dev/stderr
I get:
belforte@lxplus785/bot> crab status
CRAB project directory: /afs/cern.ch/work/b/belforte/CRAB3/TC3/dbg/bug/crab_K892_pT30
Task name: 220317_212302:mstojano_crab_K892_pT30
Grid scheduler - Task Worker: N/A yet - crab-prod-tw01
Status on the CRAB server: SUBMITFAILED
Task URL to use for HELP: https://cmsweb.cern.ch/crabserver/ui/task/220317_212302%3Amstojano_crab_K892_pT30
Dashboard monitoring URL: https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=mstojano&var-task=220317_212302%3Amstojano_crab_K892_pT30&from=1647548582000&to=now
Failure message from server: Problem handling 220317_212302:mstojano_crab_K892_pT30 because of DBSReaderError
Message: Can't contact DBS at https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader, got errors Instantiating DBS3Reader failed with HTTP Error 502:
URL=https://cmsweb-prod.cern.ch:8443/dbs/prod/global/DBSReader/serverinfo
Code=502
Message=Proxy Error
Header=HTTP/1.1 502 Proxy Error
Date: Thu, 17 Mar 2022 21:23:35 GMT
Server: Apache
Content-Length: 497
Content-Type: text/html; charset=iso-8859-1
Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/auth/complete/dbs/prod/global/DBSReader/serverinfo">GET /auth/complete/dbs/prod/global/DBSReader/serverinfo</a></em>.<p>
Reason: <strong>DNS lookup failure for: cmsweb-k8s-prodsrv.cern.ch</strong></p></p>
</body></html>
ClassName : None
ModuleName : WMCore.Services.DBS.DBSErrors
MethodName : __init__
ClassInstance : None
FileName : /data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/WMCore/Services/DBS/DBSErrors.py
LineNumber : 38
ErrorNr : 1002
failure, traceback follows
Traceback (most recent call last):
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/py3-dbs3-client/4.0.7/lib/python3.8/site-packages/dbs/apis/dbsClient.py", line 445, in __callServer
self.http_response = method_func(self.url, method, params, data, request_headers)
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/py3-dbs3-pycurl/3.17.7-comp/lib/python3.8/site-packages/RestClient/RestApi.py", line 36, in get
return http_request(self._curl)
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/py3-dbs3-pycurl/3.17.7-comp/lib/python3.8/site-packages/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 502: Proxy Error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/WMCore/Services/DBS/DBSReader.py", line 19, in DBSReader
dbs.dbs.serverinfo()
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/py3-dbs3-client/4.0.7/lib/python3.8/site-packages/dbs/apis/dbsClient.py", line 1838, in serverinfo
return self.__callServer("serverinfo")
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/py3-dbs3-client/4.0.7/lib/python3.8/site-packages/dbs/apis/dbsClient.py", line 457, in __callServer
self.__parseForException(data)
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/py3-dbs3-client/4.0.7/lib/python3.8/site-packages/dbs/apis/dbsClient.py", line 483, in __parseForException
raise http_error
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 502:
URL=https://cmsweb-prod.cern.ch:8443/dbs/prod/global/DBSReader/serverinfo
Code=502
Message=Proxy Error
Header=HTTP/1.1 502 Proxy Error
Date: Thu, 17 Mar 2022 21:23:35 GMT
Server: Apache
Content-Length: 497
Content-Type: text/html; charset=iso-8859-1
Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/auth/complete/dbs/prod/global/DBSReader/serverinfo">GET /auth/complete/dbs/prod/global/DBSReader/serverinfo</a></em>.<p>
Reason: <strong>DNS lookup failure for: cmsweb-k8s-prodsrv.cern.ch</strong></p></p>
</body></html>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/TaskWorker/Actions/Handler.py", line 80, in executeAction
output = work.execute(nextinput, task=self._task, tempDir=self.tempDir)
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/TaskWorker/Actions/DBSDataDiscovery.py", line 243, in execute
result = self.executeInternal(*args, **kwargs)
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/TaskWorker/Actions/DBSDataDiscovery.py", line 261, in executeInternal
self.dbs = DBSReader(dbsurl)
File "/data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/WMCore/Services/DBS/DBSReader.py", line 23, in DBSReader
raise DBSReaderError("Can't contact DBS at %s, got errors %s" % (endpoint, msg))
WMCore.Services.DBS.DBSErrors.DBSReaderError: DBSReaderError
Message: Can't contact DBS at https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader, got errors Instantiating DBS3Reader failed with HTTP Error 502:
URL=https://cmsweb-prod.cern.ch:8443/dbs/prod/global/DBSReader/serverinfo
Code=502
Message=Proxy Error
Header=HTTP/1.1 502 Proxy Error
Date: Thu, 17 Mar 2022 21:23:35 GMT
Server: Apache
Content-Length: 497
Content-Type: text/html; charset=iso-8859-1
Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/auth/complete/dbs/prod/global/DBSReader/serverinfo">GET /auth/complete/dbs/prod/global/DBSReader/serverinfo</a></em>.<p>
Reason: <strong>DNS lookup failure for: cmsweb-k8s-prodsrv.cern.ch</strong></p></p>
</body></html>
ClassName : None
ModuleName : WMCore.Services.DBS.DBSErrors
MethodName : __init__
ClassInstance : None
FileName : /data/srv/TaskManager/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/slc7_amd64_gcc630/cms/crabtaskworker/v3.220314patch1-3391c4ccadfae468fd63c78475580c2d/lib/python3.8/site-packages/WMCore/Services/DBS/DBSErrors.py
LineNumber : 38
ErrorNr : 1002
Log file is /afs/cern.ch/work/b/belforte/CRAB3/TC3/dbg/bug/crab_K892_pT30/crab.log
belforte@lxplus785/bot>
It really looks like TW used to upload a much more terse message :-( But at least in this way it prints the much needed: Status on the CRAB server: SUBMITFAILED
FWIW, the message in https://github.com/dmwm/CRABClient/issues/5159#issuecomment-1071740843 is "only" 7003 char long (from the DB, i.e. measured at https://github.com/dmwm/CRABClient/blob/37caaf2fbfdec47cecc7c9a5820f14e3252ca11e/src/python/CRABClient/Commands/status.py#L416 )
a quick test indicates that truncating message at 1K char is plenty. FYI @mapellidario I suggest to cut at 1K also in the famous PR for b64 https://github.com/dmwm/CRABServer/pull/7106
hmm... maybe WMCore changed and now reports the full http details instead of only one line like Message: Can't contact DBS at https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader, got errors Instantiating DBS3Reader failed with HTTP Error 502 ?
Dario and myself debugged this. The change is not due to WMCore, but to new DBSClient which prints a new exception message which (besides being too long https://github.com/dmwm/DBSClient/issues/68 ) contains the text from the http header Header=HTTP/1.1 402
or whatever exit code. This is eventually parsed by crab client and triggers the failure.
We have deployed a patched TaskWorker which does not upload such messages anymore. It replaces HTTP/1.1
with HTTP 1.1
which is safe for crab status.
So this is not critical anymore. It may still be good to avoid parsing curl stdout for HTTP code, but we need to test well any change to CRABRestInterface, since all in all Daina may have had good reasons to use this
https://github.com/dmwm/CRABClient/blob/37caaf2fbfdec47cecc7c9a5820f14e3252ca11e/src/python/CRABClient/CrabRestInterface.py#L200
I am inclined to do the following:
@mapellidario I welcome your opinion
in CRABRestInerface the output of curl is piped to stderr, this makes the code wrongly assume that the call to serve failed if the output contains a string reporting an HTTP error, like when task submission failed due to an HTTP error in TaskWorker. In following example task submission failed with the "typical" proxy error 502
tm_task_failure | Problem handling 220317_170502:cmsbot_crab_outLFNDirBase because of DBSReaderError Message: Can't contact DBS at https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader, got errors Instantiating DBS3Reader failed with HTTP Error 502: URL=https://cmsweb-prod.cern.ch:8443/dbs/prod/global/DBSReader/serverinfo Code=502 Message=Proxy Error Header=HTTP/1.1 502 Proxy Error Date: Thu, 17 Mar 2022 17:14:10 GMT Server: Apache Content-Length: 497 Content-Type: text/html; charset=iso-8859-1 Body= Proxy Error The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /auth/complete/dbs/prod/global/DBSReader/serverinfo.
but crab status just prints:
problematic code is (I believe) https://github.com/dmwm/CRABClient/blob/37caaf2fbfdec47cecc7c9a5820f14e3252ca11e/src/python/CRABClient/CrabRestInterface.py#L200-L208 combined with https://github.com/dmwm/CRABClient/blob/37caaf2fbfdec47cecc7c9a5820f14e3252ca11e/src/python/CRABClient/CrabRestInterface.py#L58-L79
I am surprised that this did not appear earlier. Maybe it is another side effect of new message handling in REST (the b64 saga) ? the message is 11.8K characters, maybe earlier it was truncated and this made it formatted in such a way not to tool
parseResponseHeader
?In any case we should not mix HTTP Header and Body in this way, need to understand why we have
|tee /dev/stderr
in https://github.com/dmwm/CRABClient/blob/37caaf2fbfdec47cecc7c9a5820f14e3252ca11e/src/python/CRABClient/CrabRestInterface.py#L200