Closed smuzaffar closed 6 years ago
FYI @vkuznet
Shahzad, this is an upstream error and not a dasgoclient-specific error. I therefore suggest introducing different error codes for different use cases, e.g. 1 for a dasgoclient-specific error, 2 for a DBS upstream error, 3 for PhEDEx, etc. I also suggest adding an -errorCodes flag which will print out all error codes.
Regarding your question about the cause and frequency of this type of error: I have said for many years that the instabilities of our cmsweb data-services are increasing. This particular error comes from DBS and is then wrapped by our frontend. The obvious cause is increased load on the DBS server(s). I have already shown that the throughput of our Python-based services is so-so, so I'm not surprised. Of course it would be nice to notify the DBS maintainer (Yuyi) about it and provide timestamps/queries that she can look up and correlate with the DBS/frontend logs.
OK, here is a first implementation (commit: https://github.com/dmwm/dasgoclient/commit/ff21d7343b3ea64101f680af49246c5162d093d8)
$ ./dasgoclient -errorCodes
DAS error codes:
1 DAS error
2 DBS upstream error
3 PhEDEx upstream error
4 ReqMgr upstream error
5 RunRegistry upstream error
6 McM upstream error
7 Dashboard upstream error
8 SiteDB upstream error
9 CondDB upstream error
10 Combined error
11 MongoDB error
12 DAS proxy error
13 DAS query error
$ ./dasgoclient -query="bla=1"
ERRO[0000] DAS QL error Query="bla = 1" idx=0 msg="Wrong DAS key: bla"
Unable to parse DAS query, no select keys are found <DASQuery="" inst= hash= time=0001-01-01 00:00:00>
# here the last error code is 13, which is DASQueryError
$ echo $?
13
$ ./dasgoclient -query="run=160915"
160915
# even though DAS returns a valid result here (from DBS), it fails to query the RunRegistry data-service (I didn't set up a proper ssh tunnel to its URL), therefore it returns error code 5 (RunRegistryError)
$ echo $?
5
Let me know your feedback.
I changed -errorCodes to the more appropriate -exitCodes.
thanks
During CMSSW IB validation tests we noticed that a dasgoclient query failed but dasgoclient did not exit with a non-zero code. For example, we get an error like [a], but the dasgoclient process exited with a zero exit code.
These types of failures are now fairly frequent (they happened twice in the last 10 days). First, do we know the reason why the upstream server is failing, and second, can we make dasgoclient fail in such cases?
[a]