scarletnorberg closed this 6 years ago
Patched with #8058 and restarted the component. I think this is the same issue. Let me know if it still crashes.
Thanks for updating the TWiki page too. If you agree, I'm in favor of patching agents as they hit this issue.
Alan, yes, I agree. We will patch each agent when the problem hits it. We still need a better patch, though.
Also fixed by https://github.com/dmwm/WMCore/pull/8247
https://its.cern.ch/jira/projects/CMSCOMPPR/issues/CMSCOMPPR-1218?filter=addedrecently
It went down twice today, very close together.
Here is the log:

<@---------- WMException End ----------@>
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/WorkerThreads/BaseWorkerThread.py", line 179, in __call__
    self.algorithm(parameters)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMComponent/JobAccountant/JobAccountantPoller.py", line 88, in algorithm
    raise JobAccountantPollerException(msg)

2017-08-28 03:45:45,632:140274881804032:INFO:Harness:>>>Terminating worker threads
2017-08-28 03:45:45,654:140274881804032:ERROR:BaseWorkerThread:Error in event loop (2): <WMComponent.JobAccountant.JobAccountantPoller.JobAccountantPoller instance at 0x7f944a815320>
<@========== WMException Start ==========@>
Exception Class: JobAccountantPollerException
Message: Hit general exception in JobAccountantPoller while using worker. 'utf8' codec can't decode byte 0xd0 in position 56427: invalid continuation byte
    ModuleName : WMComponent.JobAccountant.JobAccountantPoller
    MethodName : algorithm
    ClassInstance : None
    FileName : /data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMComponent/JobAccountant/JobAccountantPoller.py
    ClassName : None
    LineNumber : 88
    ErrorNr : 0

Traceback:
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMComponent/JobAccountant/JobAccountantPoller.py", line 68, in algorithm
    self.accountantWorker(jobsSlice)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMComponent/JobAccountant/AccountantWorker.py", line 292, in __call__
    self.stateChanger.propagate(self.listOfJobsToFail, "jobfailed", "complete")
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/JobStateMachine/ChangeState.py", line 181, in propagate
    self.recordInCouch(jobs, newstate, oldstate, updatesummary)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/JobStateMachine/ChangeState.py", line 452, in recordInCouch
    self.fwjrdatabase.commit(callback = discardConflictingDocument)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/Database/CMSCouch.py", line 281, in commit
    retval = self.post(uri, data)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/Services/Requests.py", line 121, in post
    encode, decode, contentType)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/Database/CMSCouch.py", line 120, in makeRequest
    encode, decode, contentType)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/Services/Requests.py", line 149, in makeRequest
    encoder, decoder, contentType)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/Services/Requests.py", line 229, in makeRequest_httplib
    encoded_data = self.encode(data)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/Services/Requests.py", line 563, in encode
    return encoder.encode(thunked)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/external/python/2.7.13/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/external/python/2.7.13/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
<@---------- WMException End ----------@>
Backtrace:
  File "/data/srv/wmagent/v1.1.4.patch2/sw/slc6_amd64_gcc493/cms/wmagent/1.1.4.patch2/lib/python2.7/site-packages/WMCore/WorkerThreads/BaseWorkerThread.py", line 205, in __call__
    raise ex
2017-08-28 03:45:45,654:140274881804032:INFO:BaseWorkerThread:Worker thread <WMComponent.JobAccountant.JobAccountantPoller.JobAccountantPoller instance at 0x7f944a815320> terminated
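The traceback bottoms out in Python 2.7's `json` encoder, which decodes byte strings as UTF-8 before serializing them. Byte 0xd0 starts a two-byte UTF-8 sequence, so if the next byte is not a continuation byte (0x80 to 0xBF) the decode fails with "invalid continuation byte". Below is a minimal Python 3 sketch, assuming the failing FWJR document carried such raw bytes somewhere (which field is not known from the log). It reproduces the decode error and shows one possible sanitizing step. `to_json_safe` is a hypothetical helper, not a WMCore API, and not necessarily what #8247 does.

```python
import json


def reproduce_decode_error():
    # 0xd0 is a UTF-8 lead byte for a two-byte sequence; it must be
    # followed by a continuation byte (0x80-0xBF). Here it is followed
    # by ASCII 'A', which triggers "invalid continuation byte".
    bad = b"log tail \xd0A"
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


def to_json_safe(value):
    """Hypothetical helper: recursively decode bytes with errors='replace'
    so the JSON encoder never sees undecodable data."""
    if isinstance(value, bytes):
        # Invalid sequences become U+FFFD instead of raising.
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        return {to_json_safe(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    return value


if __name__ == "__main__":
    print(reproduce_decode_error())
    doc = {"log": b"tail \xd0A", "n": [1, b"x"]}
    print(json.dumps(to_json_safe(doc)))
```

Replacing bad bytes keeps the document committable to CouchDB at the cost of losing the original byte values; `errors="backslashreplace"` is an alternative if those bytes need to stay visible for debugging.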