Improve error handling for ErrorHandler :-)

Since central couch moved to VMs, we have been seeing ErrorHandler crash quite often, practically everywhere. We should improve its code so that it retries later instead of crashing when it has problems communicating with central couch (probably the ACDC database).

Just in case, this is the component traceback:

2014-11-14 12:35:42,918:INFO:ErrorHandlerPoller:Starting to build ACDC with 30 jobs
2014-11-14 12:35:42,918:INFO:ErrorHandlerPoller:This operation will take some time...
2014-11-14 12:36:54,728:ERROR:ErrorHandlerPoller:Caught exception in ErrorHandler
Traceback (most recent call last):
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMComponent/ErrorHandler/ErrorHandlerPoller.py", line 377, in algorithm
    self.handleErrors()
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMComponent/ErrorHandler/ErrorHandlerPoller.py", line 311, in handleErrors
    self.handleRetryDoneJobs(jobList)
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMComponent/ErrorHandler/ErrorHandlerPoller.py", line 269, in handleRetryDoneJobs
    self.exhaustJobs(jobList)
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMComponent/ErrorHandler/ErrorHandlerPoller.py", line 134, in exhaustJobs
    self.handleACDC(jobList)
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMComponent/ErrorHandler/ErrorHandlerPoller.py", line 194, in handleACDC
    self.dataCollection.failedJobs(loadList)
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMCore/Database/CouchUtils.py", line 52, in wrapper
    return funcRef(x, *args, **opts)
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMCore/ACDC/DataCollectionService.py", line 74, in failedJobs
    job.get("owner", "cmsdataops"))
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMCore/ACDC/CouchService.py", line 71, in newOwner
    userInstance = makeUser(group, user, self.url, self.database)
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMCore/GroupUser/User.py", line 93, in makeUser
    group.connect()
  File "/data/srv/wmagent/v1.0.0.patch2/sw.pre.amaltaro/slc5_amd64_gcc461/cms/wmagent/1.0.0.patch2/lib/python2.6/site-packages/WMCore/GroupUser/CouchObject.py", line 87, in connect
    raise CouchConnectionError(msg)
CouchConnectionError
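
To illustrate the "retry later instead of crash" idea, here is a minimal sketch, not the actual ErrorHandlerPoller code. It assumes the poller's `algorithm()` is called again on the next polling cycle, so catching the connection error and returning is enough to retry later. `PollerSketch`, `handle_errors` and `skipped_cycles` are hypothetical names, and the `CouchConnectionError` class below is only a local stand-in for the exception shown in the traceback.

```python
import logging


class CouchConnectionError(Exception):
    """Local stand-in for the exception raised in the traceback above."""


class PollerSketch(object):
    """Hypothetical poller: one call to algorithm() is one polling cycle."""

    def __init__(self, handle_errors):
        # handle_errors is a placeholder for the work done in handleErrors()
        self.handle_errors = handle_errors
        self.skipped_cycles = 0

    def algorithm(self):
        """Run one cycle; return False if it was skipped because couch was unreachable."""
        try:
            self.handle_errors()
        except CouchConnectionError as ex:
            # Do not let the exception propagate and kill the component.
            # Log it and leave the jobs to be picked up in the next cycle.
            self.skipped_cycles += 1
            logging.error("Cannot reach couch, will retry in the next cycle: %s", ex)
            return False
        return True
```

In the real component, any open database transaction would presumably also need to be rolled back before returning, so that the same jobs are selected again in the next cycle.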

dmwm/WMCore #5470