cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

FileReadError turned into generic CMSSW exception #18554

ericvaandering closed this issue 7 years ago

ericvaandering commented 7 years ago

Here is an exception where an xrootd error seems to have been turned into a generic 8001 error in WMAgent. @bbockelm & @Dr15Jones:


An exception of category 'FileReadError' occurred while
   [0] Processing run: 1 lumi: 631 event: 16926368
   [1] Running path 'digitisation_step'
   [2] Prefetching for module MixingModule/'mix'
   [3] Rethrowing an exception that happened on a different thread.
   [4] Reading branch PCaloHits_g4SimHits_EcalHitsEE_SIM.
   [5] Calling XrdFile::readv()
   [6] XrdAdaptor::ClientRequest::HandleResponse() failure while running connection recovery
   [7] In XrdAdaptor::RequestManager::requestFailure()
Exception Message:
XrdAdaptor::RequestManager::requestFailure Open(name='root://xrootd-cms.infn.it//store/mc/PhaseISpring17GS/QCD_Pt-50to80_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/GEN-SIM/BTV05_90X_upgrade2017_realistic_v20-v1/100000/82C72888-3F2D-E711-BF13-A0369FC5D904.root', flags=0x10, permissions=0660, old source=131.169.192.123:33192 (unknown site), new source=131.169.192.123:33192 (unknown site)) => Xrootd server returned an excluded source
   Additional Info:
      [a] Original error: '[ERROR] Invalid session' (errno=0, code=109, source=131.169.192.123:33192 (unknown site)).
      [b] Disabled source: 131.169.192.123:33192
      [c] Active source: 131.169.192.123:33192 (unknown site)
", u'exitCode': 8001}, {u'type': u'Fatal Exception', u'details': u"
An exception of category 'FileReadError' occurred while
   [0] Processing run: 1 lumi: 631 event: 16926205
   [1] Running path 'RAWSIMoutput_step'
   [2] Prefetching for module PoolOutputModule/'RAWSIMoutput'
   [3] Prefetching for module MixingModule/'mix'
   [4] Rethrowing an exception that happened on a different thread.
   [5] Reading branch PCaloHits_g4SimHits_EcalHitsEE_SIM.
   [6] Calling XrdFile::readv()
   [7] XrdAdaptor::ClientRequest::HandleResponse() failure while running connection recovery
   [8] In XrdAdaptor::RequestManager::requestFailure()
Exception Message:
XrdAdaptor::RequestManager::requestFailure Open(name='root://xrootd-cms.infn.it//store/mc/PhaseISpring17GS/QCD_Pt-50to80_MuEnrichedPt5_TuneCUETP8M1_13TeV_pythia8/GEN-SIM/BTV05_90X_upgrade2017_realistic_v20-v1/100000/82C72888-3F2D-E711-BF13-A0369FC5D904.root', flags=0x10, permissions=0660, old source=131.169.192.123:33192 (unknown site), new source=131.169.192.123:33192 (unknown site)) => Xrootd server returned an excluded source
   Additional Info:
      [a] Original error: '[ERROR] Invalid session' (errno=0, code=109, source=131.169.192.123:33192 (unknown site)).
      [b] Disabled source: 131.169.192.123:33192
      [c] Active source: 131.169.192.123:33192 (unknown site
)", u'exitCode': 8001}, {u'type': u'CMSExeption', u'details': u'Exit 8001: CMSExeption Exception from cmsRun
 Adding last 25 lines of CMSSW stderr:
WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.
WARNING: There already exists /tmp/31077673.grid-ce.physik.rwth-aachen.de/glide_pXEleg/execute/dir_27528/job/WMTaskSpace/cmsRun1/CMSSW_9_0_2 area for SCRAM_ARCH slc6_amd64_gcc530.

 Adding last 25 lines of CMSSW stdout:
 1916 XrdAdaptorInternal   pre-events       pre-events       pre-events
 1917 XrdFileWarning       pre-events                        
 1918 GsfMultiStateUpdator 1/16915277       1/16915277       1/16924035
 1919 InvalidState         1/16915277       1/16915277       1/16924035
 1920 L1T                  1/16926368                        
 1921 PosteriorWeightsCalculator 1/16915277       1/16915277       1/16924035
 1922 TimeReport           PostEndRun                        
 1923 TooManyClusters      1/16909279       1/16909401       1/16926076
 1924 TooManyClusters      1/16909279       1/16909402       1/16926076
 1925 Fatal Exception      1/16926205                        
 1926 Fatal Exception      1/16926368                        
 1927 MemoryReport         PostEndRun                        
 1928 fileAction           1/16908901       1/16908901       1/16924090
 1929 fileAction           pre-events       pre-events       pre-events
 1930 fileAction           PostEndRun       PostEndRun       PostEndRun
 1931 fileAction           PostEndRun                        
 1932 fileAction           pre-events       pre-events       pre-events

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning             18738               18738
Error                 245                 245
System                 67                  67
Complete
process id is 30937 status is 65'
cmsbuild commented 7 years ago

A new Issue was created by @ericvaandering Eric Vaandering.

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

bbockelm commented 7 years ago

It's possible this is the culprit:

   [3] Rethrowing an exception that happened on a different thread.

When an exception is rethrown, do we lose some sort of type information?

Dr15Jones commented 7 years ago

#18557 should fix this problem.

Dr15Jones commented 7 years ago

assign core

cmsbuild commented 7 years ago

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

Dr15Jones commented 7 years ago

+1

cmsbuild commented 7 years ago

This issue is fully signed and ready to be closed.