cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

[ROOT626] Test `testSSTGainPCL_fromCalibTree` from module `CalibTracker/SiStripChannelGain` not able to reach root file #40097

Closed aandvalenzuela closed 1 year ago

aandvalenzuela commented 1 year ago

Hello,

When testing the latest changes for ROOT626, we found that the unit test testSSTGainPCL_fromCalibTree from module CalibTracker/SiStripChannelGain fails because it cannot open the ROOT file root://cms-xrd-global.cern.ch//store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/calibrationtree/GR18/calibTree_325310.root (socket timeout), although the error it reports is about a missing tree, gainCalibrationTreeStdBunch:

%MSG-w XrdAdaptorInternal:  SiStripGainsCalibTreeWorker:SiStripCalib  16-Nov-2022 21:12:16 CET Run: 325310 Event: 1
Failed to open file at URL root://cms-xrd-global.cern.ch:1094//store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/calibrationtree/GR18/calibTree_325310.root?tried=+1098cms-xrd-global011098cms-xrd-global02.cern.ch,+eymir.grid.metu.dpmfedredir_cms@eymir.grid.metu.edu.tr,xrootd-cms.infn.it,eymir.grid.metu.edu.tr&xrdcl.requuid=3c46ff5b-d88e-47e1-b934-6d123cb357d4.
%MSG
[2022-11-16 21:15:11.473560 +0100][Error  ][XRootD            ] [cms-xrd-global.cern.ch:1094] Unable to get the response to request kXR_open (file: /store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/calibrationtree/GR18/calibTree_325310.root?tried=+1213xrootd-cms-redir-int.cr.cnaf.infn.it,+1098cms-xrd-global011098cms-xrd-global02.cern.ch,+eymir.grid.metu.dpmfedredir_cms@eymir.grid.metu.edu.tr,,xrootd-cms.infn.it,eymir.grid.metu.edu.tr,cms-xrd-transit.cern.ch,cms-xrd-global.cern.ch,xrootd-cms-redir-int.cr.cnaf.infn.it&triedrc=srverr,enoent,enoent, mode: 0460, flags: kXR_open_read kXR_async kXR_retstat )
%MSG-w XrdAdaptorInternal:  SiStripGainsCalibTreeWorker:SiStripCalib  16-Nov-2022 21:15:11 CET Run: 325310 Event: 1
Failed to open file at URL root://cms-xrd-global.cern.ch:1094//store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/calibrationtree/GR18/calibTree_325310.root?tried=&xrdcl.requuid=068b9b98-4f08-4946-a919-b8a67834cb83.
%MSG
----- Begin Fatal Exception 16-Nov-2022 21:15:11 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Processing  Event run: 325310 lumi: 1 event: 1 stream: 0
   [1] Running path 'pathALCARECOPromptCalibProdSiStripGains'
   [2] Calling method for module SiStripGainsCalibTreeWorker/'SiStripCalib'
   Additional Info:
      [a] Fatal Root Error: @SUB=TChain::LoadTree
Cannot find tree with name gainCalibrationTreeStdBunch/tree in file root://cms-xrd-global.cern.ch//store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/calibrationtree/GR18/calibTree_325310.root

----- End Fatal Exception -------------------------------------------------

See stack trace and test results.

This test was running fine the last time we tested ROOT626 (branch: cms/v6-26-00-patches/d87570b62e), and between that commit and the one currently under test (branch: cms/v6-26-00-patches/54b182755c) there have been 9 new commits (see comparison). I cannot see any changes in TChain, so this looks more like an error on the server side. In the IBs this test does not fail explicitly, but the file is not reachable there either:

===== Test "testSSTGainPCL_fromCalibTree" ====
argument 0: testSSTGainPCL_fromCalibTree
argument 1: /bin/bash
argument 2: CalibTracker/SiStripChannelGain/test
argument 3: testSSTGain_PCL_FromCalibTree.sh
shell is: /bin/bash
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_6_ROOT626_X_2022-11-16-2300
topdir is: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_6_ROOT626_X_2022-11-16-2300
testdir is: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_6_ROOT626_X_2022-11-16-2300/src/CalibTracker/SiStripChannelGain/test
tmpdir is: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_6_ROOT626_X_2022-11-16-2300/tmp/el8_amd64_gcc10
testbin is: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_6_ROOT626_X_2022-11-16-2300/test/el8_amd64_gcc10
Running script: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_6_ROOT626_X_2022-11-16-2300/src/CalibTracker/SiStripChannelGain/test/testSSTGain_PCL_FromCalibTree.sh
[ERROR] Server responded with an error: [3011] No servers have the file

xrdfs command status = 54
SKIPPING test, file calibTree_325310.root not found: status 0
status = 0

---> test testSSTGainPCL_fromCalibTree succeeded
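
The log above suggests why the IB reports success even though the input file is missing: the xrdfs probe fails (status 54), the script skips the test, and the exit status is 0. A minimal sketch of that pattern follows. It is written in Python for illustration only (the real test is a bash script and may differ), and `run_with_skip` and its arguments are hypothetical names.

```python
from typing import Callable


def run_with_skip(probe_status: int, run_test: Callable[[], int]) -> int:
    """Run the test only if the input-file probe succeeded.

    probe_status: exit status of the file probe (0 means the file was found).
    run_test: callable that runs the actual test and returns its exit status.
    """
    if probe_status != 0:
        # A missing input is treated as "skip", not "fail", so the
        # overall status is 0 and the harness reports success.
        print(f"SKIPPING test, input file not found (probe status {probe_status})")
        return 0
    # Input is available: the test's own exit status decides the outcome.
    return run_test()
```

With this pattern a missing input file never turns the test red, so the problem only shows up where the file is opened without such a guard.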

See latest ROOT626 IB.

Many thanks, Andrea.

cmsbuild commented 1 year ago

A new Issue was created by @aandvalenzuela Andrea Valenzuela.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

mmusich commented 1 year ago

@mdelcourt, did you recently delete the input file of the test?

makortel commented 1 year ago

In the default IBs, starting with CMSSW_12_6_X_2022-11-16-1100, the test fails explicitly: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_6_X_2022-11-16-1100/unitTestLogs/CalibTracker/SiStripChannelGain#/ . This also seems to be the first IB where the test started to fail.

makortel commented 1 year ago

assign alca

FYI @cms-sw/trk-dpg-l2

cmsbuild commented 1 year ago

New categories assigned: alca

@yuanchao,@francescobrivio,@malbouis,@saumyaphor4252,@tvami,@ChrisMisan you have been requested to review this Pull request/Issue and eventually sign? Thanks

sroychow commented 1 year ago

@aandvalenzuela I have restored the file /store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/calibrationtree/GR18/calibTree_325310.root. So the tests should pass in the next IB.

mmusich commented 1 year ago

@aandvalenzuela is there a way to put the file on the IB EOS space so that it does not get deleted inadvertently again?

smuzaffar commented 1 year ago

The file should already be in the IB EOS cache. You need to update the test so that it does not access the file via root://cms-xrd-global.cern.ch/ . Just use the LFN and the test should work.

sroychow commented 1 year ago

@smuzaffar the unit test seems to fail even in the PR. Should I access the file using root://eoscms.cern.ch// instead?

makortel commented 1 year ago

I think the problem is that SiStripGainsCalibTreeWorker passes the file names as they are to TChain::Add() https://github.com/cms-sw/cmssw/blob/97b79e261edf4240415b7e73371ca81fc18c697b/CalibTracker/SiStripChannelGain/plugins/SiStripGainsCalibTreeWorker.cc#L289-L292 without using the Catalog that (e.g.) PoolSource uses for the LFN->PFN mapping. I suppose the options are either to use the file explicitly from the IB EOS cache (with a PFN), or to make the code use the Catalog.
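
To illustrate the LFN->PFN distinction discussed here, a minimal Python sketch follows. It is not the CMSSW Catalog and does not read any site configuration. `lfn_to_pfn` is a hypothetical helper, and the prefix is an assumption that the caller must supply. It only shows that a bare LFN needs a protocol and host prefix before something like TChain::Add() can open it.

```python
def lfn_to_pfn(lfn: str, prefix: str) -> str:
    """Turn a logical file name (/store/...) into a physical file name.

    prefix is an assumed protocol+host prefix; a real catalog would look
    it up from the site configuration instead of taking it as an argument.
    """
    if "://" in lfn:
        # Already carries a protocol, so treat it as a PFN and pass it through.
        return lfn
    if not lfn.startswith("/store/"):
        raise ValueError(f"not an LFN: {lfn!r}")
    # Drop trailing slashes from the prefix, then join with a single '/',
    # which yields the usual 'host//store/...' form.
    return prefix.rstrip("/") + "/" + lfn


lfn = ("/store/group/dpg_tracker_strip/comm_tracker/Strip/Calibration/"
       "calibrationtree/GR18/calibTree_325310.root")
print(lfn_to_pfn(lfn, "root://cms-xrd-global.cern.ch/"))
```

With the global redirector as prefix this reproduces the URL from the original report. Pointing the test at the IB EOS cache would mean passing a different prefix, whose exact value is not given in this thread.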

tvami commented 1 year ago

+alca

cmsbuild commented 1 year ago

This issue is fully signed and ready to be closed.

mmusich commented 1 year ago

please close