SegmentLinking / TrackLooper

Apache License 2.0

Upgrade to CMSSW 13_3_0_pre3 and use libraries directly from CMSSW #348

Closed (ariostas closed 1 year ago)

ariostas commented 1 year ago

This PR is a follow-up to #336. I upgraded CMSSW to 13_3_0_pre3 to match the latest version used in the CMSSW fork.

I also changed the build so that all libraries are pulled from the CMSSW version currently in use. This prevents library mismatches that could cause trouble. In particular, I deleted our copy of alpaka_interface, which was used for the caching allocator, since this version of CMSSW includes a recent version of it.

The standalone version works fine, but before we merge we'll need to make sure that it still works with CMSSW.

GNiendorf commented 1 year ago

This is something I asked about a while ago, and the response was that we didn't want to go down this route because it ties our standalone version to CMSSW. Have people changed their view on this?

ariostas commented 1 year ago

I added instructions for running LST when CVMFS is not available. The setup scripts already rely heavily on CVMFS, so removing the alpaka_interface headers doesn't really make it any harder to run without CVMFS. I think we will almost never need to do this, but it's still reasonably straightforward.

Also, I did more testing and can confirm that issue #312 no longer occurs. I've included the list of commands that I used to test it, and the full output on lnx7188.

Commands used for testing on lnx7188

git clone --branch upgrade_cmssw https://github.com/SegmentLinking/TrackLooper.git
cd TrackLooper
source setup.sh
make code/rooutil/
sdl_make_tracklooper -c
scramv1 project CMSSW $CMSSW_VERSION
cd $CMSSW_VERSION/src
eval `scramv1 runtime -sh`
git cms-init
git remote add SegLink https://github.com/SegmentLinking/cmssw.git
git fetch SegLink CMSSW_13_3_0_pre3_LST_X
git cms-addpkg RecoTracker Configuration
git checkout CMSSW_13_3_0_pre3_LST_X
cat <<EOF >lst_cpu.xml
<tool name="lst_cpu" version="1.0">
  <client>
    <environment name="LSTBASE" default="$PWD/../../../TrackLooper"/>
    <environment name="LIBDIR" default="\$LSTBASE/SDL"/>
    <environment name="INCLUDE" default="\$LSTBASE"/>
  </client>
  <runtime name="LST_BASE" value="\$LSTBASE"/>
  <lib name="sdl_cpu"/>
</tool>
EOF
cat <<EOF >lst_cuda.xml
<tool name="lst_cuda" version="1.0">
  <client>
    <environment name="LSTBASE" default="$PWD/../../../TrackLooper"/>
    <environment name="LIBDIR" default="\$LSTBASE/SDL"/>
    <environment name="INCLUDE" default="\$LSTBASE"/>
  </client>
  <runtime name="LST_BASE" value="\$LSTBASE"/>
  <lib name="sdl_cuda"/>
</tool>
EOF
scram setup lst_cpu.xml
scram setup lst_cuda.xml
eval `scramv1 runtime -sh`
git cms-checkdeps -a -A
scram b -j 32
cmsDriver.py step3  -s RAW2DIGI,RECO:reconstruction_trackingOnly,VALIDATION:@trackingOnlyValidation,DQM:@trackingOnlyDQM --conditions auto:phase2_realistic_T21 --datatier GEN-SIM-RECO,DQMIO -n 10 --eventcontent RECOSIM,DQM --geometry Extended2026D88 --era Phase2C17I13M9 --procModifiers gpu,trackingLST,trackingIters01 --no_exec
sed -i "29i process.load('Configuration.StandardSequences.Accelerators_cff')\nprocess.load('HeterogeneousCore.AlpakaCore.ProcessAcceleratorAlpaka_cfi')" step3_RAW2DIGI_RECO_VALIDATION_DQM.py
sed -i "s|fileNames = cms.untracked.vstring('file:step3_DIGI2RAW.root')|fileNames = cms.untracked.vstring('file:/data2/segmentlinking/step2_21034.1_100Events.root')|" step3_RAW2DIGI_RECO_VALIDATION_DQM.py
cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py
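The two tool files written in the commands above differ only in the backend name (cpu or cuda). As a minimal sketch, and not part of the PR itself, they could be generated by one helper. The function name `write_lst_tool` and the `LST_DIR` variable are hypothetical; the default path and the XML content are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): writes lst_<backend>.xml
# into the current directory, mirroring the two heredocs used above.
# LST_DIR is an assumed, optional override for the TrackLooper checkout;
# it defaults to the relative path used in the original commands.
write_lst_tool() {
  local backend="$1"
  local lst_dir="${LST_DIR:-$PWD/../../../TrackLooper}"
  # \$LSTBASE is escaped so that it stays literal in the XML and is
  # resolved later by scram rather than by the shell.
  cat <<EOF >"lst_${backend}.xml"
<tool name="lst_${backend}" version="1.0">
  <client>
    <environment name="LSTBASE" default="${lst_dir}"/>
    <environment name="LIBDIR" default="\$LSTBASE/SDL"/>
    <environment name="INCLUDE" default="\$LSTBASE"/>
  </client>
  <runtime name="LST_BASE" value="\$LSTBASE"/>
  <lib name="sdl_${backend}"/>
</tool>
EOF
}

# Generate both tool files; the file names match the ones used above.
for backend in cpu cuda; do
  write_lst_tool "$backend"
done
```

The generated files would then be registered exactly as in the original sequence, with `scram setup lst_cpu.xml` and `scram setup lst_cuda.xml`.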

lnx7188 output

Singularity> cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py
%MSG-i CUDAService:  (NoModuleName) 09-Nov-2023 14:35:16 EST pre-events
CUDA runtime version 11.8, driver version 12.2, NVIDIA driver version 535.86.10
CUDA device 0: Tesla V100-SXM2-32GB (sm_70)
CUDA device 1: Tesla V100-SXM2-32GB (sm_70)
%MSG
%MSG-i AlpakaService:  (NoModuleName) 09-Nov-2023 14:35:16 EST pre-events
AlpakaServiceSerialSync succesfully initialised.
Found 1 device:
  - Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
%MSG
%MSG-i AlpakaService:  (NoModuleName) 09-Nov-2023 14:35:16 EST pre-events
AlpakaServiceCudaAsync succesfully initialised.
Found 2 devices:
  - Tesla V100-SXM2-32GB
  - Tesla V100-SXM2-32GB
%MSG
09-Nov-2023 14:36:45 EST  Initiating request to open file file:/data2/segmentlinking/step2_21034.1_100Events.root
09-Nov-2023 14:36:54 EST  Successfully opened file file:/data2/segmentlinking/step2_21034.1_100Events.root
%MSG-w UnusedProductsForCanDeleteEarly:  AfterModDestruction  09-Nov-2023 14:37:02 EST pre-events
The following products in the 'canDeleteEarly' list are not used in this job and will be ignored.
 If possible, remove the producer from the job.
 IntermediateHitDoublets_tripletElectronHitDoublets__RECO
 RegionsSeedingHitSets_tripletElectronHitTriplets__RECO
 TrajectorysToOnerecoTracksAssociation_muonSeededTracksInOut__RECO
 TrajectorysToOnerecoTracksAssociation_muonSeededTracksOutIn__RECO
 TrajectorysToOnerecoTracksAssociation_preDuplicateMergingGeneralTracks__RECO
 Trajectorys_muonSeededTracksInOut__RECO
 Trajectorys_muonSeededTracksOutIn__RECO
 Trajectorys_preDuplicateMergingGeneralTracks__RECO
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 09-Nov-2023 14:39:23.177 EST
#--------------------------------------------------------------------------
#                         FastJet release 3.4.1
#                 M. Cacciari, G.P. Salam and G. Soyez                  
#     A software package for jet finding and analysis at colliders      
#                           http://fastjet.fr                           
#                                                                         
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].   
#                                                                       
# FastJet is provided without warranty under the GNU GPL v2 or higher.  
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:40:31 EST Run: 1 Event: 1
Seed collection size (8692) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:40:31 EST Run: 1 Event: 1
Found too many clusters (179032), bailing out.

%MSG
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 09-Nov-2023 14:40:36.675 EST
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:40:43 EST Run: 1 Event: 2
Seed collection size (6760) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:40:43 EST Run: 1 Event: 2
Found too many clusters (157706), bailing out.

%MSG
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 09-Nov-2023 14:40:47.670 EST
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:40:54 EST Run: 1 Event: 3
Found too many clusters (169109), bailing out.

%MSG
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:40:55 EST Run: 1 Event: 3
Seed collection size (7767) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 0 at 09-Nov-2023 14:41:00.265 EST
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:41:07 EST Run: 1 Event: 4
Seed collection size (8887) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:41:08 EST Run: 1 Event: 4
Found too many clusters (180093), bailing out.

%MSG
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 0 at 09-Nov-2023 14:41:12.684 EST
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:41:22 EST Run: 1 Event: 5
Seed collection size (11681) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:41:25 EST Run: 1 Event: 5
Found too many clusters (205679), bailing out.

%MSG
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 on stream 0 at 09-Nov-2023 14:41:28.667 EST
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:41:38 EST Run: 1 Event: 6
Seed collection size (12934) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:41:38 EST Run: 1 Event: 6
Found too many clusters (211208), bailing out.

%MSG
Begin processing the 7th record. Run 1, Event 7, LumiSection 1 on stream 0 at 09-Nov-2023 14:41:44.255 EST
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:41:52 EST Run: 1 Event: 7
Found too many clusters (188974), bailing out.

%MSG
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:41:52 EST Run: 1 Event: 7
Seed collection size (10382) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 0 at 09-Nov-2023 14:41:57.749 EST
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:42:05 EST Run: 1 Event: 8
Found too many clusters (180919), bailing out.

%MSG
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:42:05 EST Run: 1 Event: 8
Seed collection size (8971) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
Begin processing the 9th record. Run 1, Event 9, LumiSection 1 on stream 0 at 09-Nov-2023 14:42:10.064 EST
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:42:17 EST Run: 1 Event: 9
Seed collection size (8261) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:42:17 EST Run: 1 Event: 9
Found too many clusters (170110), bailing out.

%MSG
Begin processing the 10th record. Run 1, Event 10, LumiSection 1 on stream 0 at 09-Nov-2023 14:42:21.754 EST
%MSG-w TrackingMonitor:  TrackingMonitor:TrackSeedMonhighPtTripletStep  09-Nov-2023 14:42:30 EST Run: 1 Event: 10
Seed collection size (10323) differs from seed stop info collection size (0). This is a sign of inconsistency in the configuration. Not filling associated histograms.
%MSG
%MSG-e TooManyClusters:   PhotonConversionTrajectorySeedProducerFromSingleLeg:photonConvTrajSeedFromSingleLeg  09-Nov-2023 14:42:30 EST Run: 1 Event: 10
Found too many clusters (192559), bailing out.

%MSG
09-Nov-2023 14:42:36 EST  Closed file file:/data2/segmentlinking/step2_21034.1_100Events.root
Singularity>
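For a quick check that a run like the one above processed every event, the log can be summarized by counting message headers. This is only a sketch under assumptions: the function name `summarize_cmsrun_log` is hypothetical, and it presumes the output was captured to a file (for example by piping `cmsRun` through `tee`), which the commands above do not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts processed events and MessageLogger
# error/warning headers in a captured cmsRun log file.
summarize_cmsrun_log() {
  local log="$1"
  local events errors warnings
  # One "Begin processing the Nth record" line is printed per event.
  events=$(grep -c '^Begin processing the ' "$log")
  # Error and warning messages start with %MSG-e and %MSG-w respectively.
  errors=$(grep -c '^%MSG-e ' "$log")
  warnings=$(grep -c '^%MSG-w ' "$log")
  echo "events=${events} errors=${errors} warnings=${warnings}"
}
```

Called as `summarize_cmsrun_log cmsrun.log` (the file name is an assumption), it prints a single line such as `events=10 errors=10 warnings=11`, which is what the counts in the output listed above come to.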