cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

CMSSW_13_2_2 fails to build with PGO and LTO #42721

Open fwyzard opened 1 year ago

fwyzard commented 1 year ago

CMSSW_13_2_2 fails to build with LTO (which is enabled by default) during the second PGO pass:

>> Building shared library tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/libRecoTrackerTrackProducer.so
/data/cmssw/el8_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/c++ -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++17 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -fuse-ld=bfd -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DBOOST_DISABLE_ASSERTS -flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fprofile-use -fprofile-partial-training -fprofile-update=atomic -fprofile-correction -fprofile-prefix-path=/data/user/fwyzard/CMSSW_13_2_2_PGO -fprofile-dir=/data/user/fwyzard/CMSSW_13_2_2_PGO/pgo/cmssw -shared -Wl,-E -Wl,-z,defs tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/DAFTrackProducerAlgorithm.cc.o tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/GsfTrackProducerBase.cc.o tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/KfTrackProducerBase.cc.o tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/TrackProducerAlgorithm.cc.o 
tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/TrajectoryToResiduals.cc.o -o tmp/el8_amd64_gcc11/src/RecoTracker/TrackProducer/src/RecoTrackerTrackProducer/libRecoTrackerTrackProducer.so -Wl,-E -Wl,--hash-style=gnu -L/data/user/fwyzard/CMSSW_13_2_2_PGO/biglib/el8_amd64_gcc11 -L/data/user/fwyzard/CMSSW_13_2_2_PGO/lib/el8_amd64_gcc11 -L/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_2/biglib/el8_amd64_gcc11 -L/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_2/lib/el8_amd64_gcc11 -L/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_2/external/el8_amd64_gcc11/lib -L/data/cmssw/el8_amd64_gcc11/external/cuda/11.8.0-9f0af0f4206be7b705fe550319c49a11/lib64/stubs -L/data/user/fwyzard/CMSSW_13_2_2_PGO/static/el8_amd64_gcc11 -L/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_2/static/el8_amd64_gcc11 -lRecoTrackerSiTrackerMRHTools -lRecoTrackerMeasurementDet -lTrackingToolsKalmanUpdators -lTrackingToolsTrackFitters -lRecoTrackerTransientTrackingRecHit -lTrackingToolsGsfTools -lTrackingToolsMeasurementDet -lTrackingToolsRecoGeometry -lRecoLocalTrackerSiStripRecHitConverter -lRecoTrackerTkDetLayers -lTrackingToolsPatternTools -lRecoMTDDetLayers -lRecoMuonDetLayers -lTrackingToolsTransientTrackingRecHit -lRecoLocalTrackerPhase2TrackerRecHits -lTrackingToolsDetLayers -lRecoLocalTrackerClusterParameterEstimator -lTrackingToolsGeomPropagators -lDataFormatsGsfTrackReco -lTrackingToolsTrajectoryState -lDataFormatsTrackReco -lRecoTrackerRecord -lDataFormatsTrackCandidate -lDataFormatsTrackerRecHit2D -lTrackingToolsRecords -lCondFormatsSiPhase2TrackerObjects -lDataFormatsFTLRecHit -lDataFormatsTrajectorySeed -lGeometryMTDGeometryBuilder -lRecoLocalTrackerRecords -lCalibFormatsSiStripObjects -lCalibTrackerRecords -lDataFormatsTrackingRecHit -lGeometryCSCGeometry -lGeometryDTGeometry -lGeometryGEMGeometry -lGeometryMTDNumberingBuilder -lGeometryRPCGeometry -lGeometryTrackerGeometryBuilder -lCondFormatsSiStripObjects -lDataFormatsL1TrackTrigger 
-lMagneticFieldRecords -lRecoMuonRecords -lCondFormatsDataRecord -lDataFormatsTrackerCommon -lGeometryCommonTopologies -lRecoMTDRecords -lCondFormatsAlignment -lDataFormatsBeamSpot -lDataFormatsGeometryCommonDetAlgo -lDataFormatsSiStripCluster -lGeometryRecords -lGeometryTrackerNumberingBuilder -lTrackingToolsAnalyticalJacobians -lCondFormatsAlignmentRecord -lDataFormatsCandidate -lDataFormatsGeometrySurface -lDataFormatsSiStripCommon -lDataFormatsTrajectoryState -lDetectorDescriptionCore -lDetectorDescriptionDDCMS -lGeometryMTDCommonData -lHeterogeneousCoreCUDACore -lMagneticFieldEngine -lCondFormatsGeometryObjects -lDataFormatsCLHEP -lDataFormatsCaloRecHit -lDataFormatsEcalDetId -lDataFormatsForwardDetId -lDataFormatsGeometryVector -lDataFormatsL1GlobalTrigger -lDataFormatsMuonDetId -lDataFormatsPhase2TrackerCluster -lDataFormatsSiPixelDetId -lDataFormatsSiStripDetId -lFWCoreFramework -lHeterogeneousCoreCUDAServices -lCUDADataFormatsCommon -lDataFormatsDetId -lDataFormatsFEDRawData -lDataFormatsL1GlobalMuonTrigger -lDataFormatsMath -lDataFormatsPhase2TrackerDigi -lDataFormatsScouting -lDataFormatsSiPixelCluster -lDataFormatsSiStripDigi -lFWCoreCommon -lFWCoreServiceRegistry -lCondFormatsPhysicsToolsObjects -lDataFormatsCommon -lFWCoreParameterSet -lHeterogeneousCoreCUDAUtilities -lFWCoreMessageLogger -lDataFormatsProvenance -lFWCorePluginManager -lFWCoreReflection -lTrackingToolsTrajectoryParametrization -lCondFormatsSerialization -lFWCoreConcurrency -lFWCoreUtilities -lFWCoreVersion -lUtilitiesBinningTools -lUtilitiesGeneral -lDDAlign -lDDCond -lDDCore -lDDParsers -lPhysics -lHist -lMatrix -lGenVector -lMathMore -lTree -lNet -lGeom -lThread -lMathCore -lRIO -lSmatrix -lboost_iostreams -lboost_serialization -lCore -lboost_thread -lboost_date_time -lCLHEP -lpcre -lbz2 -lcudart -lcudadevrt -lnvToolsExt -lnvidia-ml -lgsl -luuid -ltbb -lxerces-c -llzma -lz -lcuda -lfmt -lcms-md5 -lopenblas -lcrypt -ldl -lrt -lstdc++fs -ltinyxml2
lto-wrapper: warning: using serial compilation of 4 LTRANS jobs
/data/user/fwyzard/CMSSW_13_2_2_PGO/src/RecoTracker/TrackProducer/interface/TrackProducerBase.icc: In member function 'getFromEvt':
/data/user/fwyzard/CMSSW_13_2_2_PGO/src/TrackingTools/TrajectoryState/interface/BasicSingleTrajectoryState.h:10:7: error: array subscript 'struct __as_base [0]' is partly outside array bounds of 'unsigned char[72]' [-Werror=array-bounds]
   10 | class BasicSingleTrajectoryState final : public BasicTrajectoryState {
      |       ^
/data/cmssw/el8_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/include/c++/11.4.1/ext/new_allocator.h:127:48: note: referencing an object of size 72 allocated by 'operator new'
  127 |         return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
      |                                                ^
lto1: some warnings being treated as errors
lto-wrapper: fatal error: /data/cmssw/el8_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/c++ returned 1 exit status
compilation terminated.

I'll add the details to reproduce the issue shortly.

fwyzard commented 1 year ago

assign core

cmsbuild commented 1 year ago

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 year ago

A new Issue was created by @fwyzard Andrea Bocci.

@Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

fwyzard commented 1 year ago

The PGO profile was generated by rebuilding a release like this:

cmsrel CMSSW_13_2_2
cd CMSSW_13_2_2
cmsenv

sed -i -e's|ENABLE_PGO="0"|ENABLE_PGO="1"|' $CMSSW_BASE/config/Self.xml 
sed -i -e's|<flags PGO_FLAGS=".*"/>|<flags PGO_FLAGS="-fprofile-generate -fprofile-update=atomic -fprofile-correction -fprofile-prefix-path=$CMSSW_BASE -fprofile-dir=$CMSSW_BASE/pgo/cmssw"/>|' $CMSSW_BASE/config/toolbox/el8_amd64_gcc11/tools/selected/gcc-cxxcompiler.xml
scram setup self
scram setup gcc-cxxcompiler
cmsenv

cd $CMSSW_BASE/src
git cms-addpkg '*/*'
scram b clean
scram b -j`nproc`
rm -r $CMSSW_BASE/pgo/cmssw/

and running an online-like HLT workflow over 11k collision events.
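Before moving on to the second pass, it can help to confirm that the training run actually wrote profile data. The sketch below counts the GCC profile data files (*.gcda) under the directory given to -fprofile-dir in the recipe above. The helper name count_gcda and the PROFILE_DIR override are made up for this illustration and are not part of scram or CMSSW.

```shell
# Count the GCC profile data files (*.gcda) below a directory.
# Prints 0 if the directory does not exist.
count_gcda() {
    find "$1" -type f -name '*.gcda' 2>/dev/null | wc -l | tr -d ' '
}

# Default to the -fprofile-dir location used in the recipe above.
# PROFILE_DIR is a hypothetical override for this sketch only.
profile_dir="${PROFILE_DIR:-${CMSSW_BASE:-.}/pgo/cmssw}"
echo "profile data files under ${profile_dir}: $(count_gcda "${profile_dir}")"
```

A count of 0 after the workflow has finished would suggest that the instrumented libraries were not picked up, or that the profile directory was not writable.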

The PGO profile was applied by rebuilding the release like this:

sed -i -e's|ENABLE_PGO="0"|ENABLE_PGO="1"|' $CMSSW_BASE/config/Self.xml 
sed -i -e's|<flags PGO_FLAGS=".*"/>|<flags PGO_FLAGS="-fprofile-use -fprofile-partial-training -fprofile-update=atomic -fprofile-correction -fprofile-prefix-path=$CMSSW_BASE -fprofile-dir=$CMSSW_BASE/pgo/cmssw -Wno-missing-profile"/>|' $CMSSW_BASE/config/toolbox/el8_amd64_gcc11/tools/selected/gcc-cxxcompiler.xml
scram setup self
scram setup gcc-cxxcompiler
cmsenv

cd $CMSSW_BASE/src
git cms-addpkg '*/*'
scram b clean
scram b -j`nproc`

Important notes

The instructions for building a PGO release come from @smuzaffar - many, many thanks!

I have added the flag -Wno-missing-profile to avoid errors about files with no coverage.

The flag -Wno-error=array-bounds can be used to work around the error reported above. The resulting binary runs successfully.
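One possible way to apply this workaround is to append the flag to the same PGO_FLAGS entry that the recipe above edits. This is only a sketch: the thread does not say where the flag was actually added, the helper add_pgo_flag is hypothetical, and it assumes the flags sit on a single line of the form <flags PGO_FLAGS="..."/>. The demo runs on a scratch file, not on a real tool file.

```shell
# Append one extra flag to the PGO_FLAGS attribute of a tool file.
# Assumes a single-line <flags PGO_FLAGS="..."/> element; the flag must not
# contain the characters | or &, which are special in the sed expression.
add_pgo_flag() {
    file="$1"
    flag="$2"
    # Write to a temporary file and move it back, instead of relying on sed -i.
    sed -e "s|<flags PGO_FLAGS=\"\(.*\)\"/>|<flags PGO_FLAGS=\"\1 ${flag}\"/>|" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a scratch file that mimics the relevant line.
demo="$(mktemp)"
printf '%s\n' '  <flags PGO_FLAGS="-fprofile-use -Wno-missing-profile"/>' > "$demo"
add_pgo_flag "$demo" "-Wno-error=array-bounds"
cat "$demo"
rm -f "$demo"
```

On a real work area the target would presumably be $CMSSW_BASE/config/toolbox/el8_amd64_gcc11/tools/selected/gcc-cxxcompiler.xml, followed by scram setup gcc-cxxcompiler and a rebuild as in the steps above. Running the helper twice appends the flag twice, so it should be applied only once per tool file.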

fwyzard commented 1 year ago

And here is the configuration used to generate the profiles:

The profile used above was generated with

wget https://github.com/cms-sw/cmssw/files/12524484/config.tar.gz
tar xaf config.tar.gz
cd config
cmsRun training.py

@smuzaffar, let me know if I should make the input events available somewhere.