Non-reproducibility in JetMET/{Jet,MET}Validation histograms in phase2 workflows

makortel commented 1 year ago

It seems that we have non-reproducibility in some JetMET/{Jet,MET}Validation histograms that are visible in PR tests. So far seen (at least) in

cmsbuild commented 1 year ago

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 1 year ago

assign dqm

FYI @cms-sw/jetmet-pog-l2

cmsbuild commented 1 year ago

New categories assigned: dqm

@jfernan2,@ahmad3213,@micsucmed,@rvenditti,@emanueleusai,@syuvivida,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks

smuzaffar commented 1 year ago

These were first seen in https://github.com/cms-sw/cmssw/pull/39699 (see logs), so could the change in #39699 be responsible for this?

makortel commented 1 year ago

An earlier test in https://github.com/cms-sw/cmssw/pull/39699#issuecomment-1278855112 reports only 6 DQM histograms with comparison differences, which would suggest that #39699 is not responsible for the differences (or at least that the answer is less clear).

On the other hand, these differences seem to occur randomly and not very frequently, so it could be that the PR responsible for them had clean comparisons in its own tests.

perrotta commented 1 year ago

This got somehow fixed, since the same histos now reproduce nicely. Can this get closed?

makortel commented 1 year ago

Sure

makortel commented 1 year ago

Seems that we are again seeing these

makortel commented 1 year ago

Documenting here: in https://github.com/cms-sw/cmssw/pull/41019#issuecomment-1463003532, workflow 20834.0 shows differences in

JetMET/METValidation/slimmedMETsPuppi/{METResolution_GenMETTrue_InMETBins, METUnc_ElectronEnDown, METUnc_ElectronEnUp}
JetMET/METValidation/PfMetT0pcT1/METResolution_GenMETTrue_InMETBins
JetMET/METValidation/PfMetT1/METResolution_GenMETTrue_InMETBins
JetMET/METValidation/pfMet/METResolution_GenMETTrue_InMETBins
JetMET/METValidation/pfMetT0pc/METResolution_GenMETTrue_InMETBins
JetMET/METValidation/slimmedMETs/METResolution_GenMETTrue_InMETBins
JetMET/Jet/CleanedslimmedJetsAK8/Pt_profile
ParticleFlow/slimmedMETValidation/CompWithPFMET/{profileRMS_delta_set_VS_set_,profile_delta_set_VS_set_}

Also 20834.75, 20834.76, 20896.0, 20900.0, 21034.999, and 23234.0 show differences

(https://github.com/cms-sw/cmssw/pull/41016#issuecomment-1462972599 may also be related)

makortel commented 1 year ago

https://github.com/cms-sw/cmssw/pull/41328#issuecomment-1509411454 also shows many differences across many JetMET folders in workflows 23234.0, 23634.0, 23634.911, 23696.0, 23700.0, 23834.999.

Curiously the baseline was run on Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz (Broadwell) and the PR test on Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz (Cascade Lake).

makortel commented 1 year ago

assign upgrade

cmsbuild commented 1 year ago

New categories assigned: upgrade

@AdrianoDee,@srimanob you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 year ago

assign reconstruction, simulation

cmsbuild commented 1 year ago

New categories assigned: reconstruction,simulation

@mdhildreth,@mandrenguyen,@clacaputo,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 year ago

I looked in a bit more detail at the differences in https://github.com/cms-sw/cmssw/pull/42123#issuecomment-1611881110. I noticed that in this case

the baseline tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell)
the PR tests were run on Intel(R) Xeon(R) Gold 5218 (Cascade Lake)

Could some TensorFlow / ONNX ML model be sensitive to the use of AVX-512 instructions? (We have seen similar behavior with some ML models before.)
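To check the AVX-512 hypothesis on a given node, one could look at the CPU feature flags the Linux kernel reports. The sketch below is only an illustration: the helper names `simd_flags` and `has_avx512` are made up here, and it assumes an x86 Linux machine where `/proc/cpuinfo` contains a `flags` line. Broadwell CPUs are expected to list `avx2` but not `avx512f`, while Cascade Lake CPUs list both.

```python
def simd_flags(cpuinfo_text):
    """Return the AVX-related flags from the first 'flags' line of
    /proc/cpuinfo-style text (hypothetical helper, x86 Linux assumed)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return {flag for flag in value.split() if flag.startswith("avx")}
    return set()


def has_avx512(cpuinfo_text):
    """True if the AVX-512 Foundation flag (avx512f) is present."""
    return "avx512f" in simd_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; only meaningful on Linux."""
    with open(path) as handle:
        return handle.read()
```

On a test node, `has_avx512(read_cpuinfo())` would tell whether libraries that dispatch on CPU features at run time could pick an AVX-512 code path there. This only says what the hardware offers, not which instructions a given model actually used.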

makortel commented 1 year ago

In https://github.com/cms-sw/cmssw/pull/42507#issuecomment-1670824559

the baseline tests were run on Intel(R) Xeon(R) Gold 5218 (Cascade Lake)
the PR tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell)

missirol commented 1 year ago

https://github.com/cms-sw/cmssw/pull/42540#issuecomment-1674376843 and https://github.com/cms-sw/cmssw/pull/42534#issuecomment-1673646530 are probably examples of this issue (I do not know how to find the specs of the machines used for the tests).

mmusich commented 1 year ago

> (I do not know how to find the specs of the machines used for the tests).

I would really be interested to know how to do that as well!

makortel commented 1 year ago

> (I do not know how to find the specs of the machines used for the tests).

> I would really be interested to know how to do that as well!

You can look at the end of the framework job report XML file (JobReport<N>.xml) of e.g. any step of any matrix workflow (they are all run on the same machine, so it doesn't matter which one). There is something along the lines of

<PerformanceSummary Metric="SystemCPU">
  <Metric Name="CPUModels" Value="Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz"/>

that tells the CPU model.
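For anyone who wants to script this lookup, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It relies only on the `PerformanceSummary` / `Metric` elements and attributes shown above. The function name `cpu_models` and the enclosing `FrameworkJobReport` / `PerformanceReport` elements in the sample string are assumptions made for illustration, so the search deliberately does not depend on them.

```python
import xml.etree.ElementTree as ET


def cpu_models(job_report_xml):
    """Return every CPUModels value found under a SystemCPU performance summary."""
    root = ET.fromstring(job_report_xml)
    models = []
    # iter() walks the whole tree, so the nesting above PerformanceSummary
    # does not need to be known in advance.
    for summary in root.iter("PerformanceSummary"):
        if summary.get("Metric") != "SystemCPU":
            continue
        for metric in summary.iter("Metric"):
            if metric.get("Name") == "CPUModels":
                models.append(metric.get("Value"))
    return models


# Sample input: the wrapper elements are assumed, the inner part is from this thread.
SAMPLE = """<FrameworkJobReport>
  <PerformanceReport>
    <PerformanceSummary Metric="SystemCPU">
      <Metric Name="CPUModels" Value="Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz"/>
    </PerformanceSummary>
  </PerformanceReport>
</FrameworkJobReport>"""

print(cpu_models(SAMPLE))
```

For a file on disk, reading it with `open(path).read()` and passing the text to `cpu_models` should work the same way, assuming the job report is well-formed XML.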

In https://github.com/cms-sw/cmssw/pull/42540#issuecomment-1674376843

PR tests were run on Intel(R) Xeon(R) Gold 5218 CPU (Cascade Lake)
baseline tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell)

In https://github.com/cms-sw/cmssw/pull/42534#issuecomment-1673646530

PR tests were run on Intel(R) Xeon(R) Silver 4216 CPU (Cascade Lake)
baseline tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell)

missirol commented 1 year ago

Another example in https://github.com/cms-sw/cmssw/pull/42554#issuecomment-1675809497.

PR tests were run on Intel(R) Xeon(R) Gold 5218 CPU (Cascade Lake).
baseline tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell).

missirol commented 1 year ago

Another example in https://github.com/cms-sw/cmssw/pull/42512#issuecomment-1678611490.

PR tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell).
baseline tests were run on Intel(R) Xeon(R) Silver 4216 CPU (Cascade Lake).

missirol commented 1 year ago

Another example in https://github.com/cms-sw/cmssw/pull/42610#issuecomment-1685128180 :

the baseline tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell)
the PR tests were run on Intel(R) Xeon(R) Silver 4216 CPU (Cascade Lake)

missirol commented 1 year ago

Another example in https://github.com/cms-sw/cmssw/pull/42707#issuecomment-1703882846 :

the baseline tests were run on Intel(R) Xeon(R) Silver 4216 CPU (Cascade Lake)
the PR tests were run on Intel(R) Xeon(R) CPU E5-2683 v4 (Broadwell)
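To summarize the reports collected so far, the sketch below tallies the baseline and PR-test microarchitectures for each PR where the CPU models were noted in this thread. The data are transcribed from the comments above; the variable and function names are made up for illustration. Note that the tally only covers cases where differences were seen, so it is suggestive of a CPU dependence rather than proof of one.

```python
from collections import Counter

# (PR number, baseline microarchitecture, PR-test microarchitecture),
# transcribed from the comments in this thread.
REPORTS = [
    (41328, "Broadwell", "Cascade Lake"),
    (42123, "Broadwell", "Cascade Lake"),
    (42507, "Cascade Lake", "Broadwell"),
    (42540, "Broadwell", "Cascade Lake"),
    (42534, "Broadwell", "Cascade Lake"),
    (42554, "Broadwell", "Cascade Lake"),
    (42512, "Cascade Lake", "Broadwell"),
    (42610, "Broadwell", "Cascade Lake"),
    (42707, "Cascade Lake", "Broadwell"),
]


def tally(reports):
    """Count how often each (baseline, PR test) microarchitecture pair occurs."""
    return Counter((baseline, pr_test) for _, baseline, pr_test in reports)


def mismatched(reports):
    """PR numbers where baseline and PR test ran on different microarchitectures."""
    return [pr for pr, baseline, pr_test in reports if baseline != pr_test]


for (baseline, pr_test), count in sorted(tally(REPORTS).items()):
    print(f"baseline={baseline:12s} PR test={pr_test:12s} cases={count}")
print(f"{len(mismatched(REPORTS))} of {len(REPORTS)} reports have mismatched CPUs")
```

In every report listed, the baseline and the PR test ran on different microarchitectures, in both directions.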

Non-reproducibility in JetMET/{Jet,MET}Validation histograms in phase2 workflows #39754