cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Non deterministic outputs of photonDRN from SonicTriton #41060

Open perrotta opened 1 year ago

perrotta commented 1 year ago

It was realized while reviewing https://github.com/cms-data/RecoEgamma-EgammaPhotonProducers/pull/3 that some of the userFloats of the patPhotons produced in wf 10805.31, SingleGammaPt35+2018_photonDRN, are not reproducible from one run to the next.

This was confirmed with another comparison of the same wf 10805.31 made for PR #40666, which did not touch the photon weights and therefore should have produced identical output in two different runs. Even here, the "randomness" seems in some cases rather significant, i.e. a bit larger than a simple numerical fluctuation in the last digit somewhere: [two comparison plots attached as images]
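To make "larger than a simple numerical fluctuation" concrete, here is a minimal sketch of how the values from two runs could be compared against a tolerance. It is not the comparison used by the PR tests: the function name `compare_runs`, the tolerances, and the sample numbers are all hypothetical, and it assumes the userFloat values of each run have already been extracted into plain Python lists in the same photon order.

```python
import math


def compare_runs(run_a, run_b, rel_tol=1e-6, abs_tol=1e-12):
    """Return (index, a, b, relative_difference) for every pair of values
    that differs by more than the given tolerances.

    The tolerances are illustrative assumptions, not CMSSW defaults.
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs have different numbers of entries")
    mismatches = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            scale = max(abs(a), abs(b))
            rel = abs(a - b) / scale if scale > 0 else float("inf")
            mismatches.append((i, a, b, rel))
    return mismatches


# Hypothetical values standing in for one userFloat read from two runs
# of the same workflow; they are NOT taken from the plots in this issue.
run1 = [34.912345, 35.104872, 35.000001]
run2 = [34.912345, 35.119930, 35.000001]

for i, a, b, rel in compare_runs(run1, run2):
    print(f"photon {i}: {a} vs {b} (relative difference {rel:.2e})")
```

A relative difference many orders of magnitude above floating-point rounding, as in the second hypothetical entry, would point to a genuinely non-deterministic computation rather than a last-digit effect.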

@kpedro88 commented in https://github.com/cms-data/RecoEgamma-EgammaPhotonProducers/pull/3#issuecomment-1458662905: "We also saw this behavior in our private tests last week and realized that there is some randomness inherent to the network itself. The random behavior has been there all along, but https://github.com/cms-sw/cmssw/pull/40814 may actually have been the first time that comparison tests were run for 10805.31 (since it is not part of the short matrix), so it wasn't noticed before. This PR does correctly restore the original weights, but we need to make some more changes to make the network deterministic (this is a work in progress right now)."

This GitHub issue is meant precisely to keep track of that "work in progress" to get rid of the non-deterministic behavior of the network.

@ssrothman

cmsbuild commented 1 year ago

A new Issue was created by @perrotta Andrea Perrotta.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

perrotta commented 1 year ago

assign reconstruction

cmsbuild commented 1 year ago

New categories assigned: reconstruction

@mandrenguyen,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

ssrothman commented 1 year ago

Hi all, sorry for the slow motion on this; I was otherwise engaged with my qualifying exams. I've now put together what I hope will be a fix to render the model deterministic, and pushed new torchscript files to my private data branch here. How can I test it to produce the equivalent of the plot at the top of this issue, to check whether the fix was successful?

perrotta commented 1 year ago

@ssrothman thank you for jumping in. For the tests, maybe you can ask @kpedro88 how they were set up (see https://github.com/cms-sw/cmssw/issues/41060#issue-1625593219 ). Otherwise you can run the same wf 10805.31 that was mentioned in the issue description, i.e.

runTheMatrix.py -l 10805.31 > & out &

kpedro88 commented 8 months ago

Resolved by #42950