Closed fwyzard closed 3 years ago
Reference release CMSSW_11_2_0_pre10 at 6c149b2963ee
Development branch cms-patatrack/CMSSW_11_2_X_Patatrack at 6a192beda960
Testing branch cms-patatrack/CMSSW_11_2_X_Patatrack at 6a192beda960 with PRs:
nvprof/nvvp profiles

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) found no CUDA-MEMCHECK results

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) found no CUDA-MEMCHECK results

cuda-memcheck --tool initcheck: (report, log) found 0 errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) found 0 errors
cuda-memcheck --tool synccheck: (report, log) found 0 errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) found no CUDA-MEMCHECK results

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) found no CUDA-MEMCHECK results

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) found no CUDA-MEMCHECK results

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: did not run
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: did not run
cuda-memcheck --tool synccheck: did not run

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) found no CUDA-MEMCHECK results

cuda-memcheck --tool initcheck: (report, log) did not find any errors
cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: (report, log) did not find any errors
cuda-memcheck --tool synccheck: (report, log) did not find any errors

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/4a188869c781252b40b258ed9e5e9128eddef122/log.
Hi @fwyzard, are there any special permissions needed to see the validation plots? I get 404 errors or "not found".
No - but I need to trigger publishing them by hand...
In my test, a comparison of uncalibrated RecHits shows agreement between CPU and GPU. [Plots: CPU, GPU]
Comparing the RecHits shows differences. More RecHits are found for the CPU version (this includes PR #592, so the same RecHit producer should run for the CPU and GPU WFs). [Plots: CPU, GPU]
The trigger report for the GPU configuration is not what I was expecting though. It seems as if the CPU module also runs for the uncalibrated RecHits:
TrigReport 200 100 100 0 0 ecalMultiFitUncalibRecHit
TrigReport 200 100 100 0 0 ecalMultiFitUncalibRecHitGPU
TrigReport 200 100 100 0 0 ecalMultiFitUncalibRecHitSoA
So perhaps the agreement in the post above actually comes from comparing CPU outputs with CPU outputs.
For the RecHits, though, the GPU modules do not process any events, as expected.
TrigReport 200 100 100 0 0 ecalRecHit
TrigReport 0 0 0 0 0 ecalRecHitGPU
TrigReport 0 0 0 0 0 ecalRecHitSoA
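The check done by eye on the TrigReport lines above can be sketched as a small script. This is only an illustration: it assumes the five numeric columns are Visited, Executed, Passed, Failed and Error (the header line is not shown in this thread), and the helper names `parse_trig_report` and `idle_modules` are hypothetical, not part of CMSSW.

```python
from typing import NamedTuple


class ModuleCounts(NamedTuple):
    # Assumed column order of the module summary; not confirmed in this thread.
    visited: int
    executed: int
    passed: int
    failed: int
    error: int


def parse_trig_report(text: str) -> dict:
    """Map module label -> ModuleCounts for lines like 'TrigReport 200 100 100 0 0 name'."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect the "TrigReport" tag, five integers, and the module label.
        if len(fields) != 7 or fields[0] != "TrigReport":
            continue
        try:
            counts = ModuleCounts(*(int(f) for f in fields[1:6]))
        except ValueError:
            continue  # header or separator line, not a counts line
        modules[fields[6]] = counts
    return modules


def idle_modules(modules: dict) -> list:
    """Return the labels of modules that never executed."""
    return sorted(name for name, c in modules.items() if c.executed == 0)


# The six lines quoted in this thread.
REPORT = """\
TrigReport 200 100 100 0 0 ecalMultiFitUncalibRecHit
TrigReport 200 100 100 0 0 ecalMultiFitUncalibRecHitGPU
TrigReport 200 100 100 0 0 ecalMultiFitUncalibRecHitSoA
TrigReport 200 100 100 0 0 ecalRecHit
TrigReport 0 0 0 0 0 ecalRecHitGPU
TrigReport 0 0 0 0 0 ecalRecHitSoA
"""

if __name__ == "__main__":
    print(idle_modules(parse_trig_report(REPORT)))
```

On the lines quoted here, this lists `ecalRecHitGPU` and `ecalRecHitSoA` as the modules that did not run.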
Looking closer at the configuration, it seems that ecalMultiFitUncalibRecHit is a conversion module from GPU to CPU. This seems to be OK then.
Since the RecHitProducer is the same for CPU and GPU, the differences in the RecHit energy plot probably come from the inputs to the module. Looking a bit closer at the UncalibRecHits there are some variables that do show differences between the CPU and the GPU version. Agreement is seen for amplitude, pedestal, while differences are seen for amplitudeError (0 for GPU), jitter (0 for GPU), chi2 (very small), OOTamplitudes, OOTchi2, flags, and aux (0 for GPU). Which of these variables are used by the RecHitProducer?
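A field-by-field comparison like the one described above could be sketched as follows. The hit values are made up for illustration and are not taken from the validation. The sketch assumes the CPU and GPU collections are already matched hit by hit in the same order, and the function name `differing_fields` is hypothetical.

```python
import math


def differing_fields(cpu_hits, gpu_hits, rel_tol=1e-6, abs_tol=1e-9):
    """Count, per field, how many matched hits differ beyond the tolerance."""
    mismatches = {}
    for cpu, gpu in zip(cpu_hits, gpu_hits):
        for field, cpu_value in cpu.items():
            if not math.isclose(cpu_value, gpu[field], rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches[field] = mismatches.get(field, 0) + 1
    return mismatches


# Made-up example values: amplitude and pedestal agree, amplitudeError and
# jitter are 0 on the GPU side, and chi2 differs only within the tolerance.
cpu_hits = [
    {"amplitude": 12.5, "pedestal": 200.0, "amplitudeError": 0.8, "jitter": 0.15, "chi2": 1.2},
    {"amplitude": 3.25, "pedestal": 201.5, "amplitudeError": 0.5, "jitter": -0.05, "chi2": 0.9},
]
gpu_hits = [
    {"amplitude": 12.5, "pedestal": 200.0, "amplitudeError": 0.0, "jitter": 0.0, "chi2": 1.2000001},
    {"amplitude": 3.25, "pedestal": 201.5, "amplitudeError": 0.0, "jitter": 0.0, "chi2": 0.9},
]

if __name__ == "__main__":
    print(differing_fields(cpu_hits, gpu_hits))
```

With a tolerance, very small differences such as those reported for chi2 can be separated from fields that are simply not filled on the GPU side.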
Hi @fwyzard, what does the error in cuda-memcheck --tool synccheck for the .512 WFs mean? Some issue with the synchronisation?
Hi @thomreis, sorry about that - you can disregard the synccheck errors; I believe that they are false positives.
Agreement is seen for amplitude, pedestal, while differences are seen for amplitudeError (0 for GPU), jitter (0 for GPU), chi2 (very small), OOTamplitudes, OOTchi2, flags, and aux (0 for GPU). Which of these variables are used by the RecHitProducer?
No idea ...
The ECAL calibrated rechits produced on the GPU are not yet correct. Disable using them in the gpu workflows until they are working and validated.