cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.09k stars 4.32k forks source link

refactor: disable DQM pb files production #46662

Closed gabrielmscampos closed 3 days ago

gabrielmscampos commented 1 week ago

PR description:

Since we are disabling the anelasticDQM process in the DQM online machines, we are going to start accumulating PB files there. The simplest solution to avoid accumulating files is to disable the PB file production in all clients, because:

PR validation:

Tested in DQM Playback environment, all plots behaving as expected and PB files are not created anymore.

The original dqmSaver (PB) EDAnalyzer is called from the DQM clients under DQM/Integration/python/clients (results found using grep -rn "dqmSaverPB" *, the BeamMonitor test and following files in EventFilter (results found using grep -rn "DQMFileSaverPB_cfi" *):

Nevertheless, the original EDAnalyzer seems to be ignored in HLTTrigger confdb, given the following:

Any input on this is appreciated, since later on a flag with the same name seems to be written in the auto generated file:

cmsbuild commented 1 week ago

cms-bot internal usage

cmsbuild commented 1 week ago

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46662/42585

Code check has found code style and quality issues which could be resolved by applying following patch(s)

cmsbuild commented 1 week ago

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46662/42586

cmsbuild commented 1 week ago

A new Pull Request was created by @gabrielmscampos for master.

It involves the following packages:

@antoniovagnerini, @cmsbuild, @rseidita can you please review it and eventually sign? Thanks. @barvic this is something you requested to watch as well. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

gabrielmscampos commented 1 week ago

FYI @mmusich

mmusich commented 1 week ago

@cms-sw/hlt-l2 @fwyzard FYI

mmusich commented 1 week ago

concerning:

Nevertheless, the original EDAnalyzer seems to be ignored in HLTTrigger confdb, given the following: HLTrigger/Configuration/python/Tools/confdb.py#L701

it is ignored in the offline dumps of the HLT menu, but we definitely use it the online HLT menu and keep wanting to do so.

mmusich commented 1 week ago

It is easier to replace only this file (if not used outside DQM) instead of disabling the PB production in each client.

why?

Martin-Grunewald commented 1 week ago

Nevertheless, the original EDAnalyzer seems to be ignored in HLTTrigger confdb, given the following:

https://github.com/cms-sw/cmssw/blob/master/HLTrigger/Configuration/python/Tools/confdb.py#L701

Any input on this is appreciated, since later on a flag with the same name seems to be written in the auto generated file:

https://github.com/cms-sw/cmssw/blob/master/HLTrigger/Configuration/python/Tools/confdb.py#L814

Line 814 simple removes the module instance from the dump for offline!

gabrielmscampos commented 1 week ago

It is easier to replace only this file (if not used outside DQM) instead of disabling the PB production in each client.

why?

However, since the online HLT menu is using it. I guess the only solution is modifying all the files.

mmusich commented 1 week ago

However, since the online HLT menu is using it. I guess the only solution is modifying all the files.

I don't think this PR per se is going to be harmful to the HLT online, provided you address https://github.com/cms-sw/cmssw/pull/46662#discussion_r1836892563..

gabrielmscampos commented 1 week ago

However, since the online HLT menu is using it. I guess the only solution is modifying all the files.

I don't think this PR per se is going to be harmful to the HLT online, provided you address #46662 (comment)..

Got it, I can create a new EDAnalyzer for the DQM clients in order to preserve the current one for HLT.

Thanks!

mmusich commented 1 week ago

We are modifying a single line in the DQMFileSaverPB_cfi instead of 46 files in DQM/Integration/python/clients/ + DQM/BeamMonitor/test/Online_BeamMonitor_file.py.

maybe I am missing something trivial, but where is it referenced in all of these files? It looks to me that all the inheritance comes from things like

https://github.com/cms-sw/cmssw/blob/cea249650467c0724ea7dadb238cef942570c708/DQM/BeamMonitor/test/Online_BeamMonitor_file.py#L75

via

https://github.com/cms-sw/cmssw/blob/cea249650467c0724ea7dadb238cef942570c708/DQM/Integration/python/config/environment_cfi.py#L83

If you want to just change the behavior in the online clients can't you create a new cfi file for DQMFileSaverPBStub and replace it in the environment configuration fragment?

something like:

 from DQMServices.FileIO.DQMFileSaverPBStub_cfi import dqmSaver as dqmSaverPB 
gabrielmscampos commented 1 week ago

We are modifying a single line in the DQMFileSaverPB_cfi instead of 46 files in DQM/Integration/python/clients/ + DQM/BeamMonitor/test/Online_BeamMonitor_file.py.

maybe I am missing something trivial, but where is it referenced in all of these files? It looks to me that all the inheritance comes from things like

https://github.com/cms-sw/cmssw/blob/cea249650467c0724ea7dadb238cef942570c708/DQM/BeamMonitor/test/Online_BeamMonitor_file.py#L75

via

https://github.com/cms-sw/cmssw/blob/cea249650467c0724ea7dadb238cef942570c708/DQM/Integration/python/config/environment_cfi.py#L83

If you want to just change the behavior in the online clients can't you create a new cfi file for DQMFileSaverPBStub and replace it in the environment configuration fragment?

something like:

 from DQMServices.FileIO.DQMFileSaverPBStub_cfi import dqmSaver as dqmSaverPB 

Yes, that is what I'm going to do. Initially I thought in removing the references to dqmSaverPB in all clients, that is why I mentioned all the files. However, I'll update only the environment_cfi and cross-check this is not used elsewhere.

fwyzard commented 1 week ago

hold

cmsbuild commented 1 week ago

Pull request has been put on hold by @fwyzard They need to issue an unhold command to remove the hold state or L1 can unhold it for all

fwyzard commented 1 week ago

I can understand the thought process here, but my initial reaction is that creating a new C++ module that doesn’t serve a real purpose seems counterproductive in the long term, especially when there are simpler ways to handle things, like automating the search and replace in the configuration files. For example, something like this could automate most of the changes:

grep environment_cfi -rw */ -l | xargs sed -e '/dqmSaverPB/s/^/#/' -i

That said, I’ve been trying to think if the new DQMFileSaverPBStub could actually have any other use cases, aside from just acting as a placeholder in the Python configuration? At the moment I can’t think of anything, but if there’s anything I’m missing I’d be happy to hear any examples!

On the bright side, if this leads to a proper configuration for the original DQMFileSaverPB with the addition of a fillDescriptions() method, it could certainly add some value, so that would be a positive outcome either way.

gabrielmscampos commented 1 week ago

I can understand the thought process here, but my initial reaction is that creating a new C++ module that doesn’t serve a real purpose seems counterproductive in the long term, especially when there are simpler ways to handle things, like automating the search and replace in the configuration files. For example, something like this could automate most of the changes:

grep environment_cfi -rw */ -l | xargs sed -e '/dqmSaverPB/s/^/#/' -i

That said, I’ve been trying to think if the new DQMFileSaverPBStub could actually have any other use cases, aside from just acting as a placeholder in the Python configuration? At the moment I can’t think of anything, but if there’s anything I’m missing I’d be happy to hear any examples!

On the bright side, if this leads to a proper configuration for the original DQMFileSaverPB with the addition of a fillDescriptions() method, it could certainly add some value, so that would be a positive outcome either way.

If we comment the dqmSaverPB object in the environment_cfi file, all the other pieces of the code (such as the clients) aren't going to crash? Since, some of them try to update a parameter in this object (example)?

I agree that creating a stub class is counterproductive, though I believe this would be the simplest solution without touching many files.

What is the issue with the current DQMFileSaverPB.fillDescriptions? I don't plan to modify it unless needed.

fwyzard commented 1 week ago

If we comment the dqmSaverPB object in the environment_cfi file, all the other pieces of the code (such as the clients) aren't going to crash? Since, some of them try to update a parameter in this object (example)?

Yes, which is why you should comment out or remove those lines.

Which is the one-line command I suggest does, for all files that include environment_cfi:

grep environment_cfi -rw */ -l | xargs sed -e '/dqmSaverPB/s/^/#/' -i
gabrielmscampos commented 1 week ago

If we comment the dqmSaverPB object in the environment_cfi file, all the other pieces of the code (such as the clients) aren't going to crash? Since, some of them try to update a parameter in this object (example)?

Yes, which is why you should comment out or remove those lines.

Which is the one-line command I suggest does, for all files that include environment_cfi:

grep environment_cfi -rw */ -l | xargs sed -e '/dqmSaverPB/s/^/#/' -i

Ah got it, I misunderstood the command. That was my first idea, although I didn't want to touch so many files. If that is fine, I'll gladly do it.

fwyzard commented 1 week ago

What is the issue with the current DQMFileSaverPB.fillDescriptions? I don't plan to modify it unless needed.

Sorry, my bad.

I assumed it was missing since DQMServices/FileIO/python/DQMFileSaverPB_cfi.py implements the configuration from scratch instead of loading if from the autogenerated cfi file.

Wouldn't it make sense to update it accordingly then ?

gabrielmscampos commented 1 week ago

What is the issue with the current DQMFileSaverPB.fillDescriptions? I don't plan to modify it unless needed.

Sorry, my bad.

I assumed it was missing since DQMServices/FileIO/python/DQMFileSaverPB_cfi.py implements the configuration from scratch instead of loading if from the autogenerated cfi file.

Wouldn't it make sense to update it accordingly then ?

It does, but I'd say this is a modification for a different PR.

cmsbuild commented 1 week ago

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46662/42601

cmsbuild commented 1 week ago

Pull request #46662 was updated. @antoniovagnerini, @cmsbuild, @rseidita can you please check and sign again.

fwyzard commented 1 week ago

unhold

antoniovagnerini commented 1 week ago

please test

cmsbuild commented 1 week ago

-1

Failed Tests: UnitTests Size: This PR adds an extra 84KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf9c0/42758/summary.html COMMIT: a32c7d3b872fffff6408350b35dd7fea0d7f331d CMSSW: CMSSW_14_2_X_2024-11-11-2300/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46662/42758/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 errors in the following unit tests:

---> test TestDQMGUIUpload had ERRORS

Comparison Summary

Summary:

rseidita commented 1 week ago

@cmsbuild please test

cmsbuild commented 1 week ago

-1

Failed Tests: UnitTests RelVals Size: This PR adds an extra 12KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf9c0/42768/summary.html COMMIT: a32c7d3b872fffff6408350b35dd7fea0d7f331d CMSSW: CMSSW_14_2_X_2024-11-12-1100/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46662/42768/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 errors in the following unit tests:

---> test TestDQMGUIUpload had ERRORS

RelVals

antoniovagnerini commented 1 week ago

please test

cmsbuild commented 1 week ago

-1

Failed Tests: UnitTests Size: This PR adds an extra 12KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf9c0/42835/summary.html COMMIT: a32c7d3b872fffff6408350b35dd7fea0d7f331d CMSSW: CMSSW_14_2_X_2024-11-13-2300/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46662/42835/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 2 errors in the following unit tests:

---> test TestDQMServicesDemo had ERRORS
---> test TestDQMGUIUpload had ERRORS

Comparison Summary

Summary:

nothingface0 commented 1 week ago

@smuzaffar The TestDQMServicesDemo seems to fail due to a missing (automatically generated?) cfi file:

----- Begin Fatal Exception 14-Nov-2024 10:49:09 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_2_X_2024-11-13-2300/src/DQMServices/Demo/test/run_analyzers_cfg.py
Exception Message:
 unknown python problem occurred.
ModuleNotFoundError: No module named 'DQMServices.Demo.test_cfi'

At:
  /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02863/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_X_2024-11-13-2300/src/FWCore/ParameterSet/python/Config.py(762): load
  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_2_X_2024-11-13-2300/src/DQMServices/Demo/test/run_analyzers_cfg.py(7): <module>

----- End Fatal Exception -------------------------------------------------

Are we missing something?

smuzaffar commented 1 week ago

please test

looks like latest bot change https://github.com/cms-sw/cms-bot/pull/2361 broke the PR testing

cmsbuild commented 1 week ago

-1

Failed Tests: UnitTests Size: This PR adds an extra 12KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf9c0/42846/summary.html COMMIT: a32c7d3b872fffff6408350b35dd7fea0d7f331d CMSSW: CMSSW_14_2_X_2024-11-13-2300/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46662/42846/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 errors in the following unit tests:

---> test TestDQMGUIUpload had ERRORS

Comparison Summary

Summary:

antoniovagnerini commented 4 days ago

please test

cmsbuild commented 4 days ago

+1

Size: This PR adds an extra 12KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0cf9c0/42921/summary.html COMMIT: a32c7d3b872fffff6408350b35dd7fea0d7f331d CMSSW: CMSSW_14_2_X_2024-11-17-2300/el8_amd64_gcc12 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46662/42921/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

antoniovagnerini commented 4 days ago

Spurious errors in DQM bin-by-bin comparison in Tracker Phase 2 WS seem to have been introduced by https://github.com/cms-sw/cmssw/pull/46717, which was merged 5 h ago , see https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_14_2_X_2024-11-17-0000+6d35e2/65692/29634.0_TTbar_14TeV+Run4D110/

antoniovagnerini commented 3 days ago

+1

cmsbuild commented 3 days ago

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @mandrenguyen, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

mandrenguyen commented 3 days ago

+1