Open nothingface0 opened 1 week ago
A new Issue was created by @nothingface0.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign dqm
New categories assigned: dqm
@antoniovagnerini,@rseidita you have been requested to review this Pull request/Issue and eventually sign? Thanks
While debugging, we faced another issue and had to restart the dev DQMGUI, which led to another issue appearing. We are investigating.
Cms-talk post here
@nothingface0, any idea why only the unit test fails while the DQM bin-by-bin comparison works [a]? The DQM bin-by-bin comparison also uses visDQMUpload.py to upload many ROOT files to https://cmsweb.cern.ch/dqm/dev
https://github.com/cms-sw/cmssw/blob/master/DQMServices/FileIO/scripts/compareDQMOutput.py#L73-L99
[a]
Uploading output:Uploading output:
visDQMUpload.py https://cmsweb.cern.ch/dqm/dev /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/cms-bot/dqm-comparison/dqmComparisonOutput/pr/DQM_V0001_R000000001__RelVal_wf10224_0_pr__CMSSW_14_2_X-PRcmssw_46662-65580__DQMIO.root
visDQMUpload.py https://cmsweb.cern.ch/dqm/dev /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/cms-bot/dqm-comparison/dqmComparisonOutput/pr/DQM_V0001_R000000001__RelVal_wf13034_0_pr__CMSSW_14_2_X-PRcmssw_46662-65580__DQMIO.root
Uploading output:
visDQMUpload.py https://cmsweb.cern.ch/dqm/dev /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/cms-bot/dqm-comparison/dqmComparisonOutput/pr/DQM_V0001_R000165121__wf1000_0_pr__CMSSW_14_2_X-PRcmssw_46662-65580__DQMIO.root
Uploading output:Uploading output:
@smuzaffar Regarding the bin-by-bin comparison, from what I understand it's done locally, where the test is running, and then the results are uploaded. In the script you link, there's no validation that the upload itself worked, e.g. by checking the GUI after the upload finished: it's just comparing and uploading.
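As a rough illustration of the missing validation, here is a minimal sketch of an upload loop that at least records which uploads failed. It is not the code in compareDQMOutput.py: the helper name `upload_all` and the injectable `run` parameter are hypothetical, and it assumes that visDQMUpload.py exits with a non-zero status when the upload fails. It is also a weaker check than looking in the GUI afterwards, since it only tells us the upload request was accepted.

```python
import subprocess
from typing import Callable, Iterable, List


def upload_all(
    files: Iterable[str],
    gui_url: str,
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Upload each file and return the paths whose upload command failed."""
    failed = []
    for path in files:
        # Assumption: the upload script returns a non-zero exit status on error.
        status = run(["visDQMUpload.py", gui_url, path])
        if status != 0:
            failed.append(path)
    return failed
```

The caller could then fail the comparison job, or just print a warning, whenever the returned list is not empty.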
The unit test is failing at the time of upload [a] in visDQMUpload.py, right? And this upload is working for the DQM bin-by-bin comparison, otherwise we should have seen visDQMUpload.py failing there too, right?
[a]
+ visDQMUpload.py https://cmsweb.cern.ch/dqm/dev DQM_V0001_R000000001__Harvesting__DQMTests202411134029559212184__DQMIO.root
DQM_V0001_R000000001__Harvesting__DQMTests202411134029559212184__DQMIO.root
Using SSL private key /data/cmsbld/jenkins/workspace/ib-run-qa/x509up_u501
Using SSL public key /data/cmsbld/jenkins/workspace/ib-run-qa/x509up_u501
ERROR HTTP Error 500: Internal Server Error
Status code: None
Message: None
Detail: None
Taking this failed test as an example, judging from the logs I found in DQMGUI and the test's logs:
[...] %H in the date command, run by the test).
[...] DQM_V0001_R000000001__RelVal_wf2500_201_base__CMSSW_14_2_X-PRcmssw_46659-65571__DQMIO.root).
[...] DQM_V0001_R000380306__wf2024_202001_base__CMSSW_14_2_X-PRcmssw_46666-65573__DQMIO.root. The file itself is small (27K), so I'm thinking something must have broken there.
Takeaway points:
Another instance of the failure here.
On the other hand, for this successful test:
[...] date command used to name the test file returns).
How about we disable this test for PRs/IBs? We could run it as a special test for each IB (just like we run tests for CRAB and HLT), and there we can increase the wait time to a few hours (we can run it on lxplus so it will not waste our build resources). If the file has not been processed after, let's say, 6 hours, then we can mark the test as failed.
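The "wait up to N hours, then mark it failed" idea boils down to polling with a deadline. A minimal sketch follows. The function name and parameters are hypothetical, and `is_registered` stands in for whatever check the test uses to see whether the DQMGUI has picked up the file; that check is not shown here.

```python
import time
from typing import Callable


def wait_until_registered(
    is_registered: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_registered() is true or timeout_s seconds have passed."""
    deadline = clock() + timeout_s
    while True:
        if is_registered():
            return True
        if clock() >= deadline:
            # Deadline reached without the file showing up: report failure.
            return False
        sleep(poll_s)
```

For the 6-hour limit suggested above, the call would be `wait_until_registered(check, timeout_s=6 * 3600)`. The `clock` and `sleep` parameters are only there so the loop can be tested without actually waiting.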
I didn't know there was such an option, sounds good to me! Let me know if any modifications are required for the test.
The recently added TestDQMGUIUpload (#46551) has been failing even after 10 minutes of waiting, for recent PR tests and an IB.
After checking the logs of the target DQMGUI, my first impression is that during periods of heavy dev DQMGUI activity (upload of tier0 replays, PR ROOT files), it may take a significant amount of time for the file uploaded by the test to be properly registered, meaning that the test fails. If this is the only problem with the test, we could increase the maximum waiting time.
Unfortunately, I forgot to add %H in the timestamp that is added to the file name, so I don't know exactly how much time it takes the DQMGUI to discover each uploaded file, since I only know what time it arrived and was imported, but not when the test started.
I will keep this issue updated as I investigate from the DQM side.
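To illustrate why the hour matters, here is a small sketch of a timestamp that includes %H, plus the reverse step of recovering the start time from the file name. The format string and both helper names are assumptions for illustration and do not reproduce the test's actual date invocation. Only the surrounding file name pattern is taken from the upload log quoted in this thread.

```python
from datetime import datetime

# Hypothetical timestamp format that includes the hour (%H).
STAMP_FORMAT = "%Y%m%d%H%M%S"


def make_test_file_name(now: datetime) -> str:
    """Build a test file name that embeds the full start time."""
    stamp = now.strftime(STAMP_FORMAT)
    return f"DQM_V0001_R000000001__Harvesting__DQMTests{stamp}__DQMIO.root"


def start_time_from_name(name: str) -> datetime:
    """Recover the test start time from a name built by make_test_file_name."""
    stamp = name.split("DQMTests", 1)[1].split("__", 1)[0]
    return datetime.strptime(stamp, STAMP_FORMAT)
```

With the start time recoverable from the name, the delay until the DQMGUI imports the file could be read directly by comparing it with the import time in the GUI logs.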