cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

add/update highBetaStar 2018 workflows with a physics data run #24795

Closed. slava77 closed this issue 5 years ago.

slava77 commented 5 years ago

I would like to ask for an update or addition of workflows with 2018 highBeta star configuration to include a physics data run. Currently available workflows 136.8561 and 136.8562 use runs from April 2018 (314890 and 314276, respectively), which did not have a final PPS detector setup.

From the discussion in #24683, it looks like 319176 or 319270 can be used.

@nminafra @jan-kaspar @forthommel

cmsbuild commented 5 years ago

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

slava77 commented 5 years ago

assign pdmv

cmsbuild commented 5 years ago

New categories assigned: pdmv

@prebello,@pgunnell,@zhenhu you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 commented 5 years ago

@prebello,@pgunnell,@zhenhu please let me know if this request is already under consideration by PDMV. Thank you.

prebello commented 5 years ago

@slava77 Please let us know more details about your request. Are the changes of #24683 already merged? Otherwise the relvals will not pick up the PPS detector setup as requested. For which release is this expected, and with which GT? Thank you @fabiocos

slava77 commented 5 years ago

Are the changes of #24683 already merged?

I wanted a relval matrix workflow to be created to be able to test #24683 . We can ask AlCa if a better GT is useful, but my hope is that the same job configuration as in 136.8561 and/or 136.8562 can be used (including the same GT).

@lpernie

slava77 commented 5 years ago

@prebello was there any progress on this issue (perhaps outside this thread)? We have quite a bit of code coming for CTPPS updates, and there is apparently no physics-quality data in our matrix tests.

@nminafra @jan-kaspar @forthommel In the PR description I picked up possible runs 319176 or 319270. Are they good or do we have something better?

In the meantime, are the runs 314890 and 314276 (Commissioning2018 era, covered in tests) still OK, or were they missing significant parts of the PPS detectors?

prebello commented 5 years ago

@slava77 I didn't receive any feedback or confirmation about it. PdmV basically needs to know: in which release is this aimed to work? Which data inputs should be used for 136.8561 and 136.8562 (with and without the PPS detector, maybe? run numbers for both? do you need both scenarios?)

@lpernie do you confirm the same (autoCond from the release pointed above) GT can be used?

prebello commented 5 years ago

@slava77 the actual runs and lumis in the wfs are

RunhBStarTk={314890: [[500, 700]]}
RunhBStarRP={314276: [[1, 200]]}

so, apart from the runs, the proper lumi-range information for your purposes would be useful. The conditions setting is '--conditions':'auto:run2_data_promptlike' (which depends on the release adopted).

slava77 commented 5 years ago

On 1/16/19 7:41 AM, Patricia Rebello Teles wrote:

@slava77 the actual runs and lumis in the wfs are

RunhBStarTk={314890: [[500, 700]]}
RunhBStarRP={314276: [[1, 200]]}

so, apart from the runs, the proper lumi-range information for your purposes would be useful. The conditions setting is '--conditions':'auto:run2_data_promptlike' (which depends on the release adopted).

Sure, please see the original issue text: runs 319176 or 319270 can be used. https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions18/13TeV/PromptReco/Cert_319104-319311_13TeV_PromptReco_SpecialCollisions18_TOTEM_JSON.txt shows that 319176 [[1, 1803]] is good and 319270 [[1, 206]] is good. Either an update to the inputs in 136.8562 (e.g. with the longer run 319176) or a new workflow would be fine.

The default master branch update should be enough.

Thank you.
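As an aside for anyone reproducing the run selection: the certification file quoted above maps run numbers to good lumisection ranges, and a candidate (run, lumi) can be checked with a few lines of Python. The `cert_text` excerpt below is a hypothetical hand-copied fragment built from the values quoted in this thread, not the full file.

```python
import json

# Hypothetical excerpt of the TOTEM special-collisions certification JSON
# quoted above; the real file covers many more runs and ranges.
cert_text = '{"319176": [[1, 1803]], "319270": [[1, 206]]}'
good_lumis = {int(run): ranges for run, ranges in json.loads(cert_text).items()}

def is_good(run: int, lumi: int) -> bool:
    """True if (run, lumi) falls inside a certified lumisection range."""
    return any(lo <= lumi <= hi for lo, hi in good_lumis.get(run, []))

print(is_good(319176, 1000))  # True: inside [1, 1803]
print(is_good(319270, 300))   # False: beyond [1, 206]
```

The same check rules out uncertified runs entirely, since `good_lumis.get(run, [])` yields no ranges for them.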

slava77 commented 5 years ago

@prebello please let me know if there was progress with this issue. thank you.

I guess for AlCa/GT resolution I can just assign this to alca

slava77 commented 5 years ago

assign alca

cmsbuild commented 5 years ago

New categories assigned: alca

@tocheng,@pohsun,@franzoni you have been requested to review this Pull request/Issue and eventually sign? Thanks

prebello commented 5 years ago

Hi @slava77, no progress so far, but I guess we can quickly update the inputs in 136.8562 then. Nevertheless, I see that a proper GT (new conditions?) is needed from the AlCa side, right? Do you want PdmV to make a PR in master for this update, or will someone else do it? It would be available for merging only in 10-5-0-pre2, although we can quickly test it in any release with the proper conditions.

slava77 commented 5 years ago

Hi @slava77 no progress so far but I guess we can quickly update the inputs in 136.8562 then.

Nevertheless I see that a proper GT (new conditions?) is needed from AlCa side, right?

I don't really know. perhaps a prompt-like GT is good enough.

Do you want PdmV to make a PR in master for this update, or will someone else do it? It would be available for merging only in 10-5-0-pre2, although we can quickly test it in any release with the proper conditions.

This would be nice. Thank you.

slava77 commented 5 years ago

@prebello please clarify on the status of this request. This is somewhat blocking proper integration of the PPS detector developments.

tocheng commented 5 years ago

@slava77 @prebello Hello, I think prompt like GT in the autoCond should be fine.

prebello commented 5 years ago

Hi @slava77, doing it now. Sorry, busy times in PdmV these last months. So the workflow

workflows[136.8562] = ['',['RunZeroBias1_hBStarRP','HLTDR2_2018_hBStar','RECODR2_2018reHLT_Prompt_hBStar','HARVEST2018_hBStar']]

will be injected with the autoCond GT in 10-5-0 (ok @tocheng ?), with

RunhBStarRP={319270: [[1, 206]]} # for the Roman Pot system
steps['RunZeroBias1_hBStarRP']={'INPUT':InputInfo(dataSet='/ZeroBias1/Commissioning2018-v1/RAW',label='zbhBSRP',events=100000,location='STD', ls=RunhBStarRP)}
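For readers outside PdmV, here is a self-contained sketch of how such a matrix entry fits together, mimicking the conventions of Configuration/PyReleaseValidation. The `InputInfo` stand-in below is illustrative, not the actual relval code, and the label and event count are taken from the snippet quoted in this thread.

```python
# Minimal stand-in for the InputInfo class used by the relval matrix;
# the real one lives in Configuration/PyReleaseValidation.
class InputInfo:
    def __init__(self, dataSet, label, events, location, ls):
        self.dataSet, self.label = dataSet, label
        self.events, self.location, self.ls = events, location, ls

steps, workflows = {}, {}

# Run and lumi range discussed above for the Roman Pot input step.
RunhBStarRP = {319270: [[1, 206]]}

steps['RunZeroBias1_hBStarRP'] = {'INPUT': InputInfo(
    dataSet='/ZeroBias1/Commissioning2018-v1/RAW', label='zbhBSRP',
    events=100000, location='STD', ls=RunhBStarRP)}

# Workflow 136.8562: input step followed by HLT, reco, and harvesting steps.
workflows[136.8562] = ['', ['RunZeroBias1_hBStarRP', 'HLTDR2_2018_hBStar',
                            'RECODR2_2018reHLT_Prompt_hBStar',
                            'HARVEST2018_hBStar']]

print(workflows[136.8562][1][0])  # RunZeroBias1_hBStarRP
```

Swapping the run dictionary or the dataset path is all that is needed to retarget a workflow at a different input, which is the kind of update being requested here.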

slava77 commented 5 years ago

On 3/12/19 11:23 AM, Patricia Rebello Teles wrote:

Hi @slava77 doing it now. Sorry busy times in PdmV last months.

Thank you.

prebello commented 5 years ago

@slava77 there is no dataset=/ZeroBias*/Commissioning2018-v1/RAW for run 319270. Could you please confirm the dataset to be used? The TOTEM ones, maybe? (DAS query: dataset dataset=/TOTEM/*/RAW run=319270)

slava77 commented 5 years ago

@slava77 there is no dataset=/ZeroBias*/Commissioning2018-v1/RAW for run 319270

the acquisition era for these runs was Run2018B. There is a /ZeroBiasTOTEM1/Run2018B-v1/RAW. We could also take a /TOTEM10/Run2018B-v1/RAW.

prebello commented 5 years ago

hi @slava77 relval injected. https://dmytro.web.cern.ch/dmytro/cmsprodmon/requests.php?campaign=CMSSW_10_5_0__TOTEM_highBstar-1552438873

slava77 commented 5 years ago

hi @slava77 relval injected. https://dmytro.web.cern.ch/dmytro/cmsprodmon/requests.php?campaign=CMSSW_10_5_0__TOTEM_highBstar-1552438873

Nice. Is this a precursor to an update of the matrix workflows (136.8561 and 136.8562) in the CMSSW?

prebello commented 5 years ago

not exactly. I have created a new relval 136.8563. I will do PR in master to make it available. Do you need backport to 10-5-X?

slava77 commented 5 years ago

On 3/13/19 5:43 AM, Patricia Rebello Teles wrote:

not exactly. I have created a new relval 136.8563. I will do PR in master to make it available. Do you need backport to 10-5-X?

A new relval wf is just fine. No need for a backport. Thank you.

slava77 commented 5 years ago

Prompted by the recent issues reported in https://github.com/cms-sw/cmssw/pull/26394#issuecomment-484429896

Looking at CMSSW_10_6_X_2019-04-17-2300, our workflows with Run2_2018_highBetaStar are still only 136.8561 and 136.8562.

It looks like this issue is still not resolved even though an update was promised already more than a month ago.

@prebello please clarify if we can get an update to a physics data run. Based on the problems reported in https://github.com/cms-sw/cmssw/pull/26394#issuecomment-484429896, it also looks like supporting the commissioning data runs used in 136.8561 and 136.8562 is an extra burden, and perhaps replacing the inputs with physics data is a better approach than keeping the commissioning data tests.

@franzoni

prebello commented 5 years ago

@slava77 I have just read your comment. Indeed the promised PR with the updates was lost among my long to-do list items. I apologize for the delay. As you know, there are now 3 types of hBStar wfs. The usual wfs:

workflows[136.8561] = ['',['RunZeroBias_hBStarTk','HLTDR2_2018_hBStar','RECODR2_2018reHLT_Offline_hBStar','HARVEST2018_hBStar']]

running with /ZeroBias/Commissioning2018-v1/RAW, and

workflows[136.8562] = ['',['RunZeroBias1_hBStarRP','HLTDR2_2018_hBStar','RECODR2_2018reHLT_Offline_hBStar','HARVEST2018_hBStar']]

running with /ZeroBias1/Commissioning2018-v1/RAW,

plus the new one:

workflows[136.8563] = ['',['RunTOTEM10_hBStarRP','HLTDR2_2018_hBStar','RECODR2_2018reHLT_Prompt_hBStar','HARVEST2018_hBStar']]

running with /TOTEM10/Run2018B-v1/RAW.

I guess the 136.8563 can stay the same, while 136.8561 and 136.8562 need new inputs.

Therefore, would the physics runs /ZeroBias/Run2018D-v1/RAW and /ZeroBias1/Run2018D-v1/RAW, respectively, be useful for your purposes? When you confirm the inputs, I will make the PR immediately. Let me know if you also need these relvals in the recent 10-6-0-pre4 for testing.

slava77 commented 5 years ago

@prebello thank you for following up.

Therefore, would the physics runs /ZeroBias/Run2018D-v1/RAW and /ZeroBias1/Run2018D-v1/RAW, respectively, be useful for your purposes?

At a quick glance, I do not see anything from 2018D that can be processed with the highBeta configuration. This configuration is valid only on low pileup data. I propose to remove the old commissioning run tests.

prebello commented 5 years ago

@slava77
More detailed information to support your decision: the 15 low-PU runs in 2018 are 318939, 318945, 318953, 319460, 319462, 319463, 319464, 319466, 319467, 319468, 319469, 319470, 319471, 319472, 319488. ZeroBias is available, as well as a few ZeroBiasTOTEM[1-4] datasets. What do you think?

slava77 commented 5 years ago

I'm not sure I understood the implications of your message.

I proposed run 319270 in October, when this issue was opened. In the follow-up discussion we converged on using it to define 136.8563.

Are you proposing alternatives to the removal of 136.8561 and 136.8562? It looks like the runs you listed, even though taken at low PU, were processed in prompt reco with the standard settings (not the highBetaStar config). We should probably just stick to testing what is used in production.

prebello commented 5 years ago

@slava77 ok, so maybe I misunderstood your proposal, as well as mixed up subjects. Never mind. Therefore I will not touch the 136.8561 and 136.8562 RAW inputs, and will make a PR for 136.8563 only.

slava77 commented 5 years ago

Therefore I will not touch the 136.8561 and 136.8562 RAW inputs, and will make a PR for 136.8563 only.

please note https://github.com/cms-sw/cmssw/issues/24795#issuecomment-484477819

Based on the problems reported in #26394 (comment) it also looks like support for the commissioning data runs used in 136.8561 and 136.8562 is an extra burden ...

So, we better still do something with 136.8561 and 136.8562. It seems the easiest to just remove them. If they are really needed for some other purpose, the same 319270 can be used.

prebello commented 5 years ago

@slava77 my updates:

1) https://its.cern.ch/jira/browse/CMSTRANSF-63 to make TOTEM10 available for testing the wfs in a 10-6-X IB before making the PR (we have tested it in 10-5-X before), as well as to lock it for future tests.
2) I will not remove/touch the old workflows. In case they are needed, we then proceed with the needed changes.

As soon as I can do my test, I will make the PR. It should be ready by tomorrow at the latest.

slava77 commented 5 years ago

@prebello thank you for the updates.

Please note that, as mentioned earlier, 136.8562 is currently broken in 10_6_X. It's unclear if there will be a fix for it soon. This and 136.8561 are the only data workflows using Commissioning acquisition-period data. Typically there is no production-level support for this kind of data. Perhaps by accident, 136.8561 (run 314890) still works.

jan-kaspar commented 5 years ago

Please note that as mentioned earlier, 136.8562 is currently broken in 10_6_X. It's unclear if there will be a fix to it soon.

If you refer to the problem mentioned here: https://github.com/cms-sw/cmssw/pull/26394#issuecomment-484429896, then a fix should be available in a few days. We already have an updated optics configuration, which excludes the special high-beta runs and thus should prevent this problem. Since this update is needed for the UL re-reco, it should go relatively fast.

prebello commented 5 years ago

@slava77 @jan-kaspar in my tests with the new wf using the TOTEM10 RAW, there is a fatal exception related to PPS, the same as in the mentioned PR. Is it expected, even using a physics run (not commissioning)?

----- Begin Fatal Exception 26-Apr-2019 00:29:29 CEST-----------------------
An exception of category 'ProtonReconstructionAlgorithm' occurred while
   [0] Processing Event run: 319270 lumi: 138 event: 202226407 stream: 3
   [1] Running path 'Flag_trkPOG_logErrorTooManyClusters'
   [2] Prefetching for module LogErrorEventFilter/'logErrorTooManyClusters'
   [3] Prefetching for module LogErrorHarvester/'logErrorHarvester'
   [4] Calling method for module CTPPSProtonProducer/'ctppsProtons'
Exception Message:
Optics data not available for RP 1990197248, i.e. subDet=3 arm=0 station=2 rp=4.
----- End Fatal Exception -------------------------------------------------

slava77 commented 5 years ago

On 4/25/19 7:57 PM, Patricia Rebello Teles wrote:

the mentioned PR. Is it expected, even using physics run (not commissioning)?

Interesting. It looks like I was wrong in trying to push out the Commissioning2018 setup with the argument that it was too special a run/test. So it becomes clear that the TOTEM run in 2018B has the same issue with the PPS payloads. This rather clearly means that the setup should not be abandoned for the UL needs, and we can expect the earlier Commissioning2018 workflows to keep working.

prebello commented 5 years ago

indeed @slava77 @fabiocos, in this case, does it make sense to proceed with the PR for the new wf using TOTEM10 or not? I would suggest waiting for the fix; what do you think?

jan-kaspar commented 5 years ago

@slava77 @jan-kaspar in my tests with the new wf using the TOTEM10 RAW, there is a fatal exception related to PPS, the same as in the mentioned PR. Is it expected, even using a physics run (not commissioning)?

----- Begin Fatal Exception 26-Apr-2019 00:29:29 CEST-----------------------
An exception of category 'ProtonReconstructionAlgorithm' occurred while
   [0] Processing Event run: 319270 lumi: 138 event: 202226407 stream: 3
   [1] Running path 'Flag_trkPOG_logErrorTooManyClusters'
   [2] Prefetching for module LogErrorEventFilter/'logErrorTooManyClusters'
   [3] Prefetching for module LogErrorHarvester/'logErrorHarvester'
   [4] Calling method for module CTPPSProtonProducer/'ctppsProtons'
Exception Message:
Optics data not available for RP 1990197248, i.e. subDet=3 arm=0 station=2 rp=4.
----- End Fatal Exception -------------------------------------------------

Run 319270 was a high-beta run and thus this problem is, currently, expected. So far we have only uploaded low-beta optics to DB and, by mistake, the special-run IOVs have not been excluded.

I've just successfully tested an updated optics sqlite file where the special runs are excluded. This file should be part of the AlCa sign-off on next Monday.

fabiocos commented 5 years ago

@jan-kaspar @tocheng @christopheralanwest do I understand correctly that the DB update to solve the present failure in wf 136.8562 should be agreed today?

fabiocos commented 5 years ago

and of course a PR implementing it may be made...

jan-kaspar commented 5 years ago

@jan-kaspar @tocheng @christopheralanwest do I understand correctly that the DB update to solve the present failure in wf 136.8562 should be agreed today?

Yes, this should be the plan. As far as I can see, the only PR needed should be updating the auto GT.

jan-kaspar commented 5 years ago

@jan-kaspar @tocheng @christopheralanwest do I understand correctly that the DB update to solve the present failure in wf 136.8562 should be agreed today?

Yes, this should be the plan. As far as I can see, the only PR needed should be updating the auto GT.

Using optics with this tag: https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/tags/PPSOpticalFunctions_offline_v1 should solve the problem - the b6339... payload should be null, covering also the run 319270 as discussed earlier.

fabiocos commented 5 years ago

@jan-kaspar @tocheng @christopheralanwest ok, we need a PR with an updated GT including this change

tocheng commented 5 years ago

@jan-kaspar since you named this one offline, we need a copy named prompt so we can put them into the offline and prompt GTs separately.

christopheralanwest commented 5 years ago

If I run workflow 136.8562 as follows:

cmsrel CMSSW_10_6_X_2019-05-03-1100
cd CMSSW_10_6_X_2019-05-03-1100/src
cmsenv
runTheMatrix.py -l 136.8562 --command="--custom_conditions=PPSOpticalFunctions_offline_v1,CTPPSOpticsRcd,frontier://FrontierProd/CMS_CONDITIONS" --ibeos

I still get the same errors as those in the latest IB.

The payload hash for run 314276 begins with 2333. I thought that the special runs were supposed to use the payload hash starting with b6339. Could you clarify?

jan-kaspar commented 5 years ago

@jan-kaspar since you named this one offline, we need a copy with name prompt so we can put them to offline and prompt GT separately.

@clemencia @wpcarvalho Can you please help? It is about the latest optics tag PPSOpticalFunctions_offline_v1

jan-kaspar commented 5 years ago

If I run workflow 136.8562 as follows:

cmsrel CMSSW_10_6_X_2019-05-03-1100
cd CMSSW_10_6_X_2019-05-03-1100/src
cmsenv
runTheMatrix.py -l 136.8562 --command="--custom_conditions=PPSOpticalFunctions_offline_v1,CTPPSOpticsRcd,frontier://FrontierProd/CMS_CONDITIONS" --ibeos

I still get the same errors as those in the latest IB.

The payload hash for run 314276 begins with 2333. I thought that the special runs were supposed to use the payload hash starting with b6339. Could you clarify?

Thanks @christopheralanwest , I am going to check...

jan-kaspar commented 5 years ago


If I run workflow 136.8562 as follows:

cmsrel CMSSW_10_6_X_2019-05-03-1100
cd CMSSW_10_6_X_2019-05-03-1100/src
cmsenv
runTheMatrix.py -l 136.8562 --command="--custom_conditions=PPSOpticalFunctions_offline_v1,CTPPSOpticsRcd,frontier://FrontierProd/CMS_CONDITIONS" --ibeos

I still get the same errors as those in the latest IB.

The payload hash for run 314276 begins with 2333. I thought that the special runs were supposed to use the payload hash starting with b6339. Could you clarify?

In a previous message, https://github.com/cms-sw/cmssw/issues/24795#issuecomment-486907816, a different run, 319270, was referred to. For that run, the (null) payload b6339... should be used. So I am confused about which run is actually used.

Run 314276 happens to be a special PPS "alignment" (low beta*) run in which the vertical RPs also participate. As this is exceptional, the corresponding optical functions are not uploaded to the DB. As far as I know, these special runs are not included in any standard (re-)reco; among other things, the "stable beams" flag is not declared by the LHC, and consequently a large fraction of CMS is off: https://cmswbm.cern.ch/cmsdb/servlet/RunSummary?RUN=314276&SUBMIT=Submit

Is there a special interest in this run for RelVals? If yes, probably the easiest solution is to associate the special run with a "null" optics payload to prevent the reported problem. Let me know what you prefer.
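The "null payload" idea above can be pictured as an IOV (interval of validity) lookup: the optics tag maps run ranges to payloads, and the special runs are covered by an empty payload so the producer skips proton reconstruction instead of throwing the exception quoted earlier. A hedged sketch, with made-up IOV boundaries and payload names (the real tag contents differ):

```python
import bisect

# Illustrative IOV table: each "since" run starts a new interval of validity.
# None models the "null" optics payload proposed for the special runs.
iov_starts = [1, 314276, 314277, 319176, 319311]
payloads   = ['lowbeta_v1', None, 'lowbeta_v1', None, 'lowbeta_v2']

def optics_for_run(run: int):
    """Return the payload whose IOV covers the given run."""
    return payloads[bisect.bisect_right(iov_starts, run) - 1]

def reconstruct_protons(run: int):
    """Skip reconstruction quietly when the optics payload is null."""
    optics = optics_for_run(run)
    if optics is None:
        return []  # special run: no optics, no proton candidates, no exception
    return ['protons reconstructed with ' + optics]

print(reconstruct_protons(319270))  # [] : special run covered by null payload
print(reconstruct_protons(314276))  # [] : alignment run likewise nulled
```

With this shape of tag, both the alignment run 314276 and the high-beta run 319270 fall inside null IOVs, while ordinary low-beta runs still pick up a real optics payload.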

christopheralanwest commented 5 years ago

AlCa cannot approve conditions that crash existing workflows. If the workflow is truly useless, perhaps you could follow up with the person who introduced it to see if it should be removed.

But I think it is more straightforward to generate a new tag that uses the null optics for run 314276.