cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal
http://opendata.cern.ch/
GNU General Public License v2.0

CMS: ML files for Tracking GPU studies from CMS Upgrade MC #2459

Closed katilp closed 5 years ago

katilp commented 5 years ago

In connection with #2440, some ML samples will be produced from newly produced Upgrade samples and made available on the portal.

The datasets are produced and available:

/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/GEN-SIM-RAW
/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/GEN-SIM-RECO
/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/AODSIM

To do:

For contributions, see also https://github.com/cernopendata/opendata.cern.ch/wiki/Contributing-content-to-CERN-Open-Data

katilp commented 5 years ago

The files are now available on eospublic, and can be read from there through xrootd, see the file names and paths in:

AdrianoDee commented 5 years ago

Testing the ML production in the VM.

VM: CMS OpenData 1.4.0 from here
CMSSW: 10_2_5 (production release for the dataset) and 10_3_5
GT: 102X_upgrade2018_design_v9
SCRAM_ARCH: slc7_amd64_gcc700

NB: the VM seems to be listed as slc6 but when checking the OS I got

[Screenshot, 2018-12-13 11:27:40]

Also cmsrel seems to be aware of that. The symlink has been set (in the directory I'm running)

ln -sf /cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db 102X_upgrade2018_design_v9.db

[Screenshot, 2018-12-13 11:53:09]

1) Testing Generic RelVal Production

Since the ML dataset production is run within the RECO step (step3), as a first test I tried to run a generic RelVal workflow from the GEN step, generating a config with cmsDriver:

cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi -n 10 --no_exec --eventcontent FEVTDEBUG -s GEN,SIM --datatier GEN-SIM --fileout file:step1.root --conditions 102X_upgrade2018_design_v9

In the resulting config [*] I've changed the GT following the guide

#process.GlobalTag = GlobalTag(process.GlobalTag, '102X_upgrade2018_design_v9.db', '')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'

(This seems to take effect, since before this change I had a DB error.)

The error I get:

[Screenshot, 2018-12-13 11:42:15]

Full stack trace attached [*]. The module that crashed is

Module: Pythia8GeneratorFilter:generator (crashed)

2) Testing RECO step on GEN-SIM-RAW files from the dataset

Basically I tried to run the step3 config on one of the files of the GEN-SIM-RAW dataset

/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/GEN-SIM-RAW

accessed through eospublic:

eosfile = 'root://eospublic.cern.ch//eos/opendata/cms/MonteCarloUpgrade/RunIIAutumn18DR/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/GEN-SIM-RAW/PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/110000/09D95BE7-23C8-5542-9FA4-102E3FEDB7A2.root'
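The relation between the dataset name and the eospublic path above can be sketched as a small helper. This is only an illustration inferred from the single path quoted in this comment: the function name `eos_url`, the base directory constant and the idea that the layout is the same for other files are assumptions, and the `110000` sub-directory is simply passed through as given.

```python
# Sketch only: layout inferred from one example path in this thread.
EOS_REDIRECTOR = "root://eospublic.cern.ch/"
EOS_BASE = "/eos/opendata/cms/MonteCarloUpgrade"  # assumed common base directory


def eos_url(dataset, subdir, filename):
    """Build an xrootd URL for one file of a dataset.

    dataset  : CMS dataset name, "/<primary>/<campaign>-<processing>/<tier>"
    subdir   : numbered sub-directory, copied as-is (e.g. "110000")
    filename : ROOT file name
    """
    primary, processed, tier = dataset.strip("/").split("/")
    # Split "RunIIAutumn18DR-PUAvg50..." at the first hyphen only.
    campaign, processing = processed.split("-", 1)
    path = "/".join([EOS_BASE, campaign, primary, tier, processing, subdir, filename])
    return EOS_REDIRECTOR + path
```

The actual file names and paths should still be taken from the published file lists rather than guessed.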

Again the config is one of the standard RelVal ones generated with cmsDriver [*]:

cmsDriver.py step3 --conditions auto:phase1_2018_realistic -n -1 --eventcontent RECOSIM,MINIAODSIM,DQM --runUnscheduled -s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,VALIDATION --datatier GEN-SIM-RECO,MINIAODSIM,DQMIO --filein file:step2.root --fileout file:step3.root

where the GT has been fixed as above and the input file too.

[Screenshot, 2018-12-13 11:51:57]

I haven't tested the ML production config yet since it's based on the step3 config.

[*] https://cernbox.cern.ch/index.php/s/7oDx6LoNufndOwz

katilp commented 5 years ago

Thanks @AdrianoDee: note that the slc6 shell is available from the desktop icon (not from the icon at the bottom of the screen), and it is a singularity container with slc6. The native shell is indeed slc7. This is on purpose (although we have to document it better!!) and all CMSSW-specific commands should be run in the slc6 shell. Sorry for the confusion.

AdrianoDee commented 5 years ago

Thanks @katilp, I'll check with slc6 and let you know soon

AdrianoDee commented 5 years ago

I just tested with slc6 within the CMS Shell (with SCRAM_ARCH=slc6_amd64_gcc700 and CMSSW_10_2_5) and basically I'm getting the same errors.

katilp commented 5 years ago

OK, thanks, then it could be an error in the GT export. I understand that this runs with the same CMSSW and GT on lxplus. @ggovi could you please have a look (the error log in the cernbox link above)? Thanks!

ggovi commented 5 years ago

@katilp my check:

cd /cvmfs/cms-opendata-conddb.cern.ch
conddb --db 102X_upgrade2018_design_v9.db list 102X_upgrade2018_design_v9

shows that the IdealGeometryRecord is there, mapped to the tag TKRECO_Geometry_92YV3.

conddb --db 102X_upgrade2018_design_v9.db list TKRECO_Geometry_92YV3

shows the same IOV content as the tag in the production DB. Everything looks to be in place, so I can't help much, unfortunately.

ggovi commented 5 years ago

[edited the above text to remove the bad result from a cut and paste...]

AdrianoDee commented 5 years ago

For completeness I just tested point 1) on lxplus and it's all smooth. For GT:

process.genstepfilter.triggerConditions=cms.vstring("generation_step")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, '102X_upgrade2018_design_v9', '')

The result:

[Screenshot, 2018-12-13 16:07:18]
katilp commented 5 years ago

OK, thanks. Edit: indeed, I can confirm that I get the same for the standard GEN process:

CMS Shell > cmsrel CMSSW_10_2_5
WARNING: Release CMSSW_10_2_5 is not available for architecture slc6_amd64_gcc472.
         Developer's area is created for available architecture slc6_amd64_gcc700.
CMS Shell > cd CMSSW_10_2_5/src
CMS Shell > cmsenv
CMS Shell > ln -sf /cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db 102X_upgrade2018_design_v9.db
CMS Shell > cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi -n 10 --no_exec --eventcontent FEVTDEBUG -s GEN,SIM --datatier GEN-SIM --fileout file:step1.root --conditions 102X_upgrade2018_design_v9
GEN,SIM,ENDJOB
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)

Step: GEN Spec: 
Loading generator fragment from Configuration.Generator.TTbar_13TeV_TuneCUETP8M1_cfi
Step: SIM Spec: 
Step: ENDJOB Spec: 
Config file TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py created
CMS Shell > 

Edit TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py:

process.GlobalTag = GlobalTag(process.GlobalTag, '102X_upgrade2018_design_v9.db', '')

process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'

CMS Shell > cmsRun TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py 
----- Begin Fatal Exception 13-Dec-2018 20:04:58 CET-----------------------
An exception of category 'NoRecord' occurred while
   [0] Processing global begin Run run: 1
   [1] Calling method for module OscarMTProducer/'g4SimHits'
Exception Message:
No "IdealGeometryRecord" record found in the EventSetup.n
 Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------

@ggovi Can the problem be in slc6_amd64_gcc700? Or is the reading of the GT somehow different now, as it does not have the top-level directory anymore (as it used to for the other GTs there)? Is setting

process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'

in the conf file OK?

katilp commented 5 years ago

And I can also confirm:

CMS Shell > conddb --db 102X_upgrade2018_design_v9.db list 102X_upgrade2018_design_v9 | grep IdealGeometryRecord
[2018-12-13 20:18:51,923] INFO: Connecting to 102X_upgrade2018_design_v9.db [sqlite:///102X_upgrade2018_design_v9.db]
IdealGeometryRecord                                     -                                                 TKRECO_Geometry_92YV3                                                    
CMS Shell > 
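To look up several records at once instead of using grep, the listing can be parsed into a dictionary. This is a rough sketch based only on the output line shown above (record, label, tag in three whitespace-separated columns, with "-" for an empty label); the function name `parse_gt_listing` and the assumption that a header row starts with "Record" are mine.

```python
def parse_gt_listing(text):
    """Return {(record, label): tag} from the text printed by `conddb ... list <GT>`.

    Assumes three whitespace-separated columns per entry, as in the
    IdealGeometryRecord line quoted in this thread.
    """
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # log lines, blank lines and anything else
        record, label, tag = fields
        # Skip an assumed header row and separator rows made of dashes.
        if record == "Record" or set(record) == {"-"}:
            continue
        mapping[(record, "" if label == "-" else label)] = tag
    return mapping
```

With the output above saved to a string, `parse_gt_listing(output)[("IdealGeometryRecord", "")]` would give the tag mapped to that record.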
katilp commented 5 years ago

Adding:

process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")

in the config file after the two other process.GlobalTag commands as in https://github.com/DeepDoubleB/DNNTuplesAK8/blob/opendata_80X/NtupleAK8/test/DeepNtuplizerAK8.py#L79-L81 (https://github.com/cernopendata/opendata.cern.ch/issues/2448#issuecomment-439981639) by @jmduarte seems to solve the problem (TBC: still running.. EDIT: confirmed, runs to the end without errors.)
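For anyone patching several cmsDriver-generated configs, the three GlobalTag lines can be appended by a small script. The sketch below only manipulates text and does not import CMSSW; the helper names `globaltag_override` and `patch_config` are hypothetical, while the three generated lines are the ones quoted in this thread.

```python
def globaltag_override(db_file, tag, snapshot="9999-12-31 23:59:59.000"):
    """Return the config lines that point process.GlobalTag at a local sqlite file."""
    return "\n".join([
        "process.GlobalTag.connect = cms.string('sqlite_file:%s')" % db_file,
        "process.GlobalTag.globaltag = '%s'" % tag,
        'process.GlobalTag.snapshotTime = cms.string("%s")' % snapshot,
    ]) + "\n"


def patch_config(config_text, override):
    """Append the override block to a config, unless it is already there."""
    if override in config_text:
        return config_text
    return config_text.rstrip("\n") + "\n\n" + override
```

A possible use is to read TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py, pass its text through `patch_config`, and write it back before calling cmsRun.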

AdrianoDee commented 5 years ago

Perfect, thanks! I'm checking how it works.

ggovi commented 5 years ago

I've just realised that this GT has a built-in snapshot time. It is set to 2018-10-01 20:09:06, as you can see in: https://cms-conddb-prod.cern.ch/cmsDbBrowser/search/Prod/102X_upgrade2018_design_v9 The file on CVMFS was produced on October 30, so it makes sense that it does not work without overriding the snapshot time. In the future, we could update the built-in snapshot time accordingly.
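The effect can be illustrated with a deliberately simplified model in which an entry is only visible if it was inserted no later than the snapshot time. This is a sketch of the reasoning, not of the actual conditions code: the function names are hypothetical, and the year and time of day of the export (2018-10-30 00:00:00) are assumed, since only the day is given here.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse 'YYYY-MM-DD HH:MM:SS' with optional fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in stamp else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(stamp, fmt)


def visible_at(insertion_time, snapshot_time):
    """Simplified rule: entries inserted after the snapshot are ignored."""
    return parse_time(insertion_time) <= parse_time(snapshot_time)


BUILT_IN_SNAPSHOT = "2018-10-01 20:09:06"
OVERRIDE_SNAPSHOT = "9999-12-31 23:59:59.000"
EXPORT_TIME = "2018-10-30 00:00:00"  # assumed year and time of day

hidden_by_default = not visible_at(EXPORT_TIME, BUILT_IN_SNAPSHOT)
visible_with_override = visible_at(EXPORT_TIME, OVERRIDE_SNAPSHOT)
```

In this model the exported entries are hidden with the built-in snapshot and become visible once the snapshot time is pushed far into the future.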

AdrianoDee commented 5 years ago

Hi, the fix with

process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")

is working for the standard RelVal sequences. Now I tried to run the ML production on the GEN-SIM-RAW dataset and I hit an error with "l1conddb". To go in order: if I try to run step2 on the data with a standard sequence such as

step2 --conditions auto:phase1_2018_design --eventcontent FEVTDEBUGHLT -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2018 --datatier GEN-SIM-DIGI-RAW --geometry DB:Extended

I get an error such as

----- Begin Fatal Exception 17-Dec-2018 13:05:11 CET-----------------------
An exception of category 'Invalid DetId' occurred while
   [0] Processing  Event run: 1 lumi: 28 event: 55787 stream: 0
   [1] Running path 'digitisation_step'
   [2] Calling method for module MixingModule/'mix'
Exception Message:
Cannot initialize HcalDetId from 10000c39
----- End Fatal Exception -------------------------------------------------
17-Dec-2018 13:05:11 CET  Closed file root://eospublic.cern.ch//eos/opendata/cms/MonteCarloUpgrade/RunIIAutumn18DR/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/GEN-SIM-RAW/PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/110000/09D95BE7-23C8-5542-9FA4-102E3FEDB7A2.root

(Note that I get the same error on lxplus.) After some tests it was easy to find out that this error comes from the fact that I was missing the era customisation (and with it the detector customisation):

--era Run2_2018

With it added I have no issue on lxplus. On the VM, however, I get a new DB (L1 DB) error:

----- Begin Fatal Exception 17-Dec-2018 13:09:24 CET-----------------------
An exception of category 'Incomplete configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='l1conddb'
Exception Message:
Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
----- End Fatal Exception -------------------------------------------------

I'm trying to track down where it pops up. I don't know if you have any suggestion.
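When comparing several of these failures, it can help to pull the category, context lines and message out of the job log automatically. The following is a minimal sketch that relies only on the "Begin/End Fatal Exception" layout visible in the excerpts in this thread; the function name and the returned dictionary keys are hypothetical.

```python
import re

# One block runs from the "Begin Fatal Exception" line to the "End" line.
BLOCK = re.compile(
    r"-+ Begin Fatal Exception (?P<when>.*?)-+\n(?P<body>.*?)-+ End Fatal Exception -+",
    re.S,
)


def parse_fatal_exceptions(log_text):
    """Return one dict per fatal exception block found in the log text."""
    results = []
    for match in BLOCK.finditer(log_text):
        body = match.group("body")
        category = re.search(r"category '(.+?)'", body)
        parts = body.split("Exception Message:", 1)
        context = re.findall(r"^\s*\[\d+\]\s+(.*)$", body, re.M)
        results.append({
            "when": match.group("when").strip(),
            "category": category.group(1) if category else None,
            "context": [line.strip() for line in context],
            "message": parts[1].strip() if len(parts) == 2 else "",
        })
    return results
```

The last context line usually names the module or ESSource label involved, which is the part needed to decide which connect string to override.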

katilp commented 5 years ago

Thanks. @ggovi, does this tell you anything? Do we now have an additional DB (in addition to the normal condition data) for L1?

katilp commented 5 years ago

@AdrianoDee Can this be related to something similar to https://cmssdt.cern.ch/lxr/source/L1Trigger/L1TGlobal/test/runGTSummary.py?v=CMSSW_10_1_X_2018-02-23-2300 where there is a mention that this (l1conddb) is needed before the prescales go to the GlobalTag?

AdrianoDee commented 5 years ago

@katilp Yes, it seems to be the case.

katilp commented 5 years ago

@AdrianoDee do you know if there's a newer GT with the required data already in?

ggovi commented 5 years ago

The line "Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml" should tell us something. It looks like some step in the workflow is still relying on Frontier?
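One way to find which pieces still point at Frontier is to scan a fully expanded configuration (for example a dump written with edmConfigDump, assuming it is available in the release) for frontier:// connection strings. The sketch below works on plain text only; the function name is hypothetical and the line-based matching assumes the usual dump layout where each top-level object starts with "process.<label> = cms.<Type>(".

```python
import re

ASSIGN = re.compile(r"^process\.(\w+)\s*=\s*cms\.\w+\(")
FRONTIER = re.compile(r"frontier://[^\s\"')]+")


def frontier_users(config_text):
    """Return (label, url) pairs for every frontier:// string in the config text.

    The label is the most recent top-level 'process.<label> = cms.X(' seen
    before the match, or None if there was none.
    """
    hits = []
    current = None
    for line in config_text.splitlines():
        start = ASSIGN.match(line)
        if start:
            current = start.group(1)
        for url in FRONTIER.findall(line):
            hits.append((current, url))
    return hits
```

Each reported label is then a candidate for a connect override pointing at a local sqlite file.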

AdrianoDee commented 5 years ago

I think so, but I don't know if there is a way to avoid it. The Run2_2018 customisation seems to be the "problem", but it also seems to be necessary, since the data (my deduction) were generated with that era. I'm investigating.

AdrianoDee commented 5 years ago

Step 2

The issue with l1conddb mentioned before can be solved in two steps.

1) Force the relevant module (l1conddb) to use the 102X_upgrade2018_design_v9 DB:

myDb = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.l1conddb.connect = myDb

2) This fix brings up a new error, due to the fact that some tags are named differently in the 102X_upgrade2018_design_v9 DB.

----- Begin Fatal Exception 29-Jan-2019 10:32:18 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='l1conddb'
Exception Message:
Tag "L1TCaloParams_static_CMSSW_9_2_10_2017_v1_8_2_updateHFSF_v6MET" has not been found in the database. from IOVProxy::load 
----- End Fatal Exception -------------------------------------------------

This is easily fixed by correcting the tag name in the configuration file, taking it from the CondDB (here):

process.l1conddb.connect = myDb
process.l1conddb.toGet   = cms.VPSet(
            cms.PSet(
                 record = cms.string('L1TCaloParamsO2ORcd'),
                 tag = cms.string("L1TCaloParams_Stage2v3_2018_mc")
            )
       )

The same issue appears with another module, l1ugmtdb, and can be solved in the same way with the help of CondDB (here).

process.l1ugmtdb.connect = myDb
process.l1ugmtdb.toGet   = cms.VPSet(
            cms.PSet(
                 record = cms.string('L1TMuonGlobalParamsO2ORcd'),
                 tag = cms.string("L1TMuonGlobalParams_Stage2v0_2018_mc")
            )
       )

With these fixes step2 runs smoothly, and thus so does the first part of the production, the "Online" pixels production.

Step 3

The step3 config file is customized in the same way as step2, since it shows the same issues, but it hits a new, similar error.

----- Begin Fatal Exception 29-Jan-2019 22:46:17 CET-----------------------
An exception of category 'Incomplete configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='loadRecoTauTagMVAsFromPrepDB'
Exception Message:
Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
----- End Fatal Exception -------------------------------------------------

Thus, following what was done before, the DB is forced via sqlite:

process.GlobalTag.connect = myDb

process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")

process.l1conddb.connect = myDb
process.l1ugmtdb.connect = myDb
process.loadRecoTauTagMVAsFromPrepDB.connect = myDb

As above, this brings up a new error related to the tag name:

----- Begin Fatal Exception 29-Jan-2019 22:50:47 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='loadRecoTauTagMVAsFromPrepDB'
Exception Message:
Tag "RecoTauTag_againstMuonMVAv1" has not been found in the database. from IOVProxy::load 
----- End Fatal Exception -------------------------------------------------

Now, unlike above, this cannot be avoided simply by fixing the name, because in the 102X_upgrade2018_design_v9 DB neither the tag name (RecoTauTag_againstMuonMVAv1) nor the related record (see GBRWrapperRcd) exists. Indeed, inspecting the CondDB (here), the object seems to be copied from an external DB that is not accessible from the OpenData VM:

[image]

I tried to figure out whether there's any way to avoid this issue, but it seems this DB is necessary.

katilp commented 5 years ago

OK, thanks! How big is this external db and can we have it somewhere accessible? Is the location hard-coded somewhere deep in CMSSW?

Ideally, it would be best to make a new GT including it, if possible. Independently of the Open Data VM and its access restrictions, it is not a very solid practice to take it from an external DB; this will need to be sorted out sooner or later in any case.

AdrianoDee commented 5 years ago

Honestly, I have no clear idea. This record is loaded here:

https://cmssdt.cern.ch/dxr/CMSSW/source/RecoTauTag/Configuration/python/loadRecoTauTagMVAsFromPrepDB_cfi.py

and the DB it refers to comes from Frontier:

CondDBTauConnection = CondDB.clone( connect = cms.string( 'frontier://FrontierProd/CMS_CONDITIONS' ) )

The idea that it may come from an external DB is suggested only by the result in CondDB, but I'm not even remotely an expert on this. In the next few days I'll try to figure something out, also by asking somebody more expert than me.

katilp commented 5 years ago

@ggovi @jmduarte Can you help here? An external db which we just copy over if small (not ideal) or get it included in a new GT. As this is part of the normal reco step, the latter would be better. Thanks!

jmduarte commented 5 years ago

That tag is available in the Production (and Preparation) database, it's just not part of any GlobalTag apparently.

It's weird to me that such a tag would be needed in a real RECO job, but I don't know much about that tag.

To export just this tag, I did the following (following the tutorial https://indico.cern.ch/event/507993/contributions/2020446/attachments/1252208/1847694/talk4_-_hands_on_tutorial.pdf):

cmsrel CMSSW_10_2_5
cd CMSSW_10_2_5/src
cmsenv
# check that it's there
conddb list RecoTauTag_againstMuonMVAv1
# export it to a local sqlite file with the same tag name
conddb_import -c sqlite_file:recotautag.db -f frontier://PromptProd/CMS_CONDITIONS -i RecoTauTag_againstMuonMVAv1 -t RecoTauTag_againstMuonMVAv1 
# check it again in the new file
conddb --db  recotautag.db list RecoTauTag_againstMuonMVAv1

I put the file here: /afs/cern.ch/user/w/woodson/public/recotautag.db You can copy this file to your working area and use it to read in the tag you need (like you do with the other sqlite file you use already).

Making a new GlobalTag with this tag included (as well as all the other changes you had to do by hand) seems like the best solution, but probably needs @ggovi's or another AlCaDB person's help.

Thanks, Javier

katilp commented 5 years ago

Thanks @jmduarte !

ggovi commented 5 years ago

@jmduarte it looks correct, except that it would be preferable to use

conddb copy RecoTauTag_againstMuonMVAv1 --destdb recotautag.db

instead of conddb_import. Actually, in order to have a single file, the tag could even be copied into the sqlite file containing the GT.
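If more than one tag has to be exported, the same conddb copy call can be generated for a list of tags. This sketch only builds the command lines and does not run them; the function name is hypothetical, and the only tag name used is the one discussed in this thread.

```python
def conddb_copy_commands(tags, destdb):
    """Return one 'conddb copy <tag> --destdb <file>' argument list per tag."""
    return [["conddb", "copy", tag, "--destdb", destdb] for tag in tags]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in conddb_copy_commands(["RecoTauTag_againstMuonMVAv1"], "recotautag.db"):
        print(" ".join(cmd))
```

Passing the GT sqlite file as `destdb` instead of a separate file would correspond to the single-file option mentioned above.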

AdrianoDee commented 5 years ago

Eureka!

Following the instructions from @jmduarte and @ggovi (thanks) I created a separate DB with all the RecoTauTag tags, and the whole production workflow is working smoothly now. I put the DB here:

/afs/cern.ch/user/a/adiflori/public/recotautag.db

It's about 4MB

katilp commented 5 years ago

Closing as remaining issues followed in #2576 (FIXME for usage example in ML sample record) and #2589 (attach zipped github repository)