Closed: katilp closed this issue 5 years ago
The files are now available on eospublic, and can be read from there through xrootd, see the file names and paths in:
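As a sketch (hedged: assuming the standard CMSSW `PoolSource`; the file path is the GEN-SIM-RAW file quoted later in this thread), reading one of the eospublic files over xrootd from a config would look like:

```python
import FWCore.ParameterSet.Config as cms

# Sketch: point a PoolSource directly at an eospublic file over xrootd.
# The path below is the GEN-SIM-RAW file quoted later in this thread.
process = cms.Process("READ")
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'root://eospublic.cern.ch//eos/opendata/cms/MonteCarloUpgrade/RunIIAutumn18DR/'
        'TTToHadronic_TuneCP5_13TeV-powheg-pythia8/GEN-SIM-RAW/'
        'PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/110000/'
        '09D95BE7-23C8-5542-9FA4-102E3FEDB7A2.root'
    )
)
```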
VM: CMS OpenData 1.4.0 from here
CMSSW: 10_2_5 (production release for the dataset) and 10_3_5
GT: 102X_upgrade2018_design_v9
SCRAM_ARCH: slc7_amd64_gcc700
NB: the VM seems to be listed as slc6 but when checking the OS I got
Also cmsrel seems to be aware of that. The symlink has been set (in the directory I'm running in):
ln -sf /cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db 102X_upgrade2018_design_v9.db
1) Testing Generic RelVal Production
Since the ML dataset production is run within the RECO step (step3), as first test I tried to run a generic RelVal workflow from the GEN step generating a config with cmsDriver
cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi -n 10 --no_exec --eventcontent FEVTDEBUG -s GEN,SIM --datatier GEN-SIM --fileout file:step1.root --conditions 102X_upgrade2018_design_v9
In the resulting config [*] I've changed the GT following the guide
#process.GlobalTag = GlobalTag(process.GlobalTag, '102X_upgrade2018_design_v9.db', '')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
(it seems to have an effect, since before this fix I got a DB error)
The error I get:
Full stack trace attached[*] . The module that has crashed is
Module: Pythia8GeneratorFilter:generator (crashed)
2) Testing RECO step on GEN-SIM-RAW files from the dataset
Basically I tried to run step3 config on one of the GEN-SIM-RAW dataset
/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/GEN-SIM-RAW
accessed by eospublic
eosfile = 'root://eospublic.cern.ch//eos/opendata/cms/MonteCarloUpgrade/RunIIAutumn18DR/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/GEN-SIM-RAW/PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/110000/09D95BE7-23C8-5542-9FA4-102E3FEDB7A2.root'
again, the config is one of the standard RelVal ones generated with cmsDriver [*]
cmsDriver.py step3 --conditions auto:phase1_2018_realistic -n -1 --eventcontent RECOSIM,MINIAODSIM,DQM --runUnscheduled -s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,VALIDATION --datatier GEN-SIM-RECO,MINIAODSIM,DQMIO --filein file:step2.root --fileout file:step3.root
where the GT has been fixed as above, and the input file as well.
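Concretely, the hand edits to the cmsDriver-generated step3 config amount to something like this (hedged sketch: sqlite path and GT as above, input file as quoted earlier):

```python
# Sketch of the hand edits to the generated step3 config:
# point the GlobalTag at the sqlite file on CVMFS and swap in the eospublic input.
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
process.source.fileNames = cms.untracked.vstring(eosfile)  # eosfile as defined above
```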
I haven't tested the ML production config yet since it's based on the step3 config.
Thanks @AdrianoDee : note that the slc6 shell is available from the desktop icon (not from the icon at the bottom of the screen), and it is a Singularity container with slc6. The native shell is indeed slc7. This is on purpose (although we have to document it better!!), and all CMSSW-specific commands should be run in the slc6 shell. Sorry for the confusion.
Thanks @katilp, I'll check with slc6 and let you know soon
I just tested with slc6 within the CMS Shell (with SCRAM_ARCH=slc6_amd64_gcc700 and CMSSW_10_2_5) and basically I'm getting the same errors.
OK, thanks, then it could be an error in the GT export. I understand that this runs with the same CMSSW and GT on lxplus. @ggovi could you please have a look (the error log in the cernbox link above)? Thanks!
@katilp my check:
cd /cvmfs/cms-opendata-conddb.cern.ch
conddb --db 102X_upgrade2018_design_v9.db list 102X_upgrade2018_design_v9
shows that the IdealGeometryRecord is there, mapped to tag TKRECO_Geometry_92YV3.
conddb --db 102X_upgrade2018_design_v9.db list TKRECO_Geometry_92YV3
shows the same IOV content as the tag in the production DB... Looks like everything is in place - can't help much unfortunately...
[edited the above text to remove the bad result from a cut and paste...]
For completeness I just tested point 1) on lxplus and it's all smooth. For GT:
process.genstepfilter.triggerConditions=cms.vstring("generation_step")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, '102X_upgrade2018_design_v9', '')
The result:
OK, thanks. Edit: indeed, I confirm that for the standard GEN process I get the same:
CMS Shell > cmsrel CMSSW_10_2_5
WARNING: Release CMSSW_10_2_5 is not available for architecture slc6_amd64_gcc472.
Developer's area is created for available architecture slc6_amd64_gcc700.
CMS Shell > cd CMSSW_10_2_5/src
CMS Shell > cmsenv
CMS Shell > ln -sf /cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db 102X_upgrade2018_design_v9.db
CMS Shell > cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi -n 10 --no_exec --eventcontent FEVTDEBUG -s GEN,SIM --datatier GEN-SIM --fileout file:step1.root --conditions 102X_upgrade2018_design_v9
GEN,SIM,ENDJOB
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
Step: GEN Spec:
Loading generator fragment from Configuration.Generator.TTbar_13TeV_TuneCUETP8M1_cfi
Step: SIM Spec:
Step: ENDJOB Spec:
Config file TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py created
CMS Shell >
Edit TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py:
process.GlobalTag = GlobalTag(process.GlobalTag, '102X_upgrade2018_design_v9.db', '')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
CMS Shell > cmsRun TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM.py
----- Begin Fatal Exception 13-Dec-2018 20:04:58 CET-----------------------
An exception of category 'NoRecord' occurred while
[0] Processing global begin Run run: 1
[1] Calling method for module OscarMTProducer/'g4SimHits'
Exception Message:
No "IdealGeometryRecord" record found in the EventSetup.
Please add an ESSource or ESProducer that delivers such a record.
----- End Fatal Exception -------------------------------------------------
@ggovi Can the problem be in slc6_amd64_gcc700? Or is the reading of the GT also somewhat different, since it no longer has the top-level directory (as it used to for the other GTs there)? Is setting
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
in the conf file OK?
And I can also confirm:
CMS Shell > conddb --db 102X_upgrade2018_design_v9.db list 102X_upgrade2018_design_v9 | grep IdealGeometryRecord
[2018-12-13 20:18:51,923] INFO: Connecting to 102X_upgrade2018_design_v9.db [sqlite:///102X_upgrade2018_design_v9.db]
IdealGeometryRecord - TKRECO_Geometry_92YV3
CMS Shell >
Adding:
process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")
in the config file, after the two other process.GlobalTag commands, as in https://github.com/DeepDoubleB/DNNTuplesAK8/blob/opendata_80X/NtupleAK8/test/DeepNtuplizerAK8.py#L79-L81 (https://github.com/cernopendata/opendata.cern.ch/issues/2448#issuecomment-439981639) by @jmduarte, seems to solve the problem. (TBC: still running... EDIT: confirmed, it runs to the end without errors.)
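Put together, the full GlobalTag override in the config then reads (all three lines as given in this thread):

```python
# Full GlobalTag override: sqlite file on CVMFS, tag name, and a far-future
# snapshot time so that payloads newer than the GT's built-in snapshot are visible.
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")
```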
Perfect, thanks! I'm checking how it's working.
I've just realised that this GT has a built-in snapshot time. It is set to 2018-10-01 20:09:06, as you can see in: https://cms-conddb-prod.cern.ch/cmsDbBrowser/search/Prod/102X_upgrade2018_design_v9 The file on CVMFS was produced on October 30, so it makes sense that without overriding the snapshot time it does not work. In the future, we could update the built-in snapshot time accordingly.
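The snapshot logic can be illustrated with a toy sketch (hedged: plain Python, not CMSSW code; `visible` is a hypothetical helper mimicking the cutoff behaviour): a snapshot time acts as an upper bound on which condition payloads the job can see, so a built-in snapshot of 2018-10-01 hides payloads written when the file was produced on 2018-10-30 unless the snapshot is overridden to a later date.

```python
from datetime import datetime

# Toy illustration (not CMSSW code): a snapshot time is an upper bound on
# the insertion time of visible condition payloads.
builtin_snapshot = datetime(2018, 10, 1, 20, 9, 6)    # GT's built-in snapshot
file_produced = datetime(2018, 10, 30)                # CVMFS sqlite file date
override = datetime(9999, 12, 31, 23, 59, 59)         # far-future override

def visible(payload_insert_time, snapshot):
    """A payload is visible only if inserted at or before the snapshot."""
    return payload_insert_time <= snapshot

# With the built-in snapshot, payloads added around the file's production date are hidden:
assert not visible(file_produced, builtin_snapshot)
# Overriding the snapshot to a far-future date makes them visible:
assert visible(file_produced, override)
```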
Hi, the fix with
process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")
is working for the RelVal standard sequences. Now I've tried to run the ML production on the GEN-SIM-RAW dataset, and I hit an error with "l1conddb". First, if I try to run step2 on the data with a standard sequence such as
step2 --conditions auto:phase1_2018_design --eventcontent FEVTDEBUGHLT -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2018 --datatier GEN-SIM-DIGI-RAW --geometry DB:Extended
I get an error such as
----- Begin Fatal Exception 17-Dec-2018 13:05:11 CET-----------------------
An exception of category 'Invalid DetId' occurred while
[0] Processing Event run: 1 lumi: 28 event: 55787 stream: 0
[1] Running path 'digitisation_step'
[2] Calling method for module MixingModule/'mix'
Exception Message:
Cannot initialize HcalDetId from 10000c39
----- End Fatal Exception -------------------------------------------------
17-Dec-2018 13:05:11 CET Closed file root://eospublic.cern.ch//eos/opendata/cms/MonteCarloUpgrade/RunIIAutumn18DR/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/GEN-SIM-RAW/PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/110000/09D95BE7-23C8-5542-9FA4-102E3FEDB7A2.root
(Note that I get the same error on lxplus.) After some tests it was easy to find out that this error comes from the fact that I was missing the era customisation (and hence the detector customisation):
--era Run2_2018
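For reference, the full step2 command with the era flag added would read something like this (hedged: reconstructed from the options quoted above, not run verbatim here):

```shell
# Sketch: the step2 cmsDriver options quoted above, with the era flag added.
cmsDriver.py step2 --conditions auto:phase1_2018_design --eventcontent FEVTDEBUGHLT \
  -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2018 --datatier GEN-SIM-DIGI-RAW \
  --geometry DB:Extended --era Run2_2018
```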
And after adding it I have no issues on lxplus. On the contrary, I get a new "DB" (L1 DB) error on the VM:
----- Begin Fatal Exception 17-Dec-2018 13:09:24 CET-----------------------
An exception of category 'Incomplete configuration' occurred while
[0] Constructing the EventProcessor
[1] Constructing ESSource: class=PoolDBESSource label='l1conddb'
Exception Message:
Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
----- End Fatal Exception -------------------------------------------------
I'm trying to track down where it pops up. I don't know if you have any suggestions.
Thanks: @ggovi does this say something to you? Do we now have an additional DB (in addition to the normal condition data) for L1?
@AdrianoDee Can this be related to something similar to: https://cmssdt.cern.ch/lxr/source/L1Trigger/L1TGlobal/test/runGTSummary.py?v=CMSSW_10_1_X_2018-02-23-2300 where there is a mention that this (l1conddb) is needed before the prescales go to the GlobalTag?
@katilp Yes, it seems to be the case.
@AdrianoDee do you know if there's a newer GT with the required data already in?
The line "Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml" should tell us something. It looks like some action in the workflow is still relying on Frontier?
I think so, but I don't know if there is a way to avoid it. The Run2_2018 customisation seems to be the "problem", but it also seems to be necessary since the data (my deduction) have been generated with that era. I'm investigating.
Step 2
The issue with l1conddb mentioned before may be solved in two steps.
1) Force the relevant module (l1conddb) to use the 102X_upgrade2018_design_v9 DB:
myDb = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/102X_upgrade2018_design_v9.db')
process.l1conddb.connect = myDb
2) This fix brings up a new error, due to the fact that the names of some tags in the 102X_upgrade2018_design_v9 DB are different:
----- Begin Fatal Exception 29-Jan-2019 10:32:18 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
[0] Constructing the EventProcessor
[1] Constructing ESSource: class=PoolDBESSource label='l1conddb'
Exception Message:
Tag "L1TCaloParams_static_CMSSW_9_2_10_2017_v1_8_2_updateHFSF_v6MET" has not been found in the database. from IOVProxy::load
----- End Fatal Exception -------------------------------------------------
This can easily be fixed by correcting the tag name in the configuration file, taking it from conddb (here):
process.l1conddb.connect = myDb
process.l1conddb.toGet = cms.VPSet(
cms.PSet(
record = cms.string('L1TCaloParamsO2ORcd'),
tag = cms.string("L1TCaloParams_Stage2v3_2018_mc")
)
)
The same issue appears with another module, l1ugmtdb, and may be solved in the same way with the help of conddb (here):
process.l1ugmtdb.connect = myDb
process.l1ugmtdb.toGet = cms.VPSet(
cms.PSet(
record = cms.string('L1TMuonGlobalParamsO2ORcd'),
tag = cms.string("L1TMuonGlobalParams_Stage2v0_2018_mc")
)
)
With these fixes step2 runs smoothly, completing the first part of the production for the "online" pixels.
Step 3
The step3 config file is customised like the step2 one, since it shows the same issues, but it hits a new, similar error:
----- Begin Fatal Exception 29-Jan-2019 22:46:17 CET-----------------------
An exception of category 'Incomplete configuration' occurred while
[0] Constructing the EventProcessor
[1] Constructing ESSource: class=PoolDBESSource label='loadRecoTauTagMVAsFromPrepDB'
Exception Message:
Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
----- End Fatal Exception -------------------------------------------------
Thus, following what was done before, the DB is forced via sqlite:
process.GlobalTag.connect = myDb
process.GlobalTag.globaltag = '102X_upgrade2018_design_v9'
process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")
process.l1conddb.connect = myDb
process.l1ugmtdb.connect = myDb
process.loadRecoTauTagMVAsFromPrepDB.connect = myDb
As above, this brings up a new error related to the tag name:
----- Begin Fatal Exception 29-Jan-2019 22:50:47 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
[0] Constructing the EventProcessor
[1] Constructing ESSource: class=PoolDBESSource label='loadRecoTauTagMVAsFromPrepDB'
Exception Message:
Tag "RecoTauTag_againstMuonMVAv1" has not been found in the database. from IOVProxy::load
----- End Fatal Exception -------------------------------------------------
Now, differently from above, this is not avoidable simply by fixing the name, because in the 102X_upgrade2018_design_v9 DB neither the tag name (RecoTauTag_againstMuonMVAv1) nor the related record (see GBRWrapperRcd) exists. Indeed, inspecting conddb (here), the object seems to be copied from an external DB that is not accessible from the Open Data VM.
I tried to figure out whether there's any way to avoid this issue, but it seems this DB is necessary.
OK, thanks! How big is this external DB, and can we have it somewhere accessible? Is the location hard-coded somewhere deep in CMSSW?
Ideally, it would be best to make a new GT including it, if possible. Independent of the Open Data VM and its access restrictions, it is not very solid practice to take it from an external DB; this will need to be sorted out sooner or later, in any case.
Honestly, I have no clear idea. This record is called here
and the DB it refers to comes from Frontier:
CondDBTauConnection = CondDB.clone( connect = cms.string( 'frontier://FrontierProd/CMS_CONDITIONS' ) )
The idea that it may come from an external DB is suggested only by the result in conddb, but I'm not even remotely an expert on this stuff. In the next days I'll try to figure something out, also asking somebody more expert than me.
@ggovi @jmduarte Can you help here? An external db which we just copy over if small (not ideal) or get it included in a new GT. As this is part of the normal reco step, the latter would be better. Thanks!
That tag is available in the Production (and Preparation) database, it's just not part of any GlobalTag apparently.
It's weird to me that such a tag would be needed in a real RECO job, but I don't know much about that tag.
To export just this tag, I did the following (following the tutorial https://indico.cern.ch/event/507993/contributions/2020446/attachments/1252208/1847694/talk4_-_hands_on_tutorial.pdf):
cmsrel CMSSW_10_2_5
cd CMSSW_10_2_5/src
cmsenv
# check that's it there
conddb list RecoTauTag_againstMuonMVAv1
# export it to a local sqlite file with the same tag name
conddb_import -c sqlite_file:recotautag.db -f frontier://PromptProd/CMS_CONDITIONS -i RecoTauTag_againstMuonMVAv1 -t RecoTauTag_againstMuonMVAv1
# check it again in the new file
conddb --db recotautag.db list RecoTauTag_againstMuonMVAv1
I put the file here: /afs/cern.ch/user/w/woodson/public/recotautag.db
You can copy this file to your working area and use it to read in the tag you need (like you do with the other sqlite file you use already).
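For instance (hedged sketch: assuming the sqlite file has been copied to the working directory), the override would be analogous to the earlier ones:

```python
# Sketch: point the tau-MVA PoolDBESSource at the local sqlite copy,
# in the same way the other modules were redirected above.
process.loadRecoTauTagMVAsFromPrepDB.connect = cms.string('sqlite_file:recotautag.db')
```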
Making a new GlobalTag with this tag included (as well as all the other changes you had to do by hand) seems like the best solution, but probably needs @ggovi's or another AlCaDB person's help.
Thanks, Javier
Thanks @jmduarte !
@jmduarte it looks correct, except that it would be preferable to use conddb copy RecoTauTag_againstMuonMVAv1 --destdb recotautag.db instead of conddb_import... Actually, in order to have a single file, it could even be copied into the sqlite file containing the GT.
Eureka!
Following the instructions from @jmduarte and @ggovi (thanks) I created a separate DB with all the RecoTauTag tags, and the whole production workflow is working smoothly now. I put the DB here:
/afs/cern.ch/user/a/adiflori/public/recotautag.db
It's about 4MB
Closing as remaining issues followed in #2576 (FIXME for usage example in ML sample record) and #2589 (attach zipped github repository)
In connection with #2440, some ML samples will be produced from newly produced Upgrade samples and made available on the portal.
The datasets are produced and available:
/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/GEN-SIM-RAW /TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/GEN-SIM-RECO /TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_102X_upgrade2018_design_v9-v1/AODSIM
To do:
arch
For contributions, see also https://github.com/cernopendata/opendata.cern.ch/wiki/Contributing-content-to-CERN-Open-Data