Open katilp opened 8 years ago
Indeed, the web site is not accessible to me. @RaoOfPhysics can you please have a look?
@tiborsimko we probably need to extract these numbers from McM with a script. Maybe when I get an example script to get them for one dataset, you can help me with a script to get them for the whole list?
Update after a discussion with Luca Perrozzi: for 2011 and 2012, these numbers are not necessarily in McM. However, they can be found in "PREP", i.e. for 2011 (CMS internal) there is a list of all datasets in the Summer11 production campaign with the necessary numbers:
I can have this list in HTML. @tiborsimko would you be able to extract the three numbers for each dataset record and insert them into the records?
(Note: The CMSSW version in the listing is different from that of AODSIM, as it refers to the generator files, which may have been produced with a different version. In any case, this has no influence on the cross-section value.)
For 2015 on, the values can be extracted from McM with a script.
Note that only background MC samples have a cross-section value, for signal MC it is set to one.
Alternatively, the user can run a script in the VM over the MC sample, which computes the cross section from the file. However, the documentation says that this only exists from CMSSW_5_3_21 on, and the 2011 MC samples were produced earlier.
In addition, more precise values for the most important background samples are collected in pages (for 7 TeV 2011) and (for 8 TeV 2012), and these pages can go public.
The file containing the cross-section values and filter efficiencies (in html) is now in (link update 16/04/2018)
As it covers the full Summer11 MC production campaign, it has many more entries (3503) than we have on the portal, and therefore contains entries which we do not have.
However, for the public 2011 MC, I hope that there is a one-to-one correspondence between the dataset names and the entries in this table.
The numbers of interest are the first two of the three values before each dataset name in the listing part, i.e.
<td style="font-size: 10px;">2.98E7</td>
<td style="font-size: 10px;">3.188E-4</td>
<td style="white-space: nowrap;">-1</td>
<td style="white-space: nowrap;">BpToPsiMuMu_2MuPEtaFilter_Tight_7TeV-pythia6-evtgen</td>
It is quite large (80MB), and if needed we can have a look together to get it into a more reasonable format. @tiborsimko let me know...
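A minimal sketch of how the two numbers could be pulled out of the HTML listing, using only the Python standard library. It assumes that every value sits in its own `<td>` cell and that the three cells directly before a dataset-name cell are the three values, as in the fragment above. The names `CellCollector` and `extract_values` are hypothetical, and the list of dataset names to look for would come from the portal records.

```python
from html.parser import HTMLParser

# The fragment quoted above, used here as sample input.
SAMPLE_HTML = """
<td style="font-size: 10px;">2.98E7</td>
<td style="font-size: 10px;">3.188E-4</td>
<td style="white-space: nowrap;">-1</td>
<td style="white-space: nowrap;">BpToPsiMuMu_2MuPEtaFilter_Tight_7TeV-pythia6-evtgen</td>
"""


class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buf).strip())
            self._in_td = False


def extract_values(html_text, dataset_names):
    """Map each known dataset name to the first two of the three values before it."""
    parser = CellCollector()
    parser.feed(html_text)
    wanted = set(dataset_names)
    result = {}
    for i, text in enumerate(parser.cells):
        if text in wanted and i >= 3:
            try:
                first = float(parser.cells[i - 3])
                second = float(parser.cells[i - 2])
            except ValueError:
                continue  # preceding cells are not numbers: layout differs from the assumption
            result[text] = (first, second)
    return result


values = extract_values(
    SAMPLE_HTML, ["BpToPsiMuMu_2MuPEtaFilter_Tight_7TeV-pythia6-evtgen"]
)
print(values)
```

For the full 80MB file, `feed()` can be called repeatedly on chunks read from disk instead of loading the whole text at once. Datasets that are in the table but not on the portal are simply ignored, since only names passed in `dataset_names` are matched.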
@tiborsimko: Sorry I missed this:
> Indeed, the web site is not accessible to me. @RaoOfPhysics can you please have a look?
It's indeed internal. Apologies about the delay in confirming.
For the 2012 release, a similar listing (html extract) resulting from
is in (link updated 16/04/2018)
A note for the numbers to be extracted: the cross section, filter efficiency and match efficiency are to be multiplied to obtain the correct effective cross section. In PREP (i.e. for samples of 2011 and 2012), the "filter efficiency" is the product of filter efficiency and match efficiency, which appear as separate entries in McM (i.e. for samples of 2015 and beyond). I suggest that we add these three fields to the MC records already now (with match efficiency = 1 for the samples from 2011 and 2012), or display the value as "Filter efficiency * Match efficiency".
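The multiplication described above can be sketched as follows. The function name `effective_cross_section` is hypothetical. The default `match_efficiency=1.0` reflects the suggestion for 2011 and 2012 samples, where PREP already folds the match efficiency into the "filter efficiency". Treating 2.98E7 as the cross section and 3.188E-4 as the filter efficiency for the BpToPsiMuMu entry is an assumption about the column order of the listing.

```python
def effective_cross_section(cross_section, filter_efficiency, match_efficiency=1.0):
    """Return cross section * filter efficiency * match efficiency."""
    return cross_section * filter_efficiency * match_efficiency


# 2011-style entry (PREP): match efficiency is already inside the filter efficiency.
# Assumed column order: cross section first, filter efficiency second.
print(effective_cross_section(2.98e7, 3.188e-4))

# McM-style entry with all three numbers given separately.
print(effective_cross_section(0.1746, 1, 1))
```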
For the 2011 MC records, we could just add a new field (for example, 944 is not specified in the MARC21 documentation, so it should be empty), which could be called "effective cross section" and have 3 subfields:
For 2012, we will have the new schemas, so we can adjust it or change it completely.
I'm not sure how we usually extract information to insert it into the records. @tiborsimko, how do you think we could go about doing this?
The PDF file of the 2011 listing (see above for the HTML): PREP - Request Management 2011 xc.pdf
Regarding efficiencies, there are also corresponding errors, for example:
Dataset: /ttbarZ_8TeV-Madspin_aMCatNLO-herwig/Summer12_DR53X-PU_S10_START53_V19-v1/AODSIM
Parent dataset: /ttbarZ_8TeV-Madspin_aMCatNLO-herwig/Summer12-START53_V7C-v1/GEN-SIM
Generator parameters:
Cross section: 0.1746
Filter efficiency: 1
Filter efficiency error: 0
Match efficiency: 1
Match efficiency error: -1
Do we want to store those separately?
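If we do store them separately, a minimal sketch of reading such a "Generator parameters" block and splitting the values from their errors could look like this. The function names are hypothetical, and the sketch assumes one `Key: value` pair per line, as in the example above. Values such as -1 are kept untouched, since it is not stated here what they mean.

```python
SAMPLE_BLOCK = """Generator parameters:
Cross section: 0.1746
Filter efficiency: 1
Filter efficiency error: 0
Match efficiency: 1
Match efficiency error: -1
"""


def parse_generator_parameters(text):
    """Return a dict of the numeric 'Key: value' lines, skipping everything else."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            params[key.strip()] = float(value)
        except ValueError:
            continue  # e.g. the "Generator parameters:" header carries no number
    return params


def split_values_and_errors(params):
    """Separate central values from their '... error' counterparts."""
    suffix = " error"
    values = {k: v for k, v in params.items() if not k.endswith(suffix)}
    errors = {k[: -len(suffix)]: v for k, v in params.items() if k.endswith(suffix)}
    return values, errors


values, errors = split_values_and_errors(parse_generator_parameters(SAMPLE_BLOCK))
print(values)
print(errors)
```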
Here are various values for 2012 MC datasets:
Match efficiency error: 0.000141
Match efficiency error: 0.000173
Match efficiency error: 0.0001
Match efficiency error: 0.001755
Match efficiency error: 0.001
Match efficiency error: 0.0025
Match efficiency error: 0.002
Match efficiency error: 0.005
Match efficiency error: 0.015
Match efficiency error: 0.01
Match efficiency error: 0.02
Match efficiency error: 0.03
Match efficiency error: 0.05
Match efficiency error: 0.1
Match efficiency error: 0
Match efficiency error: -1
Match efficiency error: 1
Match efficiency error: 3.3e-05
Match efficiency error: 3.6e-05
Filter efficiency error: 0.00017
Filter efficiency error: 0.00026
Filter efficiency error: 0.00034
Filter efficiency error: 0.00048
Filter efficiency error: 0.00051
Filter efficiency error: 0.0005
Filter efficiency error: 0.00063
Filter efficiency error: 0.0012
Filter efficiency error: 0.001
Filter efficiency error: 0.003
Filter efficiency error: 0.01
Filter efficiency error: 0.02
Filter efficiency error: 0
Filter efficiency error: 1.41e-05
Filter efficiency error: -1
Filter efficiency error: 1
Filter efficiency error: 2.2e-05
Filter efficiency error: 2e-05
Filter efficiency error: 3.4e-05
Filter efficiency error: 3.8e-05
Filter efficiency error: 3.9e-05
Filter efficiency error: 3e-05
Filter efficiency error: 4.24e-05
Filter efficiency error: 4.4e-05
Filter efficiency error: 4.69e-05
Filter efficiency error: 4.9e-05
Filter efficiency error: 4e-06
Filter efficiency error: 5e-05
Filter efficiency error: 6.48e-05
Filter efficiency error: 8e-06
The cross-sections are now available from CMSDAS, where they can be extracted in a more straightforward way. In any case (from Luca Perrozzi):
these values are usually computed with the generator used to produce events and not the latest and greatest calculation available. In general, these values are a good starting point for the analysts, so I suggest using them, especially if they can be retrieved with a script. However, they should be updated whenever possible with the tables provided in twikis like
We have an (oral) agreement from the CMS MC group to make these numbers public.
Note information about matching and filter efficiencies in
From To be judged whether relevant for us.
Note the recipe in
The Cross Section DB Portal:
and the twiki:
See also RFC about storing cross section information in the data model fields.
Closing as followed up now in #2476
Reopening, as the nice recipe in #2476 will only work for datasets produced with CMSSW versions higher than 5_3_31.
The cross-section values are now extracted by the provenance script and available in cache, but not yet displayed.
Closing as now superseded by #3454
Need the following additional information (important for research use) for MC datasets:
The recipe to extract this from (CMS internal?)
The way to access the cross section:
If "Generator parameters" is not displayed, you can select it in "Select View" and "Save selection".