cms-analysis / flashgg


Inconsistency in 2016 Data Events #1259

Open atishelmanch opened 3 years ago

atishelmanch commented 3 years ago

Dear All, @simonepigazzini @youyingli @edjtscott @rchatter @alesauva @panwarlsweet ,

I have noticed a difference in the number of events in the campaign Era2016_RR-17Jul2018_v2 between the current version of the data JSONs and a version from September 2020. Comparing the two fggManageSamples outputs:

=====================================================================

Current Version

fggManageSamples.py -C Era2016_RR-17Jul2018_v2 list '*DoubleEG*'

Datasets in catalog:

Name                      N events    N parent   N good   N bad   Avg
                          or lumis               files    files   weight

/DoubleEG/..Run2016B..       47420       59819     2371       0   4.3e+02
/DoubleEG/..Run2016C..       17378          -1      869       0   6.2e+02
/DoubleEG/..Run2016D..       29153       30238     1458       0   6e+02
/DoubleEG/..Run2016E..       18760       27153      938       0   6.7e+02
/DoubleEG/..Run2016F..       16359          -1      818       0   6.9e+02
/DoubleEG/..Run2016G..        9020          -1      451       0   6.8e+02
/DoubleEG/..Run2016H..       47441       52562     2373       0   7.1e+02

                   total    185531                 9278

=====================================================================

Older September 2020 Version:

fggManageSamples.py -C Era2016_RR-17Jul2018_v2 list '*DoubleEG*'

Datasets in catalog:

Name                      N events    N parent   N good   N bad   Avg
                          or lumis               files    files   weight

/DoubleEG/..Run2016B..       52718       59819     2636       0   4.4e+02
/DoubleEG/..Run2016C..       17378       18769      869       0   6.2e+02
/DoubleEG/..Run2016D..       29153       30238     1458       0   6e+02
/DoubleEG/..Run2016E..       24773       27153     1239       0   6.7e+02
/DoubleEG/..Run2016F..       18079       19441      904       0   6.9e+02
/DoubleEG/..Run2016G..       42661       46318     2134       1   6.8e+02
/DoubleEG/..Run2016H..       47441       52562     2373       0   7.1e+02

                   total    232203                11613

=====================================================================

It appears that between these two versions, good files and events are lost in the DoubleEG microAODs. Is this expected? Is there an issue here?
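
To make the size of the loss explicit, here is a minimal sketch that tabulates the per-dataset differences. The numbers are copied by hand from the two listings above ("N events or lumis" and "N good files"); the function name `catalog_diff` is only illustrative and is not part of flashgg.

```python
# Per-dataset (N events or lumis, N good files), copied by hand
# from the two fggManageSamples listings in this issue.
current = {
    "Run2016B": (47420, 2371),
    "Run2016C": (17378, 869),
    "Run2016D": (29153, 1458),
    "Run2016E": (18760, 938),
    "Run2016F": (16359, 818),
    "Run2016G": (9020, 451),
    "Run2016H": (47441, 2373),
}
september = {
    "Run2016B": (52718, 2636),
    "Run2016C": (17378, 869),
    "Run2016D": (29153, 1458),
    "Run2016E": (24773, 1239),
    "Run2016F": (18079, 904),
    "Run2016G": (42661, 2134),
    "Run2016H": (47441, 2373),
}


def catalog_diff(old, new):
    """Return {dataset: (lost_units, lost_files)} for datasets that changed."""
    diff = {}
    for name in sorted(old):
        old_units, old_files = old[name]
        # A dataset absent from the new catalog counts as fully lost.
        new_units, new_files = new.get(name, (0, 0))
        if (old_units, old_files) != (new_units, new_files):
            diff[name] = (old_units - new_units, old_files - new_files)
    return diff


lost = catalog_diff(september, current)
for name, (units, files) in lost.items():
    print(f"{name}: {units} events/lumis and {files} good files missing")
```

With these inputs, Run2016B, E, F and G differ (G by far the most), while C, D and H are unchanged. The per-dataset losses add up to the difference between the two totals (46672 events/lumis and 2335 files).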

Thanks, Abe

alesauva commented 3 years ago

Hello @atishelmanch, this is indeed not normal. I strongly suspect that we merged two different versions of the catalogue with different ordering. We tried to avoid this, of course, but I think I was not careful enough. In any case, I will track down the issue and fix it as soon as possible. Thanks for the report, and sorry for the trouble.

simonepigazzini commented 3 years ago

Hi guys, I think the issue appeared after @alesauva added some new samples to the catalogue. My original version contained merged datasets, to avoid having several microAOD datasets (caused by failing jobs, etc.) per miniAOD. I think the best option is to go back to my original version and re-import the additional samples that were introduced later.
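
A rough way to spot several microAOD datasets per miniAOD in a list of catalogue dataset names is sketched below. It is only a heuristic: it assumes that the primary dataset plus the run-period token (e.g. `Run2016G`) identifies the parent miniAOD, which would also flag legitimately distinct miniAOD versions of the same run period. The function names and the example dataset names are made up for illustration and are not taken from the actual catalogue.

```python
import re
from collections import defaultdict

# Matches a run-period token such as "Run2016H" inside a dataset name.
RUN_PERIOD = re.compile(r"Run\d{4}[A-Z]")


def group_by_primary_and_period(dataset_names):
    """Group dataset names by (primary dataset, run period)."""
    groups = defaultdict(list)
    for name in dataset_names:
        # "/DoubleEG/<processed name>/USER" -> "DoubleEG"
        primary = name.split("/")[1]
        match = RUN_PERIOD.search(name)
        period = match.group(0) if match else None
        groups[(primary, period)].append(name)
    return dict(groups)


def split_datasets(dataset_names):
    """Return only the groups that contain more than one dataset."""
    groups = group_by_primary_and_period(dataset_names)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical names, for illustration only.
example = [
    "/DoubleEG/someuser-campaign-v0-Run2016G-part1/USER",
    "/DoubleEG/someuser-campaign-v0-Run2016G-part2/USER",
    "/DoubleEG/someuser-campaign-v0-Run2016H/USER",
]
for key, names in split_datasets(example).items():
    print(key, len(names))
```

Any group reported here would be a candidate for merging before the catalogue is committed.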

simone

alesauva commented 3 years ago

The inconsistency was accidentally introduced in PR #1235. I am still a bit puzzled as to why it was not fixed by the latest PRs that added samples, since those were forked from my personal branch [1], in which everything was fine. In any case, the fix for this issue is in PR #1260, which I have just submitted.

Cheers

[1] https://github.com/alesauva/flashgg/tree/dev_legacy_runII/MetaData/data/Era2016_RR-17Jul2018_v2

alesauva commented 3 years ago

Hi @simonepigazzini, while doing some additional cross-checks of the 2016 catalogue, I found that the SingleElectron Run2016H sample was not present in the 2016 RR catalogue on the main flashgg branch (or at least it was not in your latest production PRs). I think you did produce the sample, as I was able to import it successfully:

/SingleElectron/spigazzi-Era2016_RR-17Jul2018_v2-legacyRun2FullV1-v0-Run2016H-17Jul2018-v1-db30e4011d9f1e7e37aee2e41519d339/USER

/SingleElectron/..Run2016H.. 47296 52562 2365 0 3.3e+02

If you could confirm that there is no particular issue with this sample (or any reason for it not being there), I will add it.

simonepigazzini commented 3 years ago

Hi, I think RunH SingleEle should be there; it was there at the beginning.

alesauva commented 3 years ago

OK, added. Thanks.