Figure out double bump in number of primary vertex distribution in signal

caredg commented 2 years ago

As described here, there is a new problem: the distribution of the number of primary vertices in the signal shows a double bump. It looks as if two minbias profiles are embedded in the simulation. I will run some tests to figure out how to fix it.

caredg commented 2 years ago

First, let's look at one of the files that we got before "fixing" the spike in the electron eta (#35). This one corresponds to the ntuple from the signalStudy_round3 batch. The npv distribution looks like this: npv_pu_eletaproblem_signalStudy_round3

It is not very clear, but it seems that I see structure there as well.

caredg commented 2 years ago

Now I will try to generate a few thousand events (in separate jobs, as always) and look at the merged output. First, I ran without pileup altogether, just to check what the npv distribution looks like. As expected, there is no pileup, so the PU mixing module is working (or at least doing something): npv_nopu

caredg commented 2 years ago

I also switched off the file randomization lines I introduced earlier and made a test keeping just the first minbias file and the seed randomization: npv_pu_seedButNoRandFile The structure remains.

Then I also switched off the seed randomization (this should be the same as what we had before, but I wanted to double-check): npv_pu_NoseedNoRandFile

with similar results.

caredg commented 2 years ago

Then I checked the production of the track-enriched QCD samples. If I look at one of these files, I see this distribution: npv_trackenrichedQCD It looks more complete, although one can still make out the holes. After looking at their configuration, it seems that giving the mixer the whole MinBias dataset does the trick. This appears to be the only difference from our original configuration (before trying to fix the electron eta issue). I will try to run like this; maybe I am misunderstanding how the mixer works.

caredg commented 2 years ago

I ran the simulation with no random seeding but with the whole dataset of MinBias files. It seems that the same file (although not the first one) is always opened. As expected, structure is present in the npv plot: npv_noseedAllMBFiles_fullstats and the old electron eta problem is also present: elecetaspikes

Next I will run with less statistics (because I do not think that is the problem) but with the random seeding and the full minbias dataset.

caredg commented 2 years ago

Simulating with the random seed and all the MB files, I get similar results: npv_seedAllMBFiles The hole is somewhere different. So next I will test randomizing only the MB files, without the seed.
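To compare where the holes fall from one test to the next, a small helper can list the empty npv bins that sit between occupied ones. This is a minimal sketch, assuming the npv values have already been read from the ntuple into a plain list of integers; the function name `find_holes` is hypothetical and not part of any existing tool.

```python
from collections import Counter


def find_holes(npv_values):
    """Return the npv values with zero entries that lie between
    the smallest and the largest observed npv."""
    counts = Counter(npv_values)
    if not counts:
        return []
    lowest, highest = min(counts), max(counts)
    # Counter returns 0 for missing keys without inserting them
    return [n for n in range(lowest, highest + 1) if counts[n] == 0]


# Hypothetical npv values from two test runs, for illustration only
run_a = [1, 2, 2, 4, 5, 7]
run_b = [1, 2, 3, 3, 5, 6, 7]
print(find_holes(run_a))
print(find_holes(run_b))
```

With few events an empty bin can be a plain statistical fluctuation, so this is only useful on the merged output with reasonable statistics.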

caredg commented 2 years ago

@JonaJJSJ-crypto When I run without randomizing the seed but only randomizing the picking of the MB files, I obtain this distribution: npv_noseedRandFiles

So my interpretation now is that the random seeding I was trying introduces some bias. However, if I randomize neither the seed nor the file, the same MB file is run over and over, which also introduces a bias.

So for now, I guess I will reprocess all the signal files with this new procedure, i.e., only randomizing the picking of the MB files.
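The file-only randomization can be sketched as follows. This is a minimal illustration, not the configuration actually used: the function name, the file names, and the idea of seeding Python's own shuffle with a job index are all assumptions. The seed here only fixes the file order for a given job so that it can be reproduced; it does not touch the random seeds of the simulation itself.

```python
import random


def shuffled_minbias_files(file_names, job_index):
    """Return a copy of the MinBias file list in a job-specific order.

    The job index seeds only this shuffle (an assumption made for
    reproducibility); the simulation seeds are left untouched.
    """
    rng = random.Random(job_index)
    shuffled = list(file_names)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical file names, for illustration only
minbias_files = ["minbias_%03d.root" % i for i in range(20)]
files_for_job_7 = shuffled_minbias_files(minbias_files, job_index=7)

# In a CMSSW configuration this list would presumably be handed to the
# mixing module input (for example process.mix.input.fileNames); that
# assignment is an assumption and is not taken from this thread.
```

Each job then starts from a different MB file, which addresses the case where the same file is opened over and over.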

To check that we do not see any of the previous structures (in the electron eta, for instance), I also checked this plot, which seems fine: eleceta_noseedRandFiles

caredg commented 2 years ago

This is still not good enough. One can still see structure. More study pending...

caredg commented 2 years ago

I am trying to run additional tests. To rule out problems with the signal generation, I tried generating Drell-Yan starting from gen. First I generated 200 events using cmsDriver with all the switches set exactly as we do for LW: recoDY200 This is the plot for the electron eta: recoDY200_elec It is hard to say anything with so few events. So I generated 2000 on the cluster, with an identical configuration. The gensim step took about 16 hours, the hlt about 3 hours, and the reco another 3 hours. My concern is that splitting the generation into small chunks of events is creating some redundancy in the mixing module, so I want to see what the result is when more events are generated in a single run (the framework is supposed to be able to handle that). These are the plots: recoDY2000 recoDY2000_elec There appears to be no structure.

I am going to try generating LW with thousands of events in a single run to see if anything changes...
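The redundancy concern can be illustrated with a toy model. It assumes, purely for illustration, that each job reads minbias events sequentially from a fixed starting position and overlays a fixed number of them per signal event. Whether the mixing module actually behaves this way is exactly what is being tested, so the function below and its numbers are hypothetical.

```python
def distinct_minbias_events(start_positions, events_per_job, pileup_per_event):
    """Count the distinct minbias events read across all jobs in a
    toy model where every job reads sequentially from its start position."""
    used = set()
    n_read = events_per_job * pileup_per_event
    for first in start_positions:
        used.update(range(first, first + n_read))
    return len(used)


# Ten jobs of 200 events, all starting at the same place in the same file
split_same_start = distinct_minbias_events([0] * 10, 200, 5)

# One job of 2000 events
single_long_job = distinct_minbias_events([0], 2000, 5)

print(split_same_start, single_long_job)
```

In this toy picture the ten short jobs keep reusing the same small set of minbias events, while the single long job reads ten times as many distinct ones. That would be one way for a split production to show structure that a single long run does not.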

JonaJJSJ-crypto / Proyecto-de-Tesis

Figure out double bump in number of primary vertex distribution in signal #49