caredg opened this issue 3 years ago
I have modified the scripts in the condor folder in the AcausalPOETAnalyzer. I am waiting for @JonaJJSJ-crypto to put a limit on the tracks so they do not overflow the ntuples' size. This is urgent....
I have tested how the size of the analyzer's output .root files changes for MC and data. The pT > 5 GeV cut is already loaded in the analyzer.
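For reference, the kind of selection this corresponds to when filling the track branches is sketched below; the helper and branch-vector names are illustrative assumptions, not the repository's actual code:

```cpp
// Hedged sketch: the pT > 5 GeV selection applied when writing out tracks.
// Names (fillSelectedTracks, track_pt, ...) are illustrative only.
#include <vector>
#include "DataFormats/TrackReco/interface/Track.h"
#include "DataFormats/TrackReco/interface/TrackFwd.h"

void fillSelectedTracks(const reco::TrackCollection& tracks,
                        std::vector<float>& track_pt,
                        std::vector<float>& track_eta,
                        std::vector<float>& track_phi,
                        double minPt = 5.0) {
  for (reco::TrackCollection::const_iterator trk = tracks.begin(); trk != tracks.end(); ++trk) {
    if (trk->pt() < minPt) continue;  // skip soft tracks to keep the ntuple small
    track_pt.push_back(trk->pt());
    track_eta.push_back(trk->eta());
    track_phi.push_back(trk->phi());
  }
}
```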
Ok, so if we stick to the 5 GeV cut, compared to 4.3 MB, that is a reduction of 93% in the total size. If we applied this to the diphoton 2012B dataset, which weighs 6.3 TB, the total size of the ntuple for this example dataset would be about 440 GB. This is still too large! We need to get down to tens of GB at most for this to be manageable. What fraction of the size do the tracks take up after this cut? Should we cut on a different collection, or should we cut harder on tracks? Is this permissible, or do we really need them to get the secondary vertices?
I didn't check the actual reduction from an AOD file. A few moments ago I ran this test; in my case I reduced a 2.3 GB AOD file to 17 MB, i.e., about 0.7% of the original size. This means that the output from the 6.3 TB should be about 46 GB. Is this size ok, or should I reduce it further? All the other collections have no cuts, but I would recommend, if needed, setting a pT > 5 GeV cut as in other skims, like the one from Stephan. The other possible cuts I would recommend that still produce secondary vertices are 7 GeV and 10 GeV, and no further, because the number of secondary vertices decreases drastically.
I made a test running the acausal POET with the 5 GeV cut on tracks over 5 ROOT files (5 jobs) from the diphoton Run2012B dataset. The average output file (job output) weighs ~25 MB. If running over all 1612 files produced the same average output size, the total weight for this dataset would be 1612*25 MB ~ 43 GB. Now, if the average size for diphoton Run2012 is the same, running over the 2719 files that this dataset has would produce ~68 GB. This is probably not the end of the world, but it would be good to decrease it a bit more. I will try a test with a 10 GeV cut.
Forgot to mention that the resulting size agrees with this estimate, i.e., about 0.7% of the original.
A cut at 10 GeV does not make much of a difference. Maybe we need to think about restricting events based on other objects, like photon pT, for example. However, this needs to be done as an EDFilter after looking at the final ntuple variables from the signal simulation (so we do not shoot ourselves in the foot). I will send these jobs to prepare signal ntuples ASAP.
I have implemented pT cuts for electrons and photons too, but the retained fraction only goes down to 0.65%; a pT > 10 GeV cut on tracks is not a significant change either. I also checked whether it is possible to reduce the size of other collections, such as vertices, but they do not seem to be a problem.
I have implemented an EDFilter that rejects events lacking energetic electrons (pT > 17 GeV), plus an eta cut that is still under testing (|eta| < 2.1). This reduced a 2.3 GB data file to a 5.7 MB skimmed file, i.e., about 0.2% of the original size, meaning that the 6.3 TB could be reduced to around 15 GB. The downside is that 1 of my 100 simulated events was also cut away, i.e., 1% of the signal events.
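For illustration, a minimal sketch of such a filter is shown below; it is not the repository's actual EDFilter, and it assumes the standard 2012 AOD reco::GsfElectron collection and a configurable pT threshold:

```cpp
// Hedged sketch of an EDFilter that keeps only events with at least one electron
// above a configurable pT threshold (e.g. 17 GeV). Not the repository's actual filter.
#include <string>
#include "FWCore/Framework/interface/EDFilter.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/EgammaCandidates/interface/GsfElectron.h"
#include "DataFormats/EgammaCandidates/interface/GsfElectronFwd.h"

class EnergeticElectronFilter : public edm::EDFilter {
public:
  explicit EnergeticElectronFilter(const edm::ParameterSet& cfg)
      : electronLabel_(cfg.getParameter<std::string>("electronLabel")),  // e.g. "gsfElectrons"
        minPt_(cfg.getParameter<double>("minPt")) {}                     // e.g. 17.0

private:
  virtual bool filter(edm::Event& event, const edm::EventSetup&) {
    edm::Handle<reco::GsfElectronCollection> electrons;
    event.getByLabel(electronLabel_, electrons);
    // Keep the event only if it has at least one electron above the pT threshold.
    for (reco::GsfElectronCollection::const_iterator el = electrons->begin();
         el != electrons->end(); ++el) {
      if (el->pt() > minPt_) return true;
    }
    return false;  // rejected events are dropped from the path
  }

  std::string electronLabel_;
  double minPt_;
};

DEFINE_FWK_MODULE(EnergeticElectronFilter);
```

Placed in the cmsRun path before the ntuplizing analyzers, rejected events would never be written to the output.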
Ok, that is not good. These cuts should not reduce the signal at all... We should really look at the signal distributions first... They are coming...
It seems that the problem was the eta cut. I have removed it, and that fixed the problem with the simulated data. However, the reduction in real data is about 2.5%, which means that the 6.3 TB should come down to around 20.5 GB.
I am testing the latest AcausalPoet ntuplizer with 5 jobs over the TTbar sample. All the test and official output will live under this, with the same password as usual. If successful, I will launch the jobs over the full datasets.
Test seemed successful. Submitting full jobs for the datasets mentioned....
Ntuples are ready here under the signalStudy_round1 directory. There, one can find the merged ROOT files:
@JonaJJSJ-crypto the new LWSM200DnR.root merged ntuple file is in signalStudy_round4, in the same place.
@JonaJJSJ-crypto, we need to prepare the close-to-final POET ntuplizer. We need to:
@JonaJJSJ-crypto, at the usual place and on the cajuela disk, I am copying the newly produced ROOT files into the analysis_round1 directory. These ntuples were produced with the simple electron/jet filter and with a 5 GeV cut for writing out tracks, nothing more. So far we have:
@JonaJJSJ-crypto the backgrounds are almost complete. Only TTbarZ would be missing, but its cross section seems negligible, no?
I think that, compared to the large ones, it would indeed be negligible.
@JonaJJSJ-crypto After this, I think I am ready to reprocess everything. I will start by running the signal simulation again, but make sure you have what you need in the analyzers that make up the ntuples. Let me know
@caredg I would like to finish testing all the things I had planned to solve our trigger object problem #52
@JonaJJSJ-crypto, the new ntuples (except for signal 400 and 500, which are still being produced) can be found at the same place under the analysis_round3 directory. A similar location can be found in the cajuela repository.
@JonaJJSJ-crypto
* [ ] Still pending: decide whether or not to apply the trigger filter (I think we do not need this unless we run into size problems again)
* [x] Introduce [these](https://github.com/JonaJJSJ-crypto/Proyecto-de-Tesis/issues/43#issuecomment-964330463) pT cuts for electrons and jets.
* [ ] Add the secondary-vertex pre-selection algorithm
* [x] Change the primary-vertex source to the one that takes the beam spot into account? [Check if we suffer from beam spot issues #39](https://github.com/JonaJJSJ-crypto/Proyecto-de-Tesis/issues/39)
@caredg so far the pT cuts on electrons and jets have been added, and the beam-spot primary vertices were added as well, although the old primary vertices were also kept. As for the secondary-vertex merging algorithm, a new method is being implemented to distinguish the distance between the vertices at the time of the merge.
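As a rough illustration of keeping both primary-vertex sources mentioned here (a sketch only, assuming the standard 2012 AOD labels; the real analyzer's code differs):

```cpp
// Hedged sketch: reading both the old default primary vertices and the
// beam-spot-constrained ones inside an analyzer's analyze() method.
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/VertexReco/interface/Vertex.h"
#include "DataFormats/VertexReco/interface/VertexFwd.h"
#include "FWCore/Framework/interface/Event.h"

void readPrimaryVertices(const edm::Event& iEvent) {
  edm::Handle<reco::VertexCollection> pv;        // old default primary vertices
  edm::Handle<reco::VertexCollection> pvWithBS;  // vertices refit with the beam-spot constraint
  iEvent.getByLabel("offlinePrimaryVertices", pv);
  iEvent.getByLabel("offlinePrimaryVerticesWithBS", pvWithBS);
  // ... fill the corresponding branches from both collections ...
}
```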
@JonaJJSJ-crypto, I have produced ntuples for LWSM200DnR and DYJetsToLL. The code now needs a wall time of 4 hours for all the jobs to finish, especially for the LW sample. Although the LW sample weighs less than the previous one, the DY one weighs more. The files can be found at the usual place in the analysis_round4 directory. I cannot connect to the minicluster to copy them there. I will see what is going on tomorrow.
@caredg I have the same problem with the minicluster. It seems the LW file is not in any folder yet.
@caredg I have finished checking that applying the trigger filter is viable. The study is the following: we selected those events whose highest-energy electron exceeds 40 GeV in pT and whose second-highest-energy electron exceeds 25 GeV in pT. Among those events, we count how many passed and did not pass the trigger, and these are the results I obtained.
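A rough sketch of the counting logic described above, assuming hypothetical ntuple-level inputs (a descending-pT-sorted electron_pt vector and a boolean trigger decision); this is not the actual analysis code:

```cpp
// Hedged sketch of the pass/fail counting over ntuple entries; the inputs
// electron_pt (sorted in descending pT) and passTrig are illustrative assumptions.
#include <vector>

struct TrigCounts { long pass = 0; long fail = 0; };

void countTriggerPassFail(const std::vector<double>& electron_pt, bool passTrig,
                          TrigCounts& counts) {
  // Offline selection: leading electron above 40 GeV, subleading above 25 GeV.
  if (electron_pt.size() < 2) return;
  if (electron_pt[0] <= 40.0 || electron_pt[1] <= 25.0) return;
  // Among the selected events, count how many fired the trigger and how many did not.
  if (passTrig) ++counts.pass;
  else          ++counts.fail;
}
```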
@JonaJJSJ-crypto, I do not quite understand. How many events, out of the original 150K, pass the 40 and 25 GeV selection? Is it 120079, i.e., 9147+110932, right? And out of these 120079, 110932 pass the trigger, right? That is, 92.38% pass (I suppose that leaves 7.6% that do not pass). Ok, if that is the case, it agrees with the "efficiency" reported here. That is, it does seem to be the efficiency...
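Spelling out the arithmetic with the counts quoted above:

$$
\varepsilon_{\mathrm{trig}} = \frac{N_{\mathrm{pass}}}{N_{\mathrm{pass}} + N_{\mathrm{fail}}} = \frac{110932}{110932 + 9147} = \frac{110932}{120079} \approx 0.9238
$$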
@caredg Yes, it is exactly as you say. And yes, it does seem that that is the efficiency. Is it necessary to run this analysis on the data?
@JonaJJSJ-crypto, no, not with data. With data we would have to do the whole tag-and-probe procedure to estimate this, but there is time, so we will take the numbers from the literature.
@JonaJJSJ-crypto, there is no time, I meant to say.
I am removing all the signalStudy_round? directories from our repository to free up some space.
@JonaJJSJ-crypto I am producing the ntuples that include:
They will be in analysis_round5, in the same places as usual. I am going to delete analysis_round1 and analysis_round2 from the main repository to free up space.
The ntuples that incorporate what was discussed here are in the analysis_round6 folder in the usual repositories.
@JonaJJSJ-crypto, the ntuples for LW200 and DY requested here can be found in analysis_round7 in the usual repositories.
@JonaJJSJ-crypto, new ntuples for LW200 and DY, as requested, can be found in analysis_round8 in the usual repositories.
@JonaJJSJ-crypto, the new ntuples are ready in analysis_round11 in the usual repositories.
Perfect, I will get started on generating the new plots.
Ntuplize collision data and MC simulations