Eventdisplay / Converters

Repository for Converters from Eventdisplay format into FITS / HDF5 / etc
BSD 3-Clause "New" or "Revised" License
1 stars 2 forks source link

Files contain multiple runs but only one run header #10

Closed maxnoe closed 4 years ago

maxnoe commented 4 years ago

Checking the root files, it seems only a single run header is written but there are multiple runs in the events tree.

Is there any way to check how many runs were merged? Assuming all have the same run header?

It might happen that a run has no triggered / surviving events so just looking into the events tree, especially for cut level 2 might not give the correct answer.

TarekHC commented 4 years ago

No clue! This is certainly a question for @GernotMaier ... :D

GernotMaier commented 4 years ago

There is only one run header included here (the analysis chain would not work with varying run headers between files).

I am counting the MC events in the first stage of the analysis, not relying at all on the MC run headers. Trusting headers is dangerous, as an aborted simulation file might provide a readable simulation file, just with less events as stated in the header.

If you want to get the histogram of thrown events, this is stored in the fEffAreaTree:

TH1D *hEMC = 0;
fEffArea->SetBranchAddress("hEmc", &hEMC );
fEffArea->GetEntry( 0 );

Screenshot 2020-09-15 at 12 20 42

There is only one entry per tree for CTA (just in case you wonder why it is a tree).

maxnoe commented 4 years ago

Trusting headers is dangerous, as an aborted simulation file might provide a readable simulation file, just with less events as stated in the header.

This is not checked? For my own CORSIKA processing I found out, that the correct way to check if the CORSIKA run is successfull is to check if there is a "END OF RUN" message in the log output.

The histogram looks like it has an unsuitable binning (the last bin with events seems under populated, so probably the binning goes further than the maximum energy of the simulations)

GernotMaier commented 4 years ago

Don't think anyone checks this. There are always a couple of runs which are not processed correctly (agree with you that this should not be the case).

Not sure if the binning matters - and obviously it is larger than the MC energy range (10^4 TeV)

maxnoe commented 4 years ago

There is now an issue to check if the CORSIKA run was successfull on the grid: https://github.com/cta-observatory/CTADIRAC/issues/16

So how would you go about calculating event weights? Taking the total number of events from the histogram but spectral index and e_min / e_max from the run header?

GernotMaier commented 4 years ago

Yes.

maxnoe commented 4 years ago

The bins in the histogram contain non-integer counts. Could you explain where this comes from?

In [1]: import uproot4

In [2]: f = uproot4.open('./gamma_cone.Nb.3AL4-BN15-TI_ID0.eff-0.root')

In [3]: hist, edges = f['hEmc;1'].to_numpy()

In [4]: for entry in hist[0:5]:
    ...:     print(f'{entry:.4f}')
    ...: 
1026208101.2765
726638892.4260
514463389.9338
364172401.7626
257886506.0212
GernotMaier commented 4 years ago

To answer this: hEmc histograms are filled with a spectral weight assuming a spectral index of 2.5.

Is this really a problem? These files are from Oct 2019, and I would have to go back to this old Eventdisplay version, modify it, run the the writing of root files again, etc.

Would probably be easier if you take this into account, as this is a simple weighting factor.

maxnoe commented 4 years ago

We need the correct number of thrown events. What is the best way to get this?

I'm not sure that calculating event weights with -2.5 and the sum of this histogram woudl be correct, let me think about this.

GernotMaier commented 4 years ago

Just went through everything again - I'll simply don't have that number at the point where the event trees are written out. I modified the code now, but would have to rerun the analysis.

There is no change that we can move this to a prod5 analysis? Above event files are almost a year old and it is really hard to go back.

maxnoe commented 4 years ago

Using prod5 would be fine with me, we just need a well defined dataset and a reference IRF/Sensitivity to compare to.

GernotMaier commented 4 years ago

I've added now a histogram of unweighted events and applied some changes to allow the processing of prod5.

One file can be found here, could someone check if this is ok before I process all the others?

https://desycloud.desy.de/index.php/s/nfnrFL85foK35xC

maxnoe commented 4 years ago

The file looks ok.

Two things come to my mind before starting the full new production

GernotMaier commented 4 years ago

Great, new version here: https://desycloud.desy.de/index.php/s/6wwSeLe8eR2i5z8

The histogram of simulated events vs distance to the optical axes is not carried through the different stages, so would be great if you could use the run header.

maxnoe commented 4 years ago

Ok, should be fine

maxnoe commented 4 years ago

Thanks!