DL2 variables to use

orelgueta commented 3 years ago

A few points that might require Gernot's input (add to this list as we go):

* tgrad_x (gradient in samples/deg along long-axis of image) doesn't seem to be filled in the DL2 file. I see only -99 in all entries. It might be important as this parameter is used in the ED BDT. (I think it is used squared, see [here](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp#L162))
* We are missing the "asym" variable which is used in the training I think.
* The ED BDT uses the width/length values in the training. I am not sure how it is done, since these vectors are per telescope with image. That means the vector length changes for each event. How does the TMVA BDT deal with that? It would be good to use the same inputs in our regressors instead of the reduced width/length.
* In [the BDT input variables](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp#L162) there is also a variable called "wol". I assume it is the width over length, ask for confirmation and add it as a variable in our training?
* The [TMVA code](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp) used for angular resolution says it uses one MVA per telescope type. I do not understand this, each type, LST, MST, SST has a separate MVA? The results of each are combined afterwards then? Why is it done like that? Is it a way to deal with the variable vector length per event?

GernotMaier commented 3 years ago

> A few points that might require Gernot's input (add to this list as we go):
>
> * tgrad_x (gradient in samples/deg along long-axis of image) doesn't seem to be filled in the DL2 file. I see only -99 in all entries. It might be important as this parameter is used in the ED BDT. (I think it is used squared, see [here](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp#L162))
>
> * We are missing the "asym" variable which is used in the training I think.

Both variables, tgrad_x and asym, are filled now.
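A quick way to confirm this on a column read from a DL2 file, as a minimal sketch: only the -99 placeholder comes from the report above. The helper names `fraction_unfilled` and `squared_or_none`, and the assumption that the column is available as a plain sequence of per-event values, are made up for illustration.

```python
UNFILLED = -99.0  # placeholder value reported in this thread for unfilled entries


def fraction_unfilled(values, sentinel=UNFILLED):
    """Return the fraction of entries equal to the sentinel (0.0 for empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(1 for v in values if v == sentinel) / len(values)


def squared_or_none(value, sentinel=UNFILLED):
    """Square a filled value (the thread suggests tgrad_x is used squared).

    Unfilled entries are mapped to None so that (-99)**2 is never used as a feature.
    """
    if value == sentinel:
        return None
    return value * value
```

A result of 1.0 from `fraction_unfilled` reproduces the situation described in the first comment, where every entry was -99.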

> * The ED BDT uses the width/length values in the training. I am not sure how it is done, since these vectors are per telescope with image. That means the vector length changes for each event. How does the TMVA BDT deal with that? It would be good to use the same inputs in our regressors instead of the reduced width/length.

I fill an intermediate tree with all the training variables, after applying some quality cuts. This way, I avoid the problem with variable lengths.

At DESY, you can find example files here: /lustre/fs22/group/cta/users/maierg/analysis/AnalysisData/prod5-Paranal-20deg-sq08-LL/DISPBDT/BDTdisp.S.BL-4LSTs25MSTs70SSTs-MSTF.T1/BDTDisp/0deg
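One way such an intermediate table can remove the variable-length problem is to store one row per telescope image rather than one row per event. The sketch below illustrates that idea only. The one-row-per-image layout, the field names (`event_id`, `tel_type`, `width`, `length`, `size`) and the `min_size` cut are assumptions, not the actual Eventdisplay tree layout or its quality cuts.

```python
def flatten_events(events, min_size=0.0):
    """Turn per-event, variable-length image lists into fixed-width rows.

    Each input event is a dict holding parallel lists with one entry per
    telescope image. Each output row describes exactly one image, so every
    row has the same number of columns regardless of image multiplicity.
    """
    rows = []
    for event in events:
        per_image = zip(
            event["tel_type"], event["width"], event["length"], event["size"]
        )
        for tel_type, width, length, size in per_image:
            if size <= min_size:  # placeholder quality cut (assumption)
                continue
            rows.append(
                {
                    "event_id": event["event_id"],
                    "tel_type": tel_type,
                    "width": width,
                    "length": length,
                    "size": size,
                }
            )
    return rows


def rows_for_type(rows, tel_type):
    """Select the rows of one telescope type, e.g. to train one model per type."""
    return [row for row in rows if row["tel_type"] == tel_type]
```

With rows in this form, an event seen by two telescopes and an event seen by five simply contribute two and five rows, and a regressor never sees a vector of changing length.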

> * In [the BDT input variables](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp#L162) there is also a variable called "wol". I assume it is the width over length, ask for confirmation and add it as a variable in our training?

Yes, this is width over length. It is not clear to me whether the machine learners realise that they can derive wol from width and length.
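Supplying wol explicitly is cheap, and it may help tree-based learners, which split on one input variable at a time and can only approximate a ratio through many splits. A minimal sketch follows. The function name `width_over_length` and the choice to return `None` for a non-positive length are assumptions for illustration.

```python
def width_over_length(width, length):
    """Return width / length, or None when the length is not positive.

    Returning None (instead of raising or dividing by zero) is an
    illustrative convention for images with an unusable length value.
    """
    if length <= 0:
        return None
    return width / length
```

For example, `width_over_length(0.5, 2.0)` gives 0.25.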

> * The [TMVA code](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp) used for angular resolution says it uses one MVA per telescope type. I do not understand this, each type, LST, MST, SST has a separate MVA? The results of each are combined afterwards then? Why is it done like that? Is it a way to deal with the variable vector length per event?

Yes, training is per telescope type. The simple reason is that many values, like width and length (in deg) or image size (in digital counts), are telescope-type dependent.

Eventdisplay calculates an event direction (x,y) from each image and then, in a second step, calculates a weighted mean over all participating images.
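The second step can be sketched as a weighted average of the per-image directions. The weights actually used by Eventdisplay are not stated in this thread, so the `weights` argument here is a placeholder, and `weighted_mean_direction` is a hypothetical helper, not an Eventdisplay function.

```python
def weighted_mean_direction(directions, weights):
    """Combine per-image (x, y) direction estimates into one event direction.

    directions: list of (x, y) tuples, one per image, in a common frame.
    weights: list of non-negative numbers, one per image (assumed given).
    """
    if len(directions) != len(weights):
        raise ValueError("need exactly one weight per direction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    x = sum(w * d[0] for d, w in zip(directions, weights)) / total
    y = sum(w * d[1] for d, w in zip(directions, weights)) / total
    return x, y
```

Because the combination runs over however many images an event has, images from all telescope types can enter the same mean even though each type has its own trained model.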

orelgueta commented 3 years ago

Thanks Gernot!

cta-observatory / iact_event_types

DL2 variables to use #6