ast0815 / remu

ReMU - Response Matrix Utilities
MIT License

A large reco binning (2 variables with ~30-40 bins) leads to a memory error when using the optimisation of the truth binning #32

Closed ZardozSav closed 2 years ago

ZardozSav commented 3 years ago

Hi,

When I try to use the method for merging the bins of the truth binning (or, equivalently, the method that computes the squared Mahalanobis distance), I get a MemoryError if my number of reco bins is too high.

Since this method needs to generate a number of random matrices that depends on the number of reco bins, maybe the issue comes from that.

Best regards,

ast0815 commented 3 years ago

Hi, could you provide a brief example of what exactly you did as well as the error message?

ZardozSav commented 3 years ago

Hi,

I built a response matrix with the binning I have attached to this post. I then tried to optimise the truth binning with the method you developed, "matrix_utils.improve_stats()" (I asked for at least 50 entries in each bin).

This optimisation led to a memory error, as if Python created too many objects (with Python 2.7 or Python 3). I realised it works well if I have fewer bins in the reco binning. error.txt

reco-binning_Old_Lreco.txt true-binning.txt

ast0815 commented 3 years ago

Ok, yeah. The problem is that it needs to generate random variations of the matrices to calculate the Mahalanobis distance between them (i.e. to check whether they are compatible within statistical variations or not). It actually needs to generate more random throws than the number of reco bins, and it tries to generate them all at once. This is what leads to the memory problem when dealing with lots of reco bins.
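The memory blow-up and the possible fix can be illustrated with a generic NumPy sketch (this is not ReMU's actual code; the normal-noise model and function names are illustrative assumptions). Generating all throws at once allocates one array of shape (n_throws, n_reco, n_truth), while processing throws in fixed-size chunks keeps peak memory independent of the number of throws:

```python
import numpy as np

def throw_all_at_once(matrix, n_throws, rng):
    """Generate all random variations of a matrix in a single array.

    Memory grows as n_throws * matrix.size, which explodes when the
    number of throws must exceed a large number of reco bins.
    """
    # One big allocation of shape (n_throws, n_reco, n_truth)
    return rng.normal(loc=matrix, scale=0.1, size=(n_throws,) + matrix.shape)

def mean_sq_distance_chunked(matrix, n_throws, rng, chunk=100):
    """Accumulate a distance-like statistic over throws in chunks.

    Peak memory is chunk * matrix.size, independent of n_throws.
    """
    total = 0.0
    done = 0
    while done < n_throws:
        n = min(chunk, n_throws - done)
        throws = rng.normal(loc=matrix, scale=0.1, size=(n,) + matrix.shape)
        total += np.sum((throws - matrix) ** 2)  # toy stand-in for the real statistic
        done += n
    return total / n_throws
```

Accumulating chunk by chunk trades one huge allocation for many small ones, at the cost of some speed, which matches the trade-off discussed later in this thread.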

It should be possible to change the algorithm in some way to avoid generating all these matrices at once. Do you need this at the moment, or do you see yourself needing this in the near future?

ZardozSav commented 3 years ago

If it is possible to do this at the moment, it would be very useful, yes! (However, it's not really urgent; if it is done before the end of January, that is already good.) I want to test many models in order to check the estimation of model dependencies.

Anyway, I have another question about the filling of the binning: if I have events from the simulation of my experiment that in the end are not reconstructed (the values of the reco variables will be NaN), will these events be ignored when filling the matrix because of the NaN? How does ReMU understand that these events need to be taken into account, since they reflect the reconstruction efficiency of my experiment?

Thank you for your response

Best regards,

ast0815 commented 3 years ago

I see. I will aim at fixing this next month then.

When you fill a response matrix, it actually fills a response binning and additionally a truth vector. Events that do not have a reco bin will only end up in the truth vector. This is how it keeps track of efficiencies. You should be able to confirm this by comparing the content of the truth_binning with the reco_binning or response_binning of the ResponseMatrix object. I.e., the total of the truth should always be larger than or equal to the total in the reco, which should be identical to the total of the response.
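That bookkeeping can be sketched in a few lines of toy Python (a simplified illustration, not ReMU's actual fill implementation; the event format and function name are made up for the example):

```python
import numpy as np

def fill_response(events, n_truth, n_reco):
    """Toy version of filling a response matrix plus a truth vector.

    Each event is (truth_bin, reco_value); reco_value is NaN for events
    that were not reconstructed.  Every event counts toward the truth
    vector, but only reconstructed events enter the response matrix,
    which is how the reconstruction efficiency is encoded.
    """
    truth = np.zeros(n_truth)
    response = np.zeros((n_reco, n_truth))
    for truth_bin, reco_value in events:
        truth[truth_bin] += 1            # all simulated events land here
        if not np.isnan(reco_value):     # unreconstructed events are skipped
            response[int(reco_value), truth_bin] += 1
    return truth, response

# Two of the five events fail reconstruction (NaN reco value)
events = [(0, 0.0), (0, np.nan), (1, 1.0), (1, np.nan), (1, 0.0)]
truth, response = fill_response(events, n_truth=2, n_reco=2)
efficiency = response.sum(axis=0) / truth  # per-truth-bin efficiency
```

The stated invariant holds here by construction: truth.sum() is 5 while response.sum() is 3, and the difference is exactly the two unreconstructed events.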

ZardozSav commented 3 years ago

Thank you very much for the clarification !

ast0815 commented 3 years ago

I have just pushed some changes to the devel branch. Could you try it with your binning? Please also check whether it gives you the same results for the smaller binning as the old algorithm.

ast0815 commented 3 years ago

I also expect it to be slower than the old algorithm, could you compare the speed of it too?
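For a fair comparison, both speed and results should be measured on the same input. A small generic harness (the function below is illustrative, not part of ReMU) could look like this:

```python
import time

def compare_implementations(fn_old, fn_new, *args, **kwargs):
    """Run two implementations on the same input, time each one,
    and report whether their results agree."""
    t0 = time.perf_counter()
    result_old = fn_old(*args, **kwargs)
    t_old = time.perf_counter() - t0

    t0 = time.perf_counter()
    result_new = fn_new(*args, **kwargs)
    t_new = time.perf_counter() - t0

    return result_old == result_new, t_old, t_new
```

In this case one would pass the old and new versions of the binning optimisation together with copies of the same ResponseMatrix (argument names hypothetical); only runs with the same number of bins are directly comparable.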

ZardozSav commented 3 years ago

Hi, sorry for the late reply. I saw your post and I'm currently doing some tests. Indeed the algorithm is slower: my script is still running after 6 days of processing (using one CPU), while the previous algorithm took less time (~10-24 h).

ast0815 commented 3 years ago

With the same number of bins that worked before? Or is this with the number of bins that did not work before?

ZardozSav commented 3 years ago

With the number of bins that did not work before. Yes, sorry, the 10-24 h was for the number of bins that worked; I'll compare the speed for those bins with your implementation.

ast0815 commented 3 years ago

Ah, I see. Well, with that it is sure to take longer. How many times more bins does this have? For the speed comparison, only the same number of bins really makes it comparable. But at least it seems like the memory error is gone. I assume it failed quite a bit earlier before, when it was not working.

ZardozSav commented 3 years ago

Indeed, the number of bins when I had the memory error is more than 10 times the number of bins when it was working, so I assume it's normal for it to take longer. Anyway, it seems the memory error is gone, thank you very much!

I will check whether the result is the same with the number of bins that worked before.