bat / bat

Bayesian analysis toolkit http://mpp.mpg.de/bat
68 stars 40 forks source link

ModelManager Segfault [Master] #92

Closed tquante closed 8 years ago

tquante commented 9 years ago

Dear Devs,

I have found an issue with the ModelManager which lead to a segfault. I have updated the HistogramFitter example with a second TF1 with the BG only hypothesis and tried to calculate the Bayes factor. The segfault apears after the prerun of the MCMC:

Summary : Opening logfile log.txt Summary : Marginalize using Metropolis Summary : Pre-run Metropolis MCMC for model "fitter_model_f1" ... Summary : --> Perform MCMC pre-run with 2 chains, each with maximum 10000 iterations Detail : * Running until at least 1500 iterations performed in prerun. Current iteration is 500 Detail : * Running until at least 1500 iterations performed in prerun. Current iteration is 1000 Summary : --> Set of 2 Markov chains converged within 1500 iterations, and all scales are adjusted. Detail : --> Average scale factors and efficiencies (measured in last 500 iterations): Detail : - Parameter : Scale factor Efficiency Detail : SignalYield : 28.87 % 38.3 % Detail : SignalMean : 28.87 % 26.3 % Detail : SignalSigma : 28.87 % 39.9 % Detail : BackgroundYield : 28.87 % 37.5 % Summary : Run Metropolis MCMC for model "fitter_model_f1" ... Summary : --> Perform MCMC run with 2 chains, each with 10000 iterations. * Break * segmentation violation

At first I thought that it is related with the MCMCInterface bug I reported last week, but it is the ModelManager which causes the problem.

I wrote an email with the source code to Oli Schulz, because I can not attach anything. I have the same issue with my analysis code which runs without any errors with BAT 0.9.4.

Hopefully this will help you to hunt the bug.

Regards Thomas

oschulz commented 9 years ago

Example file from Thomas:

histogramFitterExample_with_modelmanager.C.txt

fredRos commented 8 years ago

Dear Thomas,

I just ran your code with the latest version of BAT, see #98, and root 6.04.10 and multiple threads in BAT. I did not get a segfault. Which exact version of BAT did you use? It could well be that we have fixed your problem in the mean time since I fixed a large number of things in BCHistogramFitter and related fast fitter classes.

tquante commented 8 years ago

Dear Frederik,

I have tested your version and it seems that the model manager issue is fixed, but the CUBA integration leads to a "free an invalid pointer" issue. The only things i changed in the example mentioned above are

TH1D* ->TH1 TF1\ ->TF1

new BCHstogrammFitter( * hist,f1) new BCHstogrammFitter( * hist,f2)

BAT: your current fitter branch root: 6.00.02

What is the reason for using const TH1& instead of const TH1* ? I need to instantiate TH1* hist=new TH1X() anyway because the typical constructors of TH1 are protected or private

Regards

Thomas

fredRos commented 8 years ago

I have tested your version and it seems that the model manager issue is fixed, but the CUBA integration leads to a "free an invalid pointer" issue.

Ok, I don't get that error but I have seen it before. Might have something to do with the fact evaluating TF1 is not thread safe. I had to maneuver around that and create independent copies, one for each Markov chain. Could you check if the error persists when setting

CUBACORES=1 root -b -q -l histogramFitterExample.C 

The only things i changed in the example mentioned above are

TH1D* ->TH1 TF1\ ->TF1

new BCHstogrammFitter( * hist,f1) new BCHstogrammFitter( * hist,f2)

BAT: your current fitter branch root: 6.00.02

What is the reason for using const TH1& instead of const TH1* ? I need to instantiate TH1* hist=new TH1X() anyway because the typical constructors of TH1 are protected or private

As part of making the fast fitters (includes BCHistogramFitter) thread and memory safe, I started to keep an independent copy of the histogram, so there is no concern about who needs to free the memory. To make this obvious, I expect a reference which in addition cannot be nullptr. For the user convenience, I take a TH1& which could refer to a TH1D but conceptually the counts need to be integer, so when I copy the member is TH1I fHistogram;. In the constructor

BCHistogramFitter::BCHistogramFitter(const TH1& hist, const TF1& func, const std::string& name)
    : BCFitter(func, name)
{
    hist.Copy(fHistogram);
...

fHistogram is default initialized, then I use TH1::Copy that does all the nasty type checks. Strangely enough, even the TH1I stores the count as a double. I guess that's ROOT style :smile:

Regarding your second comment: I would just do

TF1 f(...);
TH1I hist(...);
...
BCHistogramFitter hf(f, hist);

and there is no problem with TH1 being abstract.

tquante commented 8 years ago

The following command still lead to a free of invalid ptr when the second integration result is plotted. CUBACORES=1 root -b -q -l histogramFitterExample_with_modelmanager.C+

I think it has something to do with the termination of the single CUBA thread. Afterwards the programm runs till the end and then root throws another free of invalid ptr error.

The TH1I has only an 32bit integer array not a double, in case someone has more than 2147483647 entries in one bin it will fail, this can happen in case of huge detector arrays during calibrations. Also scaling of such histograms lead to accuracy problems. Therefore, all of our histograms are floats or even doubles.

oschulz commented 8 years ago

The TH1I max value of 2147483647 bin bin seems a limitation to me, too - it it can easily be exceeded in actual use cases. As ROOT doesn't seem to have an 64-bit integer histogram type, we shouldn't convert TH1D histograms to TH1I.

greenwald commented 8 years ago

To be clear, since I think Fred had a typo in his message that is causing some confusion. We allow the user to pass any type of TH1, including TH1I, but we convert it to TH1D for use in BAT.

On Fri, Nov 27, 2015, 10:47 Oliver Schulz notifications@github.com wrote:

The TH1I max value of 2147483647 bin bin seems a limitation to me, too - it it can easily be exceeded in actual use cases. As ROOT doesn't seem to have an 64-bit integer histogram type, we shouldn't convert TH1D histograms to TH1I.

— Reply to this email directly or view it on GitHub https://github.com/bat/bat/issues/92#issuecomment-160098019.


Daniel Greenwald Physik-Department E18 Technische Universität München James-Franck-Str. 1 85748 Garching Germany

daniel.greenwald@tum.de tel : +49 89 / 289 12569 fax : +49 89 / 289 12570

fredRos commented 8 years ago

The following command still lead to a free of invalid ptr when the second integration result is plotted. CUBACORES=1 root -b -q -l histogramFitterExample_with_modelmanager.C+

I think it has something to do with the termination of the single CUBA thread. Afterwards the programm runs till the end and then root throws another free of invalid ptr error.

@tquante what version of root did you use? I installed root 5.34.19 today and started to also see the same errors. I did not see them with 5.34.25 or 6.04.10. So I guess it's some bug related to a pretty old version of root

tquante commented 8 years ago

Sry for the late answer. I currently use root 6.00.02. I will check if the problem is still existing for me if I use root 6.04.10.

fredRos commented 8 years ago

So just for completeness: I transferred the invalid pointer thing to a separate issue #127.

The original problem has long been solved, so I closed the issue.

To be clear, since I think Fred had a typo in his message that is causing some confusion. We allow the user to pass any type of TH1, including TH1I, but we convert it to TH1D for use in BAT.

It was not typo, for a brief period we had TH1I in the code but upon your request I reverted to a TH1D inside BCHistogramFitter,

tquante commented 8 years ago

Thank you very much for fixing this bug! Now I can start again working on the Bat 1.0 compatible version of my analysis.

On 12/10/2015 03:49 PM, Frederik Beaujean wrote:

So just for completeness: I transferred the |invalid pointer| thing to a separate issue #127 https://github.com/bat/bat/issues/127.

The original problem has long been solved, so I closed the issue.

To be clear, since I think Fred had a typo in his message that is
causing some confusion. We allow the user to pass any type of TH1,
including TH1I, but we convert it to TH1D for use in BAT.

It was not typo, for a brief period we had |TH1I| in the code but upon your request I reverted to a TH1D inside |BCHistogramFitter|,

— Reply to this email directly or view it on GitHub https://github.com/bat/bat/issues/92#issuecomment-163646267.

Thomas Quante Technische Universität Dortmund Experimentelle Physik IV Otto-Hahn-Str. 4 D-44227 Dortmund Germany

email: thomas.quante@tu-dortmund.de phone: +49 231 755 3690 fax: +49 231 755 3688