Open fzeiser opened 3 years ago
Out of curiosity I checked whether I could read the file with h5dump. That seems to work. But I can't read it in python:
[fabiobz@login-3.FRAM /cluster/projects/nn9464k/progs/gambit_np/Backends/installed/gledeli/1.0/plotting]$ python3 ../../../../../submit/runs/NuclearBit_162Dy_43_1/samples/results.hdf5
File "../../../../../submit/runs/NuclearBit_162Dy_43_1/samples/results.hdf5", line 1
SyntaxError: Non-UTF-8 code starting with '\x89' in file ../../../../../submit/runs/NuclearBit_162Dy_43_1/samples/results.hdf5 on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
This error does not appear for another, much shorter, test run I had on the same machine.
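A note on the SyntaxError above: calling python3 with a path makes the interpreter try to execute that file as Python source, and '\x89' is the first byte of the HDF5 format signature, so the message by itself does not indicate a damaged file. A minimal stdlib-only sketch that checks for the signature follows. The helper name looks_like_hdf5 is made up for illustration, and it only looks at offset 0, i.e. it assumes there is no user block in front of the superblock.

```python
# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

To actually inspect the contents from Python one would typically use a package such as h5py (assuming it is installed), e.g. h5py.File(path, "r"), rather than passing the file to the interpreter.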
Given this message, Received EMERGENCY shutdown message from process with rank 29, there might be a sensible error message in the default.log_29 file.
But a meeting might indeed be the easiest -- I'll send you an email right away :)
Well spotted, I should have realized this:
err-2404626-29:
FATAL ERROR
GAMBIT has exited with fatal exception: Error in function boost::math::erf_inv<double>(double, double): Overflow Error
It might well be from the prior I implemented... So either we run in full debug mode, or we check the changes in #6 / in d0f653df998
So I guess it's here: https://github.com/anderkve/gambit_np/blob/d0f653df998cdc8f280ec72255979940618be1cc/ScannerBit/src/priors/truncatedgaussian.cpp#L70-L74
But this also closely matches the implementation of the gaussian prior :confused:, which I assume works as expected. Could it be rounding errors when calculating a and b? https://github.com/anderkve/gambit_np/blob/d0f653df998cdc8f280ec72255979940618be1cc/ScannerBit/include/gambit/ScannerBit/priors/gaussian.hpp#L62-L71
Seems like r=0; lower_cut = mu -> erf_inv(0) works fine, but erf_inv(1) throws this type of error, which would happen for upper_cut=infty [which is the same as: no upper cut]. Equivalently, for upper_cut = mu and alpha = - infty.
[Note: Mathematically, erf_inv(1) = infty]
We could test/add these as special cases if indeed r=[0, 1] and not (0,1). However, the lower and upper cuts were anyhow defined such that I don't see this happening immediately. Potentially we could check the output with increased verbosity and a very easy likelihood, like spartan?
@anderkve: How did you retrieve the hypercube pars? Is it just this one in the yaml file?
KeyValues:
  debug: true # Set to true for very verbose log files
Turns out there was an option to print the hypercube parameters to the output, I just remembered the name wrong. It's print_unitcube: true in the KeyValues block. Check spartan.yaml for an example.
To get the parameter point written to the log file (not in unit hypercube form, though) you need to run with the debug: true option you mention.
(I suspect adding a couple of cout statements directly in the prior function might still be the easiest debugging approach, though.)
Stupid question, but what do I use (cout / scanner_cout ...) to get things into the default log?
In a regular GAMBIT module function you can do logger() << "your message" << EOM;. See here for examples:
If you want to get the same message also printed to cout / cerr, you can do logger() << LogTags::repeat_to_cout << "your message" << EOM;, and similarly with LogTags::repeat_to_cerr.
Though, it might be that this logger stuff won't work the same way from within a prior transform function, because it's not a GAMBIT module function in the usual sense. I'd need to look into the details of that.
It works :)
The problem arises e.g. for the following prior transformation:
prior: Truncated Gaussian
mu = 0
sigma = 1
lower_cut = 0
upper_cut = 50
UnitPars ["r"]: 1
lower_cut: 0
upper_cut: 50
So the hypercube here includes the endpoints, i.e. r can take values in the closed interval [0, 1]!
The calculations lead to [double checked with a python script I wrote, matching the cpp code]
alpha = (lower_cut - mu) / sigma
beta = (upper_cut - mu) / sigma
a = 0.5*(1 + erf(alpha/np.sqrt(2)))
b = 0.5*(1 + erf(beta/np.sqrt(2)))
a ≃ 0.5
b ≃ 1
and further
inner = 2*(r*(b-a) + a) - 1 # = 2b -1 ≃ 1
x = np.sqrt(2) * erfinv(inner)
x = x * sigma + mu
The argument of the inverse error function, inner, equals one to within machine precision. This leads to a problem in the inverse error function: it goes to infty because of the machine precision issue. Maybe I could specify some cutoff for inner and add that for r=1 -> x=upper_cut?
I have to double check this last one, but I am rather sure about it.
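The numbers above can be reproduced with the Python standard library alone. The sketch below uses math.erf and the values from the log (mu = 0, sigma = 1, lower_cut = 0, upper_cut = 50); the helper name inner_argument is introduced here for illustration only. It suggests that in double precision b is rounded to exactly 1.0, so that for r = 1 the argument passed to the inverse error function is exactly 1.0:

```python
import math

# Prior settings taken from the log output above.
mu, sigma = 0.0, 1.0
lower_cut, upper_cut = 0.0, 50.0

alpha = (lower_cut - mu) / sigma
beta = (upper_cut - mu) / sigma
a = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))
b = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))


def inner_argument(r):
    """Argument handed to the inverse error function for unit-cube value r."""
    return 2.0 * (r * (b - a) + a) - 1.0


print(a, b)                 # 0.5 1.0
print(inner_argument(0.0))  # 0.0
print(inner_argument(1.0))  # 1.0
```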
Actually, from some tryouts, I think just adding two if-tests, for r=0 and r=1, would be sufficient.
Cool, thanks for investigating it! The if-tests sound like a sensible fix.
Ok, so I could add these two cases, but then we may get the problem further down the line. For r=1, the transformed value will be x=upper_cut. This makes sense thinking about the inversion of the CDF, right :)? Anyhow, if the upper cut is not defined, it defaults to +infty. So far, so good. But we don't want our parameter going into the likelihood calculations to be infinite :confused: .
Even though I've looked at the code of the gaussian prior, I cannot quite understand where it differs in this important point. There, whenever the unitcube parameter is 0 or 1, you should get +- infty, I assume. [Or from boost::math::erf_inv one gets the error message that I've seen above.] However, that does not seem to be the case, as you would otherwise certainly have run into this issue before. :confused:
truncated gaussian https://github.com/anderkve/gambit_np/blob/d0f653df998cdc8f280ec72255979940618be1cc/ScannerBit/src/priors/truncatedgaussian.cpp#L103-L106
@anderkve: Any good ideas on this?
Of course, in principle it is correct that r=1 gives you x=infty -- just that I never thought that a (truncated) gaussian distribution would actually ever return that value. For the truncated case, in which a real truncation exists, it is somewhat better, because one could set e.g. x = upper_cut.
So either we might modify gledeli to check that the parameters are finite -- or return already an error for the prior and redraw it, if that's possible.
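One way to keep the transformed value finite, sketched here in Python rather than in the ScannerBit C++ code: clamp the probability handed to the inverse CDF to the open interval (0, 1), then clamp the result to the cuts. This is not the GAMBIT implementation. The function name, the guard width 2**-53, and the use of statistics.NormalDist.inv_cdf (Python 3.8+, mathematically equal to sqrt(2) * erfinv(2p - 1)) are all assumptions made for this illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1
_GUARD = 2.0**-53           # hypothetical guard width, not a GAMBIT setting


def truncated_gaussian_transform(r, mu, sigma,
                                 lower_cut=-math.inf, upper_cut=math.inf):
    """Map a unit-cube value r in [0, 1] to a truncated Gaussian sample."""
    alpha = (lower_cut - mu) / sigma
    beta = (upper_cut - mu) / sigma
    a = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    b = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))

    # Probability at which the standard normal inverse CDF is evaluated.
    p = a + r * (b - a)
    # Keep p strictly inside (0, 1) so the inverse CDF stays finite.
    p = min(max(p, _GUARD), 1.0 - _GUARD)

    x = mu + sigma * _STD_NORMAL.inv_cdf(p)
    # Rounding must not push the result outside the cuts.
    return min(max(x, lower_cut), upper_cut)


for r in (0.0, 0.5, 1.0):
    print(r, truncated_gaussian_transform(r, mu=0.0, sigma=1.0,
                                          lower_cut=0.0, upper_cut=50.0))
```

With these guards, r = 0 and r = 1 map to values of roughly mu -/+ 8 sigma when no cut is given, so no infinite parameter reaches the likelihood, at the cost of slightly distorting the extreme tails.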
I put in an if test, but still getting the same error. Somehow the comparison operator doesn't behave well:
double r = unitpars[0]; // input unit cube parameter
double x;
if (r==0){x = lower_cut;}
else if (r==1){x = upper_cut;}
else{
    // Transformation:
    logger() << "\n\nlower_cut: " << lower_cut << "\nupper_cut: " << upper_cut
             << "\n mu: " << mu << "\nsigma: " << sigma
             << "\nr: " << r << endl;
    x = M_SQRT2*boost::math::erf_inv(2*(r*(b-a) + a) - 1); // output (result)
    x = x * sigma + mu;
}
output[myparameter] = x;
leads again to the Error in function boost::math::erf_inv<double>(double, double): Overflow Error error. The output from the logger is:
lower_cut: 0
upper_cut: 50
mu: 0
sigma: 1
r: 1
EMERGENCY_SHUTDOWN code broadcast to all processes
Which means that the r==1 case was not recognized. I thought that cout in cpp wouldn't round numbers close to 1 so that they are printed as one, so that the logger output r: 1 should have led to r==1 --> True. Hm :disappointed:
@anderkve: Do you have a suggestion on this one?
Hey -- apologies, I'm way behind on my backlog of GitHub notifications. :P
First, cout does indeed round numbers when printing unless you specify the precision, as in the stand-alone example below. (I've made oh so many annoying mistakes due to this... :D )
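As a quick illustration of that rounding: C++ streams print floating-point values with 6 significant digits by default, which corresponds to the %g format. The effect can be mimicked in Python (used here only for convenience; Python floats are doubles on common platforms):

```python
r = 1.0 - 2.0**-53   # largest double below 1.0

print("%g" % r)      # 1
print("%.17g" % r)   # 0.99999999999999989
print(r == 1.0)      # False
```

So a logged value of r: 1 does not imply that r==1 evaluates to true.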
I reproduced the overflow error with the stand-alone example below. Turns out the overflow error only happens when r is below 1.0 by no more than the double-precision rounding error, but not if r is interpreted as being exactly equal to 1.0. A solution seems to be to perform a check using std::numeric_limits<double>::epsilon() from <limits> and nudge r away from the problem region if necessary -- see below:
#include <iostream>
#include <iomanip>   // std::setprecision
#include <cmath>
#include <boost/math/special_functions/erf.hpp>
#include <limits>
int main()
{
    double mu = 0;
    double sigma = 1;
    double lower_cut = 0;
    double upper_cut = 50;
    double alpha = (lower_cut - mu) / sigma;
    double beta = (upper_cut - mu) / sigma;
    double a = 0.5*(1 + erf(alpha/M_SQRT2));
    double b = 0.5*(1 + erf(beta/M_SQRT2));
    std::cout << std::setprecision(20) << " a = " << a << std::endl;
    std::cout << std::setprecision(20) << " b = " << b << std::endl;
    // double r = 0.999999999999999; // <-- works, interpreted as 0.99999...
    // double r = 0.9999999999999998; // <-- works, interpreted as 0.99999...
    double r = 0.9999999999999999; // <-- gives overflow error if not adjusted
    // double r = 0.99999999999999994; // <-- gives overflow error if not adjusted
    // double r = 0.99999999999999995; // <-- works, interpreted as 1.0
    // double r = 0.99999999999999999; // <-- works, interpreted as 1.0
    std::cout << std::setprecision(20) << " r = " << r << std::endl;
    // This fix solves the overflow error
    if (1.0 - r < std::numeric_limits<double>::epsilon())
    {
        r = 1.0 - 2 * std::numeric_limits<double>::epsilon();
        std::cout << " Overflow fix! Adjusted r value:" << std::endl;
        std::cout << std::setprecision(20) << " r = " << r << std::endl;
    }
    double x;
    if (r==0)
    {
        x = lower_cut;
    }
    else if (r==1)
    {
        x = upper_cut;
    }
    else
    {
        x = M_SQRT2*boost::math::erf_inv(2*(r*(b-a) + a) - 1); // output (result)
        x = x * sigma + mu;
    }
    std::cout << std::setprecision(20) << " x = " << x << std::endl;
    std::cout << std::setprecision(20) << " std::numeric_limits<double>::epsilon() = " << std::numeric_limits<double>::epsilon() << std::endl;
    return 0;
}
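The behaviour of the r values listed in the comments can be cross-checked without boost, since it only depends on how the decimal literals round to doubles. A small Python sketch follows, assuming IEEE 754 doubles. The names below_one and erf_inv_argument are chosen here for illustration, and a = 0.5, b = 1.0 are hard-coded assumptions for this particular prior.

```python
import sys

below_one = 1.0 - 2.0**-53  # largest double smaller than 1.0

# How the literals from the C++ example round to doubles:
print(0.9999999999999999 == below_one)       # True
print(0.99999999999999994 == below_one)      # True
print(0.99999999999999995 == 1.0)            # True
print(0.9999999999999998 == 1.0 - 2.0**-52)  # True

a, b = 0.5, 1.0  # assumed values for mu=0, sigma=1, cuts at 0 and 50


def erf_inv_argument(r):
    """Value that would be passed to erf_inv for unit-cube parameter r."""
    return 2.0 * (r * (b - a) + a) - 1.0


# r is not equal to 1.0, so an r==1 test does not fire ...
print(below_one == 1.0)                    # False
# ... yet the argument of erf_inv rounds to exactly 1.0.
print(erf_inv_argument(below_one) == 1.0)  # True

# Same nudge as in the C++ example above.
eps = sys.float_info.epsilon  # 2**-52, like numeric_limits<double>::epsilon()
r = below_one
if 1.0 - r < eps:
    r = 1.0 - 2.0 * eps
print(erf_inv_argument(r) < 1.0)           # True
```

Note that the nudge also catches r exactly equal to 1.0, so the r==1 branch in the C++ example is never reached once the fix is applied, and that values of r very close to 0 are not treated symmetrically.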
I am confused about the output of a run that I started. I've tried this type of job and it fails repeatably (or seems to fail). I will attach some debugging information, but it might be easiest if we could schedule a meeting @anderkve :pray: ?
I don't see a descriptive error message, but from the GAMBIT log, it seems like there might be an MPI communication problem. If you look at the timestamps, the job starts on 11 April, 5 am, and is killed on 13 April, 5 am. However, the problem already occurs at ~6 am on 11 April.
From stdout of process 0:
stderr:
default.log_0: