CMA-ES / libcmaes

libcmaes is a multithreaded C++11 library with Python bindings for high-performance black-box stochastic optimization using CMA-ES (Covariance Matrix Adaptation Evolution Strategy).

Function with nan value #102

Closed thomasdes closed 9 years ago

thomasdes commented 9 years ago

Hi, I have a black-box function that can sometimes return a NaN value. Does libcmaes handle this?

To work around this, I return max double when the value is NaN. The problem is that the best solution returned by libcmaes has value max double, whereas during the optimization I can see on my screen that the function is evaluated at better points. Any idea why libcmaes returns this max value even though it evaluates the function at better points?

I am using the cmaes algorithm.

Thank you.

beniz commented 9 years ago

I've used a similar strategy before and it should work fine.

Though try to return max double / 10 or any significantly lower value to see whether it changes anything. This should help determine whether the problem is caused by using max double in internal operations (though the algorithm is rank-based, so it really should not matter).
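
For reference, a rough sketch of that NaN-guarding strategy, written against the FitFunc lambda signature that appears later in this thread (my_objective, the includes and the exact penalty are illustrative, not prescribed by libcmaes):

    #include <cmath>
    #include <limits>
    #include "cmaes.h"

    // Wrap the raw black-box function so that NaN is replaced by a large but
    // finite penalty before libcmaes ever sees it.
    libcmaes::FitFunc guarded_objective = [](const double *x, const int N)
    {
      double v = my_objective(x, N);   // hypothetical raw objective, may return NaN
      if (std::isnan(v))
        return std::numeric_limits<double>::max() / 10.0;   // significantly below max double
      return v;
    };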

nikohansen commented 9 years ago

I usually implement the convention that NaN leads to the immediate generation of another sample with replacement, such that in the end popsize samples with non-NaN values are available for ranking.
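
In code, that convention looks roughly like the sketch below; this is generic C++-style pseudocode rather than libcmaes API, and Candidate, ask_one(), evaluate() and popsize are illustrative names:

    // Keep drawing replacement samples until popsize non-NaN values are available.
    std::vector<Candidate> pop;
    long nan_count = 0;
    while ((int)pop.size() < popsize)
    {
      Candidate c = ask_one();        // one sample from the current search distribution
      double f = evaluate(c.x);       // black-box evaluation, may return NaN
      if (std::isnan(f))
      {
        ++nan_count;                  // reject the sample and draw a replacement
        continue;
      }
      c.f = f;
      pop.push_back(c);               // only non-NaN samples enter the ranking
    }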

thomasdes commented 9 years ago

It doesn't work with lower values. nikohansen: you do it with a progress function, and if the value is NaN, you reset the cmaSolution?

beniz commented 9 years ago

@nikohansen yes, this is something I could add. However, it comes with the side-effect that it may require an undefined number of samples in order to cover the lambda samples. I've encountered this in an application from ROOT where the fitting function was undefined in some (unknown beforehand) areas: the objective function would trigger NaN values for many iterations.

@thomasdes if you generate samples with values lower than max double, it should work. Can you share pieces of your code like last time, please?

nikohansen commented 9 years ago

However, it comes with the side-effect that it may require an undefined number of samples in order to cover the lambda samples.

That's not really a problem, because in this case the optimization would fail either way (it will actually fail much more easily without rejection sampling). I do indeed trigger a warning each time 1000 NaN samples have been seen. I do agree that this is a hack which should be used with care (e.g. it will often fail if used for constraint handling).

nikohansen commented 9 years ago

you do it with a progress function and if the value is nan, you reset the cmaSolution

I don't quite know what you mean by a progress function. What I tried to describe is rejection sampling: resampling until the sample/solution gets assigned a non-NaN value.

thomasdes commented 9 years ago

KalmanFilter.cpp (interesting parts) :

#include <stdexcept>
#include <ctime>
#include <gsl/gsl_multimin.h>

#include "KalmanFilter.h"

using namespace Eigen;
using namespace std;
using namespace libcmaes;

int ITER_MAX = 1000000;

KalmanFilter::KalmanFilter(Scenario *scenario, int m_nbParam): m_scenario(scenario), m_nbParam(m_nbParam) {
    m_nbMaturity = scenario->getMaxWeekMaturity() + 1;
    m_nbDate = scenario->getNbDates();
    m_nbProduct = scenario->getCurve(0).row(0).cols();

    m_x = VectorXd::Zero(m_nbMaturity);
}

void KalmanFilter::initialisation(MatrixXd param, MatrixXd R){
    if (param.rows() != m_nbMaturity){
        throw invalid_argument("Parameter matrix has not the right number of rows");
    }
    if (param.cols() != m_nbMaturity){
        throw invalid_argument("Parameter matrix has not the right number of cols");
    }

    m_Q = param;
    m_P = m_Q;

    m_F = SparseMatrix<double>(m_nbMaturity, m_nbMaturity);
    m_var = VectorXd(m_nbMaturity);
    m_varFriday = VectorXd(m_nbMaturity);

    for (int i = 0 ; i < m_nbMaturity - 1 ; ++i){
        m_F.insert(i, i + 1) = 1;
        m_var(i) = - m_Q(i,i) / 2;
        m_varFriday(i) = - m_Q(i + 1, i + 1) / 2;  
    }
    m_F.insert(m_nbMaturity - 1, m_nbMaturity - 1 ) = 1;
    m_varFriday(m_nbMaturity - 1) = 0;
    m_var(m_nbMaturity - 1) = - m_Q(m_nbMaturity - 1, m_nbMaturity - 1) / 2;
    m_R = R;
}

void KalmanFilter::updateState(int indTime){
    if (m_scenario->isMonday(indTime)){
        m_P = m_F * m_P * m_F.transpose() + m_Q;
        m_x = m_F * m_x + m_varFriday;
    }
    else{
        m_P += m_Q;
        m_x += m_var;
    }
}

void KalmanFilter::updateMeasure(int indTime){

    if (!m_scenario->missData(indTime)){

        SparseMatrix<double> H = computeH(indTime);

        VectorXd temp(m_nbProduct);
        for (int i = 0 ; i < m_nbProduct ; ++i){
            temp(i) = m_scenario->getCurve(indTime).row(2)(i) / m_scenario->getCurve(0).row(2)(i);
        }
        m_r = temp - H * VectorXd::Ones(m_nbMaturity);

        MatrixXd HP = H * m_P;
        MatrixXd HPt = HP.transpose();

        MatrixXd G = H * HPt + m_R;
        FullPivLU<MatrixXd> GLU = G.fullPivLu();
        m_Ginv = GLU.inverse();
        m_Gdet = GLU.determinant();
        MatrixXd K = HPt * m_Ginv;
        m_x += K * m_r;
        m_P -= K * HP;
    }
}

//SparseMatrix<double> KalmanFilter::computeH(int indTime){
//  SparseMatrix<double> H(m_nbProduct, m_nbMaturity);
//  MatrixXd data = m_scenario->getCurve(indTime);
//
//  for (int indProduct = 0 ; indProduct < m_nbProduct ; ++indProduct){
//      for (int indMaturity = data(0, indProduct) ; indMaturity < data(1,indProduct) ; ++indMaturity){
//          H.insert(indProduct, indMaturity) = exp(m_x(indMaturity)) / (data(1,indProduct) - data(0,indProduct)) ;
//      }
//  }
//
//  return H;
//}

SparseMatrix<double> KalmanFilter::computeH(int indTime){
    SparseMatrix<double> H(m_nbProduct, m_nbMaturity);
    MatrixXd data = m_scenario->getCurve(indTime);
    int beginWeek = m_scenario->getBeginWAH(indTime);

    for (int indProduct = 0 ; indProduct < m_nbProduct ; ++indProduct){
        int timeDuration = data(1,indProduct) - data(0,indProduct);
        int beginMaturity = floor((data(0,indProduct) - beginWeek)/7) + 1;
        int endMaturity = floor((data(1,indProduct) - beginWeek)/7) + 1;

        H.insert(indProduct, beginMaturity) = exp(m_x(beginMaturity)) * 
            (7 * beginMaturity + beginWeek - data(0,indProduct)) / timeDuration ;           

        for (int indMaturity = beginMaturity + 1; indMaturity < endMaturity ; ++indMaturity ){
            H.insert(indProduct, indMaturity) = exp(m_x(indMaturity)) * 7 / timeDuration ;
        }
        H.insert(indProduct, endMaturity) = exp(m_x(endMaturity)) * 
            (data(1,indProduct) - (beginWeek + (endMaturity - 1) * 7) ) / timeDuration ;

    }
    return H;
}

double KalmanFilter::loglikehood(MatrixXd param, MatrixXd R){
    double llh = 0;
        //clock_t start = clock();

    initialisation(param, R);
    for (int indTime = 0; indTime < m_nbDate ; ++indTime){
        updateState(indTime);
        updateMeasure(indTime); 
        llh += log(m_Gdet) + m_r.transpose() * m_Ginv * m_r;
    }
            //cout << (clock() - start) / (double) CLOCKS_PER_SEC <<endl;
    //cout << llh << endl;
    if(!std::isnan(llh)){
        return llh;
    } 
    else{
        return 100000;
    }
}

outputOptimisation KalmanFilter::maximizeLikeHoodGen(MatrixXd &parametersIni, MatrixXd &RIni){

    //function to optimize and number of parameters
    double sigma = 1;
  //int lambda = 100; // offsprings at each generation.

    //initialisation
    std::vector<double> x0(m_nbParam * m_nbMaturity + m_nbProduct ,0.0);
    for (int i = 0 ; i < m_nbParam ; ++i){    
        for (int j = 0 ; j < m_nbMaturity ; ++j){
            x0[i * m_nbMaturity + j] = parametersIni(i,j);
        }
    }

    for (int i = 0 ; i < m_nbProduct ; ++i){
        x0[m_nbParam * m_nbMaturity + i] = RIni(i,i);
    }

    CMAParameters<> cmaparams(x0,sigma);
    cmaparams.set_str_algo("acmaes"); 
    cmaparams.set_mt_feval(false);
    CMASolutions cmasols = cmaes<>(loglikehoodGen,cmaparams);
    cout << cmasols << endl;
    vector<double> argmin = cmasols.best_candidate().get_x();
    double min = cmasols.best_candidate().get_fvalue();

    MatrixXd parameters(m_nbParam, m_nbMaturity);
    MatrixXd R = MatrixXd::Zero(m_nbProduct,m_nbProduct);
    for (int i = 0 ; i < m_nbParam ; ++i){
        for (int j = 0 ; j < m_nbMaturity ; ++j){
            parameters(i, j) = argmin[i * m_nbMaturity + j];
        }
    }

    for (int i = 0 ; i < m_nbProduct ; ++i){
        R(i,i) = argmin[m_nbParam * m_nbMaturity + i];
    }

    outputOptimisation output;
    output.R = R;
    output.parameters = parameters;
    output.loglikehood = - min / 2;
    output.AIC = (m_nbParam * m_nbMaturity + m_nbProduct) + min;
    output.BIC = min + log(m_nbDate * m_nbProduct) * (m_nbParam * m_nbMaturity + m_nbProduct);

    return output;
}
thomasdes commented 9 years ago

In KalmanFilter.h, the lambda function:

        libcmaes::FitFunc loglikehoodGen = [this](const double *x, const int N){
            Eigen::MatrixXd volCurve(m_nbParam, m_nbMaturity);
            Eigen::MatrixXd R = Eigen::MatrixXd::Zero(m_nbProduct,m_nbProduct);
            for (int i = 0 ; i < m_nbParam ; ++i){
                for (int j = 0 ; j < m_nbMaturity ; ++j){
                    volCurve(i, j) = x[i * m_nbMaturity + j];
                }
            }

            for (int i = 0 ; i < m_nbProduct ; ++i){
                R(i,i) = x[m_nbParam * m_nbMaturity + i];
            }

            return loglikehood(volCurve.transpose() * volCurve, R);
        };
thomasdes commented 9 years ago

I can upload all the code if you want, but it needs input data that I am not allowed to provide. If these parts of the code are not sufficient, I can upload the code and create fake data.

beniz commented 9 years ago

That's not really a problem, because in this case the optimization would fail either way (it will actually fail much easier without rejection sampling).

I see. With enough incentive for it, I could add a rejection sampling scheme, and expose it to the user as needed.

beniz commented 9 years ago

@thomasdes several things:

- can you run with

cmaparams.set_quiet(false);

or any other way, so that we can see the values that are below the value you return instead of NaN?

- my guess is that the NaN comes from

log(m_Gdet)

in the log-likelihood function.

thomasdes commented 9 years ago

Yes, the NaN comes from log(det). It can happen that G is not invertible, but I don't have much control over that: it depends on R, which is an input, and I don't know for which values of R the matrix G(R) is non-invertible.

INFO - CMA-ES / dim=59 / lambda=16 / sigma0=0.1 / mu=8 / mueff=5.09618887861017 / c1=0.000549271554997287 / cmu=0.00176721897551803 / lazy_update=0 / threads=2
INFO - iter=1 / evals=16 / f-value=109.62502002925 / sigma=0.0952579878576507 / last_iter=2717
INFO - iter=2 / evals=32 / f-value=-156.541165140858 / sigma=0.0921816226430627 / last_iter=2755
INFO - iter=3 / evals=48 / f-value=-256.170130987588 / sigma=0.0904229767009231 / last_iter=2754
INFO - iter=4 / evals=64 / f-value=-291.179119495875 / sigma=0.0898983052516064 / last_iter=2798
INFO - iter=5 / evals=80 / f-value=-514.220062604344 / sigma=0.0901516163146536 / last_iter=2732
INFO - iter=6 / evals=96 / f-value=-810.368709987719 / sigma=0.0913203643433735 / last_iter=2780
INFO - iter=7 / evals=112 / f-value=-839.052302763376 / sigma=0.0923589573947146 / last_iter=2775
INFO - iter=8 / evals=128 / f-value=-1112.03242232967 / sigma=0.0940662323496986 / last_iter=2816
INFO - iter=9 / evals=144 / f-value=-1178.57817622726 / sigma=0.0961674161421779 / last_iter=2785
INFO - iter=10 / evals=160 / f-value=-1647.62331143761 / sigma=0.0987900867521246 / last_iter=2781
INFO - iter=11 / evals=176 / f-value=100000 / sigma=0.100736788004335 / last_iter=2680
INFO - iter=12 / evals=192 / f-value=100000 / sigma=0.102148944209716 / last_iter=2727
INFO - iter=13 / evals=208 / f-value=100000 / sigma=0.102722390926307 / last_iter=2721
INFO - iter=14 / evals=224 / f-value=100000 / sigma=0.103190527334962 / last_iter=2731
INFO - iter=15 / evals=240 / f-value=100000 / sigma=0.102898287330751 / last_iter=2720
INFO - iter=16 / evals=256 / f-value=100000 / sigma=0.102635312079438 / last_iter=2708
INFO - iter=17 / evals=272 / f-value=100000 / sigma=0.103562359983106 / last_iter=2676
INFO - iter=18 / evals=288 / f-value=100000 / sigma=0.105242858727255 / last_iter=2723
INFO - iter=19 / evals=304 / f-value=100000 / sigma=0.106864181776331 / last_iter=2702
INFO - iter=20 / evals=320 / f-value=100000 / sigma=0.108231605668602 / last_iter=2741
INFO - iter=21 / evals=336 / f-value=100000 / sigma=0.108419180532545 / last_iter=2741
INFO - stopping criteria stagnation => oldmedianfvalue=50148.1542972347 / newmedianfvalue=100000
best solution => f-value=100000 / fevals=336 / sigma=0.108419180532545 / iter=21 / elaps=57570ms / x= 0.777657556424017 -0.338837384877716 -0.0415580097693238 -0.257027363482036 -0.0896423486591425 0.00222543677038062 -0.0560198487305444 -0.238702081655937 0.249018942037588 -0.3849872210021 0.264147770277477 0.116531228862955 0.497860782838236 0.161269013922996 0.234080149315319 0.157854034029104 0.263356012964342 -0.00336330650343029 0.402329497936767 0.223826976690257 0.217871352359561 -0.018621152710097 -0.199859483833689 0.301542122475423 -0.10693804703292 0.0255285358340896 0.110186278365041 0.698372451014364 -0.163557852687167 -0.261581259824592 0.00813700171103636 0.111818959287136 -0.0119256120226023 0.0133569175958112 0.116370701164444 0.0656558081105655 -0.293080346136118 0.0408191202889345 -0.25855157084864 -0.00927563877922333 -0.160673115811585 -0.139086949033948 0.0385696575971558 -0.0708599785204425 -0.0921319241539431 -0.152778022212777 0.183499301031016 -0.269738498248587 0.00478052218439239 -0.230197172150159 -0.0246615815921311 -0.0180694962060222 -0.184221305706484 0.0271692885270336 0.943283351102956 0.253218416656115 0.251902817859083 0.76839317945927 0.734745630712451

thomasdes commented 9 years ago

We can see in the iterations that better values for f are found than 100000.

beniz commented 9 years ago

Have you played with sigma, e.g. lowering its initial value?

thomasdes commented 9 years ago

Yes, the same thing happens.

nikohansen commented 9 years ago

It looks like the optimizer has been driven to a region where solutions systematically produce NaN. I would tend to believe that the problem is related to the modeling of the fitness rather than to the optimizer. Looking at plots generally helps to make better diagnoses. Also showing not only the best, but also the median and worst fitness might help. "best solution" in the last output line is indeed a little misleading: it probably refers to the best current solution, i.e. the best solution in the current sample population.

beniz commented 9 years ago

@thomasdes you can plot by setting:

cmaparams.set_fplot("plotfile.dat");

and using the script that comes with the lib: https://github.com/beniz/libcmaes/wiki/Visualizing-optimization-results-and-convergence

What happens if you try various values of lambda greater than 16 (e.g. 100)?
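
For concreteness, a sketch of both suggestions against the parameter setup already posted above; passing lambda through the CMAParameters constructor is an assumption on my side and the exact signature may differ between libcmaes versions:

    int lambda = 100;                                // larger population than the default 16
    CMAParameters<> cmaparams(x0, sigma, lambda);    // assumed constructor with explicit lambda
    cmaparams.set_str_algo("acmaes");
    cmaparams.set_fplot("plotfile.dat");             // per-iteration data for the plotting script
    CMASolutions cmasols = cmaes<>(loglikehoodGen, cmaparams);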

thomasdes commented 9 years ago

Is it possible that the problem comes from a scale error: the R parameters are around 5, whereas the other parameters I want to be around 0.02? I think the scale was too small, so we were moving around a singular value of G, which has the form A + R. It seems to work better with a rescaling of 200 for R.
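
One way to read that rescaling, sketched from the lambda already posted from KalmanFilter.h (the factor 200 is the value mentioned above; the corresponding entries of x0 in maximizeLikeHoodGen would then have to be divided by the same factor):

    libcmaes::FitFunc loglikehoodGenScaled = [this](const double *x, const int N)
    {
        const double rScale = 200.0;   // rescaling factor for R mentioned above
        Eigen::MatrixXd volCurve(m_nbParam, m_nbMaturity);
        Eigen::MatrixXd R = Eigen::MatrixXd::Zero(m_nbProduct, m_nbProduct);
        for (int i = 0 ; i < m_nbParam ; ++i)
            for (int j = 0 ; j < m_nbMaturity ; ++j)
                volCurve(i, j) = x[i * m_nbMaturity + j];
        // the optimizer now works on R entries around 0.02, comparable in
        // scale to the other parameters, and the objective scales them back up
        for (int i = 0 ; i < m_nbProduct ; ++i)
            R(i, i) = rScale * x[m_nbParam * m_nbMaturity + i];
        return loglikehood(volCurve.transpose() * volCurve, R);
    };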

thomasdes commented 9 years ago

What I don't understand is why it returned the NaN value when it had found smaller values?

nikohansen commented 9 years ago

The code should return the best ever evaluated solution (among other things), but it typically shows the current best evaluated solution. In case of a noisy function, the best ever evaluated solution might be misleading and not desirable. In either case, a good return candidate is also the distribution mean.

beniz commented 9 years ago

What I don't understand is why it returned the NaN value when it had found smaller values?

Very certainly because it is moving into an area where suddenly all samples yield 'NaN' values. At least this is my understanding; in other words, the optimizer does not operate at the frontier of the viable parameter space of the objective function. What happens if you run with a very small step-size, e.g. 1e-5 to 1e-8?
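
Trying the very small step-size suggested above only requires changing sigma in the constructor already used in maximizeLikeHoodGen, e.g.:

    double sigma = 1e-6;                   // very small initial step-size, in the 1e-5 to 1e-8 range
    CMAParameters<> cmaparams(x0, sigma);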

Also, my intuition would be that this can happen so suddenly almost only if a very small number (one?) of parameters is enough to trigger the 'NaN' value.

beniz commented 9 years ago

And yes, you can update the progress function in order to print the current mean (xmean()). There is no accessor yet to the best seen candidate, although it is kept within the CMASolutions object. This is something I could add very shortly.

At least for now, a quick hack would be to keep the best candidate from a global variable accessed from a custom progress function.
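
A sketch of that quick hack, assuming the ProgressFunc wiring from the libcmaes documentation (the best_f / best_x globals and the printing of xmean() are illustrative):

    #include <cmath>
    #include <limits>

    double best_f = std::numeric_limits<double>::max();   // best value seen so far (global hack)
    std::vector<double> best_x;

    ProgressFunc<CMAParameters<>,CMASolutions> track_best =
      [](const CMAParameters<> &cmaparams, const CMASolutions &cmasols)
    {
      double f = cmasols.best_candidate().get_fvalue();
      if (!std::isnan(f) && f < best_f)
      {
        best_f = f;                                        // remember the best candidate seen so far
        best_x = cmasols.best_candidate().get_x();
      }
      std::cout << "xmean=" << cmasols.xmean().transpose() << std::endl;   // current distribution mean
      return 0;
    };

    CMASolutions cmasols = cmaes<>(loglikehoodGen, cmaparams, track_best);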

nikohansen commented 9 years ago

Also, my intuition would be that this can happen so suddenly almost only if a very small number (one?) of parameters is enough to trigger the 'NaN' value.

and, more importantly, if several/many variables trigger NaN more or less independently, because then the volume where good solutions can be found might become small. If it is only one variable, the volume tends to remain at least 1/2.

beniz commented 9 years ago

Here you go, CMASolutions::get_best_seen_candidate() is now on 'dev' branch, and can be used in a custom progress function.
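
For completeness, usage after a run would look roughly like this, relying on the Candidate accessors already used earlier in this thread:

    // Best value ever evaluated during the run, even if the final population
    // only contains the NaN penalty value.
    libcmaes::Candidate best_seen = cmasols.get_best_seen_candidate();
    double best_seen_f = best_seen.get_fvalue();
    std::vector<double> best_seen_x = best_seen.get_x();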

beniz commented 9 years ago

@nikohansen I was thinking of proposing the use of initial elitism: it would restart automatically from the best seen candidate, and go on until no better candidate can be found.

@thomasdes you can activate this behavior by using:

cmaparams.set_elitist(true);
thomasdes commented 9 years ago

Thank you! I will try it tomorrow!

nikohansen commented 9 years ago

For clarification: with cmaparams.set_elitist(true) the elitist solution is set only once and for all per run?

beniz commented 9 years ago

Yes, as it is the 'initial elitist' scheme, it does reinject x0, which is set to the best known candidate upon restart.

I had been considering an 'online elitist' scheme that would reinject the current best seen candidate within a single run, but I haven't pursued this direction yet.

EDIT: and I don't expect too much from 'initial elitist' on this current application, but it doesn't cost much to try.

nikohansen commented 9 years ago

I see, the name of the method is slightly misleading then. Another clarification: you seem to suggest that this applies only to a restart, not a single or first run?

beniz commented 9 years ago

Another clarification: you seem to suggest that this applies only to a restart, not a single or first run?

Correct. It kicks in at the end of the first run if the best seen candidate is not within a few iterations of the final best current solution candidate.

There's an alternative, called 'elitist x0' (in branch sep_97), that reinjects x0 as needed from first run onward.

beniz commented 9 years ago

the name of the method is slightly misleading then

Yes... maybe something like:

nikohansen commented 9 years ago
beniz commented 9 years ago

OK, I will open a ticket about this and discuss it there, as I have a few questions/remarks.

beniz commented 9 years ago

@nikohansen please see #103 for the semantics of every strategy to be worked on.

beniz commented 9 years ago

No activity for > 20 days, so I'm closing this; full support for elitism is now ready in #103. The other question raised by this thread was the need (or not) for basic rejection sampling, which can be decided upon by opening a dedicated issue (or not).