Closed amueller closed 11 years ago
Hi Andy,
That is weird. The posteriors for a variable (i.e. "MultiVariable" in the code) should sum to one (the sum ranging over the values of that variable). How is the factor graph you're defining? If you send me your code or data, I can try to reproduce the problem.
Best,
André
2013/4/30 Andreas Mueller notifications@github.com
Hi Andre. I noticed some behavior of the inference results that I didn't expect. Sometimes, all posteriors for a given variable are 1 or all are zero. I think this happens only when the solution is integral, but I'm not sure. I would have expected the posteriors to always sum to one. What is the meaning of these posteriors? Thanks, Andy
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3 .
Thanks for your lightning-fast response.
I was using my Python wrapper with a very simple interface, but maybe there is a bug there.
I just use create_multi_variable
and create_dense_factor
.
I'll try to produce a minimalistic example.
I think I have found the problem in my code. The variables did not belong to any factors. I have a pairwise model where some variables might not be connected to the rest. Is there a simple way to do that? Do I have to create a dense factor for each of this variables?
I guess I should be calling FixMultiVariablesWithoutFactors
, right?
One more question: In the branch and bound solver, I get branching with value = 1
.
It seems to me that if that is the most fractional value, the solution is integer. Still, it keeps branching and returns unsolved
. Any idea why?
Hi Andreas,
Yes, FixMultiVariablesWithoutFactors should solve the problem. An alternative is to connect each of the BinaryVariables associated with the value of a MultiVariable to a XOR factor (which is precisely what FixMultiVariablesWithoutFactors does).
I'll come back soon to your branch and bound question...
Andre'
2013/4/30 Andreas Mueller notifications@github.com
I guess I should be calling FixMultiVariablesWithoutFactors, right?
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-17229110 .
Thanks for your help. Good to know I found the right fix! I must have overlooked that before.
That shouldn't happen... Is it still happening after running FixMultiVariablesWithoutFactors?
2013/4/30 Andreas Mueller notifications@github.com
One more question: In the branch and bound solver, I get branching with value = 1. It seems to me that if that is the most fractional value, the solution is integer. Still, it keeps branching and returns unsolved. Any idea why?
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-17235873 .
Yes. Maybe I did something else wrong, too. I'll see if I can generate an example / investigate.
Thanks. Please keep me posted about this.
The branch and bound function was not much tested in this release, specially with MultiVariables, so there might eventually be a bug somewhere. I've mostly used branch and bound with binary graphs with hard constraints. This part of the code is also not optimized (e.g. we could branch differently for multi-variables, and we could cache some of the computations in subsequent calls to the LP-MAP algorithm, but that's not yet implemented...)
Andre'
I saw that the implementation was quite simple, but it works very well for me :) I have to submit a paper today but I'm sure to keep you updated if I find out anything.
Thanks. Good luck with your submission!
Andre'
2013/5/1 Andreas Mueller notifications@github.com
I saw that the implementation was quite simple, but it works very well for me :) I have to submit a paper today but I'm sure to keep you updated if I find out anything.
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-17270528 .
So I finally did find some time to investigate. It is a bit hard to produce the example as it happens somewhere in learning. What I get is something like this:
Iteration = 127 Dual obj = 1208.79 Primal rel obj = 1208.79 Primal obj = -1e+100 Dual residual = 1.1372e-08 Primal residual = 7.84201e-16 Best dual obj = 1208.79 Best primal rel obj = 1208.79 Best primal obj = -1e+100 Cache
Solution value after 127 iterations (AD3) = 1208.79
Took 0.008 sec.
Solution is fractional.
Branching on variable 16 at depth 5 (value = 1) (most fractional = 0.25)
To me, this looks like a solution is marked as fractional even though it is not. Could this be the cause of the problem? Thanks, Andy
Btw, I also get
python: GenericFactor.cpp:595: virtual void AD3::GenericFactor::SolveQP(const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> >*, std::vector<double, std::allocator<double> >*): Assertion `distribution_[i] > -1e-16' failed.
[5] 25289 abort (core dumped) nosetests -sv tests/test_latent_svm.py -m switch_to_ad3
Do you know what might cause this?
I'm looking at the code https://github.com/andre-martins/AD3/blob/master/ad3/FactorGraph.cpp#L741. And you're right there's a bug there!
Instead of
double most_fractional_value = 1.0;
it should be
double most_fractional_value = 0.25;
since 0.25 = (1-0.5)*(1-0.5).
However, I'm not sure this will solve the problem, since integer solutions should have been spotted before in the call to RunAD3 which should have returned status = STATUS_OPTIMAL_INTEGER.
It's possible that the problem has to do with numeric precision issues. In that case a quick fix would be in line 1164,
bool fractional = false; value = 0.0; for (int i = 0; i < variables.size(); ++i) { if (!NEARLY_BINARY((_posteriors)[i], 1e-12)) fractional = true; value += variables[i]->GetLogPotential() * (_posteriors)[i]; } for (int i = 0; i < additional_log_potentials.size(); ++i) { _value += additional_log_potentials[i] * (_additional_posteriors)[i]; }
replacing if
(!NEARLY_BINARY((*posteriors)[i], 1e-12)) fractional = true
by a smaller threshold, say
(!NEARLY_BINARY((*posteriors)[i], 1e-6)) fractional = true
Not very elegant, but will probably solve the problem...
Any chance you have a AD3 file with the factor graph so that I can try to reproduce the problem here?
André
2013/6/28 Andreas Mueller notifications@github.com
So I finally did find some time to investigate. It is a bit hard to produce the example as it happens somewhere in learning. What I get is something like this:
Iteration = 127 Dual obj = 1208.79 Primal rel obj = 1208.79 Primal obj = -1e+100 Dual residual = 1.1372e-08 Primal residual = 7.84201e-16 Best dual obj = 1208.79 Best primal rel obj = 1208.79 Best primal obj = -1e+100 Cache Solution value after 127 iterations (AD3) = 1208.79 Took 0.008 sec. Solution is fractional. Branching on variable 16 at depth 5 (value = 1) (most fractional = 0.25)
To me, this looks like a solution is marked as fractional even though it is not. Could this be the cause of the problem? Thanks, Andy
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20184441 .
Thanks a lot for your suggestion. I will try the precision change. I created the factor graph with the python interface and I don't have the factor parameters :-/ I can't see from the python side when that happens. I could maybe just output the model in the C++ file... I'll look into that.
Even setting the threshold to 1e-2
didn't make a difference. That seems quite odd.
There is no way to dump a factor graph from the code, is there? I guess I have to hack some signal through the wrappers so I know which model it is (it happens during learning).
That's also odd. I suspect it may be some numeric underflow issue. You may try to comment out the assertion or perhaps to cout << distribution_[i] before aborting to see what's going on... Currently, my code is not very robust when the log-potentials are all tiny numbers (that's something I need to fix in the future).
Andre'
2013/6/28 Andreas Mueller notifications@github.com
Btw, I also get
python: GenericFactor.cpp:595: virtual void AD3::GenericFactor::SolveQP(const std::vector<double, std::allocator
>&, const std::vector<double, std::allocator >&, std::vector<double, std::allocator >, std::vector<double, std::allocator > ): Assertion `distribution_[i] > -1e-16' failed. [5] 25289 abort (core dumped) nosetests -sv tests/test_latent_svm.py -m switch_to_ad3Do you know what might cause this?
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20184640 .
There's a FactorGraph::Print(...) function in https://github.com/andre-martins/AD3/blob/master/ad3/FactorGraph.h#L320, which dumps a factor graph to a AD3 format file. Maybe you could wrap that function in python. That function assumes every factor type has its Print(...) implementation. However, currently, that method is not implemented in some generic factors (another thing to be done in the future). What kind of factors are you using?
Andre'
2013/6/28 Andreas Mueller notifications@github.com
Even setting the threshold to 1e-2 didn't make a difference. That seems quite odd. There is no way to dump a factor graph from the code, is there? I guess I have to hack some signal through the wrappers so I know which model it is (it happens during learning).
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20194348 .
My factor graph is in this gist. Somehow I overlooked the Print
function before.
I get Unknown factor type: 8
when trying to run the command line tool on the file, though :-/
I'm looking at the gist. The first factor you're plotting (in line 259) is not showing up its type. It's probably some factor which derives from GenericFactor (maybe a FactorDense) which doesn't have the Print(...) method implemented? If you tell me what type of factor is its, I can implement its Print(...) method and make a push to the repository.
However, there's something else... From the second line of the AD3 file, there should be 176 factors in the graph, and all I can see is 63 XOR factors plus the abovementioned factor in line 259. Any idea what are the missing 112 factors and why they aren't showing up in the file?
Andre'
2013/6/29 Andreas Mueller notifications@github.com
I get Unknown factor type: 8 when trying to run the command line tool on the file, though :-/
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20232276 .
Never mind, I just noticed everything is collapsed in line 259. I'll implement the Print method shortly. I guess those are DENSE factors, right?
2013/6/29 Andre Martins andre.t.martins@gmail.com
I'm looking at the gist. The first factor you're plotting (in line 259) is not showing up its type. It's probably some factor which derives from GenericFactor (maybe a FactorDense) which doesn't have the Print(...) method implemented? If you tell me what type of factor is its, I can implement its Print(...) method and make a push to the repository.
However, there's something else... From the second line of the AD3 file, there should be 176 factors in the graph, and all I can see is 63 XOR factors plus the abovementioned factor in line 259. Any idea what are the missing 112 factors and why they aren't showing up in the file?
Andre'
2013/6/29 Andreas Mueller notifications@github.com
I get Unknown factor type: 8 when trying to run the command line tool on the file, though :-/
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20232276 .
Yes they are, thank you soo much!
Done.
https://github.com/andre-martins/AD3/commit/2ced06d62e670193b9d3c303a610a792881ce417
Can you dump the factor graph again?
Andre'
2013/6/29 Andreas Mueller notifications@github.com
Yes they are, thank you soo much!
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20233299 .
Thanks. I updated the gist. When I do the branch-and-bound from the command line with the generated file, everything works nicely. That is a bit weird. I did
./ad3_multi --file_graphs=graph.ad3 --file_posteriors=out.blub --max_iterations=1000 --eta=.9 --exact=true
I guess that means I did something wrong in the wrappers? That the serialization changes the result seems a surprising. I'll try to investigate more.
I guess the problem is that I serialize in RunBranchAndBound
where the graph is already modified...
Tada: ../AD3/ad3_multi --file_graphs=graph_1026413173.ad3 --file_posteriors=out.blub --max_iterations=4000 --eta=.1 --exact=true | grep "value = 1"
with this gist.
Thanks for updating the gist. I just corrected a bug in the branch and bound function (every time a variable was being branched, it was marked as a branching variable, and was not being unmarked at the end - this was causing the problem you spotted). I just rerun AD3 in your factor graph and it works now:
Solution is integer. Solution value for AD3 ILP: 478.684 Elapsed time: 0.196 sec.
Thanks a lot for reporting this! And let me know if you spot something else.
André
2013/7/1 Andreas Mueller notifications@github.com
Tada: ../AD3/ad3_multi --file_graphs=graph_1026413173.ad3 --file_posteriors=out.blub --max_iterations=4000 --eta=.1 --exact=true | grep "value = 1" with this gist https://gist.github.com/amueller/5891630 .
— Reply to this email directly or view it on GitHubhttps://github.com/andre-martins/AD3/issues/3#issuecomment-20269366 .
Thanks a lot for the quick fix :) Awesome!
Hi Andre. I noticed some behavior of the inference results that I didn't expect. Sometimes, all posteriors for a given variable are 1 or all are zero. I think this happens only when the solution is integral, but I'm not sure. I would have expected the posteriors to always sum to one. What is the meaning of these posteriors? Thanks, Andy