Closed knokknok closed 6 years ago
This seems to be a separate issue from the other (and a more informative exception). Once I've resolved the other issue, I'll probably bug you for a sample that causes this --- though I have a reasonable idea about how to fix it. It would be nice to have the fixes for both issues in the same hotfix.
To be more specific: this is, as the exception says, a numeric underflow issue when evaluating the digamma function. The solution here is just to bump up the value that is required before evaluating this function. This should be straightforward, but I suspect the issue is also related to this log message:
[2018-05-31 17:08:11.488] [jointLog] [info] Marked 1 weighted equivalence classes as degenerate
but I suspect the issue is also related to this log message
I wondered about that too but other samples gave me 2 and 3 degenerate classes and still passed...
Right --- such classes should be removed (specifically because they can cause such underflow issues). I suspect that I just need to make the bound for evaluation more conservative. I chose the smallest value that worked on my testing machine, but I imagine the point at which underflow occurs can differ slightly between machines. I should just find / choose a stricter bound.
Think you could tell me which line to edit? Since I can now work on a linux box this issue is now more pressing for me than the other one...
Yes. There are two places to look. The first is possibly changing this cutoff. The other is putting a guard around this line so that it only evaluates the digamma if alphaSum is greater than the digamma min cutoff. If alphaSum fails that check (it really should not), then you can likely just skip the rest of that function.
I was able to fix the other bug you encountered (on OSX) --- the fix is now in the develop branch with a clean build. That makes this bug my current top priority. I'd like to fix it so I can merge both changes into master and cut a v0.10.1 release. Let me know the best way I can help address this one.
I tried wrapping the code around alphaSum but it didn't work. I also set digammaMin to 1e-9 but no change. What I find weird is that the following code works...
#include <cstdio>
#include <boost/math/special_functions/digamma.hpp>

int main() {
  double logNorm = boost::math::digamma(1e-50);
  printf("%f\n", logNorm);
  return 0;
}
I have a thought --- can you try compiling the above with -O3?
P.S. nevermind --- doesn't seem to make a difference on my end.
It also works with -O3.
Instead of the if (ap > ::digammaMin) { lines, would it make sense to catch the exception and set expTheta[i] = 0.0; if thrown?
I believe that would be the appropriate behavior, yes. However, I'm quite curious now what argument to digamma is causing this as I'm able to go to tiny values of the argument without seeing an exception.
If you implement the try/catch and print the value in the catch code I might be able to tell you ;-)
I'll be happy to go ahead and implement this, but the numeric_error exception that is thrown doesn't contain the value that caused the error, just the string that is printed out: Error in function boost::math::digamma(double): numeric overflow. The try/catch should be easy to do, but I'm quite interested in how this can even be triggered given that the range check is already being done.
Ok, I pushed a commit to develop that wraps this in try/catch (and also tries to print out the relevant value of the argument from the context). Please let me know if (1) this averts the uncaught exception and (2) what argument to digamma is triggering this strange behavior :)!
[2018-05-31 20:01:46.042] [jointLog] [info] Marked 1 weighted equivalence classes as degenerate
[2018-05-31 20:01:46.073] [jointLog] [info] iteration = 0 | max rel diff. = 127.422
Exception : [Error in function boost::math::digamma
:-O
Could it be coming from some other part of the code or some dependency?
Well, in the other branch, expTheta is directly set to zero so that digamma is not called. Are you compiling from the develop branch? I'll try and see if it could be coming from anywhere else in the code.
A quick grep shows these as the only uses of digamma in the code:
../src/CollapsedEMOptimizer.cpp:117: double logNorm = boost::math::digamma(alphaSum);
../src/CollapsedEMOptimizer.cpp:126: std::exp(boost::math::digamma(ap) - logNorm);
../src/CollapsedEMOptimizer.cpp:257: double logNorm = boost::math::digamma(alphaSum);
../src/CollapsedEMOptimizer.cpp:270: std::exp(boost::math::digamma(ap) - logNorm);
### salmon (mapping-based) v0.10.1
The fact that this Exception : [Error in function boost::math::digamma(double): numeric overflow] is printing out means that it is being caught by the "catch-all" exception handling here. Could you also make sure that this isn't triggered if you don't pass --useVBOpt (where no digamma should be called)?
Also, I assumed (according to the documentation) that these underflow and overflow errors inherit from std::runtime_error, but I've updated the try/catch with a more generic exception class just in case they do not.
I have added printfs before every digamma and I get:
alphaSum: inf
So the try/catch is not at the right place ;-)
Brilliant work ... and incredibly strange. Somehow, either one of the inputs in this sum is infinite, or adding them up leads to an infinity!
EQVecT i:82396 M:310732 alphaIn:11098.990511 priorAlphas:0.025489
EQVecT i:83967 M:310732 alphaIn:inf priorAlphas:0.001380
EQVecT i:84187 M:310732 alphaIn:11102.387087 priorAlphas:0.036369
Interesting: so here EQVecT i is the transcript id, right? This means transcript 83967 already has an infinite alpha value as input? Do we know what exact round of the algorithm this is? We know it's between 0 and 100.
EQVecT means we are in the function with template <typename EQVecT>; i is the index of the loop that generates alphaSum.
If I printf "Round" at line 257, I get:
[2018-05-31 22:27:34.996] [jointLog] [info] Starting optimizer
Round
[2018-05-31 22:27:35.226] [jointLog] [info] Marked 1 weighted equivalence classes as degenerate
Round
[2018-05-31 22:27:35.257] [jointLog] [info] iteration = 0 | max rel diff. = 127.379
Exception : [Error in function boost::math::digamma
Yes, so in the loop that computes alphaSum, i is the transcript id. Regarding the round, can you print the value of itNum between lines 892 and 893? This will say what round of the VBEM we are executing. Basically, I want to figure out if we have an infinite value on the first iteration, or if some previous round of the VBEM resulted in an infinite output value being placed into some transcript --- and if so, what round this happens in.
BTW, I don't know if it could be related, but make test gives me:
Running tests...
Test project salmon/build
Start 1: unit_tests
1/3 Test #1: unit_tests .......................***Failed 0.02 sec
Start 2: salmon_read_test_fmd
2/3 Test #2: salmon_read_test_fmd ............. Passed 1.74 sec
Start 3: salmon_read_test_quasi
3/3 Test #3: salmon_read_test_quasi ........... Passed 1.62 sec
67% tests passed, 1 tests failed out of 3
Total Test time (real) = 3.37 sec
The following tests FAILED:
	  1 - unit_tests (Failed)
Errors while running CTest
Makefile:151: recipe for target 'test' failed
make: *** [test] Error 8
itNum: 0
itNum: 1
[2018-05-31 22:48:54.566] [jointLog] [info] Computed 583973 rich equivalence classes for further processing
[2018-05-31 22:48:54.566] [jointLog] [info] Counted 59985214 total reads in the equivalence classes
[2018-05-31 22:48:54.609] [jointLog] [info] Mapping rate = 88.2723%
[2018-05-31 22:48:54.609] [jointLog] [info] finished quantifyLibrary()
[2018-05-31 22:48:54.610] [jointLog] [info] Starting optimizer
[2018-05-31 22:48:54.829] [jointLog] [info] Marked 1 weighted equivalence classes as degenerate
[2018-05-31 22:48:54.859] [jointLog] [info] iteration = 0 | max rel diff. = 127.38
Exception : [Error in function boost::math::digamma
probably unrelated, but also unexpected. make test on my linux and osx boxes looks like:
$ make test
Running tests...
Test project /Users/rob/salmon/build
Start 1: unit_tests
1/3 Test #1: unit_tests ....................... Passed 0.13 sec
Start 2: salmon_read_test_fmd
2/3 Test #2: salmon_read_test_fmd ............. Passed 0.87 sec
Start 3: salmon_read_test_quasi
3/3 Test #3: salmon_read_test_quasi ........... Passed 0.39 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 1.41 sec
It looks the same on the continuous integration server:
Running tests...
/usr/local/cmake-3.9.2/bin/ctest --force-new-ctest-process
Test project /home/travis/build/COMBINE-lab/salmon/build
Start 1: unit_tests
1/3 Test #1: unit_tests ....................... Passed 0.13 sec
Start 2: salmon_read_test_fmd
2/3 Test #2: salmon_read_test_fmd ............. Passed 2.55 sec
Start 3: salmon_read_test_quasi
3/3 Test #3: salmon_read_test_quasi ........... Passed 1.72 sec
100% tests passed, 0 tests failed out of 3
Total Test time (real) = 4.41 sec
Also, you can look in the build directory at the file Testing/Temporary/LastTestsFailed.log, which will give details of which specific test failed.
Ok, so it looks like we complete iteration 0, so the initial alphas (before we start the VBEM) are OK. But after one call to VBEMUpdate_, we have a transcript with an infinite count. So it happens almost immediately.
$ cat Testing/Temporary/LastTestsFailed.log
1:unit_tests
ha ... that log isn't particularly interesting --- what happens if you run ./src/unitTests from the build directory?
All tests passed (108 assertions in 4 test cases)
hrmmmmm .... so then the failure is stochastic?
Well... ./src/unitTests seems to always pass and make test to always fail...
can you try make install before make test?
$ sudo make install
[  7%] Built target libcereal
[ 14%] Built target libdivsufsort
[ 21%] Built target libstadenio
[ 28%] Built target libbwa
[ 36%] Built target libgff
[ 42%] Built target libspdlog
[ 47%] Built target ksw2pp_basic
[ 49%] Built target ksw2pp_sse4
[ 52%] Built target ksw2pp_sse2
[ 53%] Built target ksw2pp
[ 55%] Built target alevin_core
[ 69%] Built target salmon_core
[ 74%] Built target unitTests
[100%] Built target salmon
Install the project...
-- Install configuration: "Release"
-- Up-to-date: /usr/local/lib
-- Up-to-date: /usr/local/lib/libtbbmalloc.so
-- Up-to-date: /usr/local/lib/pkgconfig
-- Up-to-date: /usr/local/lib/libtbb.so
-- Up-to-date: /usr/local/lib/libtbb.so.2
-- Up-to-date: /usr/local/lib/libtbbmalloc_proxy.so.2
-- Up-to-date: /usr/local/lib/libtbbmalloc_proxy.so
-- Up-to-date: /usr/local/lib/libtbbmalloc.so.2
-- Up-to-date: /usr/local/bin/salmon
-- Up-to-date: /usr/local/lib/libsalmon_core.a
Installation complete. Please ensure the following paths are set properly.
Please add /usr/local/bin to your PATH
Please add /usr/local/lib to your LD_LIBRARY_PATH
$ make test
Running tests...
Test project salmon/build
    Start 1: unit_tests
1/3 Test #1: unit_tests .......................***Failed    0.02 sec
    Start 2: salmon_read_test_fmd
2/3 Test #2: salmon_read_test_fmd .............   Passed    1.67 sec
    Start 3: salmon_read_test_quasi
3/3 Test #3: salmon_read_test_quasi ...........   Passed    1.62 sec
67% tests passed, 1 tests failed out of 3
Total Test time (real) = 3.32 sec
The following tests FAILED:
	  1 - unit_tests (Failed)
Errors while running CTest
Makefile:151: recipe for target 'test' failed
make: *** [test] Error 8
Ok, GMT+2 here, have to go ;-)
It must be a different unitTests executable that is running, since make test is just using CMake to run the same executable in ./src/unitTests. I'm happy to help debug this further, but I'm currently most interested in how a transcript gets an infinite count after round 0.
Ok, GMT+2 here, have to go ;-)
Of course; thank you so much for your help today! Glad we were able to squash the other bug. I'll think more about this vbOpt bug. If I can find a dataset to run that triggers it, I can probably debug it faster (like the other bug).
thank you so much for your help today!
Sure, after all it is quite selfish ;-)
I have been trying to generate smaller transcript db and fastq files but to no avail (can't make it fail again)
However, it looks like your test if (denom <= ::minEQClassWeight) might not be stringent enough:
invDenom:inf count:1 denom:2.77171e-321 minEQClassWeight:4.94066e-324
invDenom:inf count:1 denom:4.69042e-316 minEQClassWeight:4.94066e-324
These two lines result from:
groupSize: 2 i:0 tid:83966 aux:0.756044 expTheta[tid]:0 i:1 tid:83967 aux:0.243956 expTheta[tid]:7.23806e-321
groupSize: 2 i:0 tid:190925 aux:0.542131 expTheta[tid]:0 i:1 tid:272773 aux:0.457869 expTheta[tid]:6.2423e-316
I have replaced if (denom <= ::minEQClassWeight) with if (denom <= 1e-200) and my samples seem to pass...
Although I have no idea if such low denom values are "normal" and/or if this fix could have a negative impact on the resulting quantifications...
Just to get more info about exactly the equivalence class causing this problem, can you try to change this definition of verbose{false} to verbose{true} ? https://github.com/COMBINE-lab/salmon/blob/develop/src/CollapsedEMOptimizer.cpp#L376
Also, out of curiosity: does it work if you make 1e-200 be std::numeric_limits<double>::min()?
Also, out of curiosity. Does it work if you make 1e-200 to be std::numeric_limits<double>::min()?
I'll tell you later but it might:
#include <cstdio>
#include <limits>

int main() {
  printf("%g\n", 1/1e-310);
  printf("%g\n", std::numeric_limits<double>::min());
  printf("%g\n", 1/std::numeric_limits<double>::min());
}

which prints:
inf
2.22507e-308
4.49423e+307
It did pass on one of the problematic samples. Also:
Dropping weighted eq class
============================
denom = 0, count = 1
class = { 140152 140153 140154 }
alphas = { 0 0 0 }
weights = { 0.954631 0.029385 0.0159838 }
============================
Awesome! Thanks for checking it out. I think having denorm_min become min makes sense. I'm gonna push both this and #228 to master and cut a v0.10.1 release (and I'll be sure to thank you in the release notes). I'll close these issues once that release is done.
These fixes are now live in v0.10.1 --- thanks again!
Great! Thanks!
(Actually it is knokknok, the correct spelling was already taken ;-) )
Sorry, me (#228) again... While most of my samples seem to work on linux, I got an exception for one of them:
Version Info: This is the most recent version of Salmon.
### salmon (mapping-based) v0.10.0
### [ program ] => salmon
### [ command ] => quant
### [ index ] => { salmon010.index.all_combined }
### [ libType ] => { A }
### [ mates1 ] => { R1.fastq.gz }
### [ mates2 ] => { R2.fastq.gz }
### [ posBias ] => { }
### [ gcBias ] => { }
### [ seqBias ] => { }
### [ useVBOpt ] => { }
### [ validateMappings ] => { }
### [ output ] => { processed_salmon0100_k31_allcombined/R }
Logs will be written to processed_salmon0100_k31_allcombined/R/logs
[2018-05-31 16:54:42.310] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2018-05-31 16:54:42.310] [jointLog] [info] Usage of --validateMappings implies use of range factorization. rangeFactorizationBins is being set to 4
[2018-05-31 16:54:42.310] [jointLog] [info] parsing read library format
[2018-05-31 16:54:42.310] [jointLog] [info] There is 1 library.
[2018-05-31 16:54:42.480] [jointLog] [info] Loading Quasi index
[2018-05-31 16:54:42.501] [jointLog] [info] Loading 32-bit quasi index
[2018-05-31 16:54:42.501] [stderrLog] [info] Loading Suffix Array
[2018-05-31 16:55:01.293] [stderrLog] [info] Loading Transcript Info
[2018-05-31 16:55:06.428] [stderrLog] [info] Loading Rank-Select Bit Array
[2018-05-31 16:55:07.107] [stderrLog] [info] There were 310732 set bits in the bit array
[2018-05-31 16:55:07.158] [stderrLog] [info] Computing transcript lengths
[2018-05-31 16:55:07.159] [stderrLog] [info] Waiting to finish loading hash
[2018-05-31 16:55:25.973] [jointLog] [info] done
[2018-05-31 16:55:25.973] [jointLog] [info] Index contained 310732 targets
[2018-05-31 16:55:25.973] [stderrLog] [info] Done loading index
processed 67500000 fragments
[jointLog] [info] Automatically detected most likely library type as IU
hits: 224580543, hits per frag: 3.35031
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 1.04% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.96% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.84% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.96% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.88% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 1.00% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.92% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.84% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.84% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.86% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.94% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.96% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.98% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.92% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.94% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.98% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.92% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 1.04% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 1.02% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.92% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.92% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.94% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.88% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.90% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.94% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.86% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.86% zero probability fragments
[2018-05-31 17:08:09.486] [jointLog] [info] Thread saw mini-batch with a maximum of 0.86% zero probability fragments
[2018-05-31 17:08:09.498] [jointLog] [info] Thread saw mini-batch with a maximum of 0.90% zero probability fragments
[2018-05-31 17:08:09.534] [jointLog] [info] Thread saw mini-batch with a maximum of 0.90% zero probability fragments
[2018-05-31 17:08:09.572] [jointLog] [info] Thread saw mini-batch with a maximum of 1.04% zero probability fragments
[2018-05-31 17:08:09.634] [jointLog] [info] Thread saw mini-batch with a maximum of 0.90% zero probability fragments
[2018-05-31 17:08:09.641] [jointLog] [info] Thread saw mini-batch with a maximum of 1.00% zero probability fragments
[2018-05-31 17:08:09.649] [jointLog] [info] Thread saw mini-batch with a maximum of 0.88% zero probability fragments
[2018-05-31 17:08:09.690] [jointLog] [info] Thread saw mini-batch with a maximum of 0.96% zero probability fragments
[2018-05-31 17:08:09.702] [jointLog] [info] Thread saw mini-batch with a maximum of 0.92% zero probability fragments
[2018-05-31 17:08:09.721] [jointLog] [info] Thread saw mini-batch with a maximum of 0.88% zero probability fragments
[2018-05-31 17:08:09.728] [jointLog] [info] Thread saw mini-batch with a maximum of 0.90% zero probability fragments
[2018-05-31 17:08:09.782] [jointLog] [info] Thread saw mini-batch with a maximum of 0.86% zero probability fragments
[2018-05-31 17:08:09.786] [jointLog] [info] Thread saw mini-batch with a maximum of 0.90% zero probability fragments
[2018-05-31 17:08:10.483] [jointLog] [info] Computed 582392 rich equivalence classes for further processing
[2018-05-31 17:08:10.483] [jointLog] [info] Counted 59985194 total reads in the equivalence classes
[2018-05-31 17:08:10.551] [jointLog] [info] Mapping rate = 88.2723%
[2018-05-31 17:08:10.551] [jointLog] [info] finished quantifyLibrary()
[2018-05-31 17:08:10.551] [jointLog] [info] Starting optimizer
[2018-05-31 17:08:11.488] [jointLog] [info] Marked 1 weighted equivalence classes as degenerate
[2018-05-31 17:08:11.523] [jointLog] [info] iteration = 0 | max rel diff. = 127.399
Exception : [Error in function boost::math::digamma(double): numeric overflow]
salmon quant was invoked improperly.
For usage information, try salmon quant --help
Exiting.