failing tests - Githubissues

roblanf commented 8 years ago

@brettc, can you take a look at these.

I have a small number of tests failing, all because of very small differences in the expected and observed AIC/BIC scores. There's a full list below. I think we need to do two things here:

Update the expected values to the new observed values (the diffs appear to be caused by changes in likeilhoods from Phyml/RAxML).
Increase the tolerance limit to, say, 1.0. Right now it's 0.1, and this often means we get these false positives with failing tests. 1.0 is small, but big enough to be robust to very small changes in the calculations from phyml/raxml.

R

roblanf commented 8 years ago

Here are the failing tests. Every fail is down to small diffs in AIC/BIC:


tests/full_analysis/test_full.py::test_dna[DNA1] PASSED
tests/full_analysis/test_full.py::test_dna[DNA2] FAILED
tests/full_analysis/test_full.py::test_dna[DNA3] FAILED
tests/full_analysis/test_full.py::test_dna[DNA4] FAILED
tests/full_analysis/test_full.py::test_dna[DNA5] FAILED
tests/full_analysis/test_full.py::test_dna[DNA6] PASSED
tests/full_analysis/test_full.py::test_dna[DNA7] FAILED
tests/full_analysis/test_full.py::test_dna[DNA8] PASSED

brettc commented 8 years ago

Ok -- will check this out (currently writing grants, so won't be straight away).

roblanf commented 8 years ago

Yo @brettc, this is me, bugging you. But you're not allowed to use the server right now (running analyses for the ms...).

brettc commented 8 years ago

I believe this is now fixed in the main branch. Can you test?

brettc commented 8 years ago

The latest version of raxml appears to fix the outstanding testing problems. Do we have a consensus on what raxml version we compile and distribute (I used the avx.mac version)

roblanf commented 8 years ago

I use the SSE3 makefiles. Remember we need pthreads and not pthreads compiled separately.

About to check the tests. Awesome that the new RAxML seems to fix the broken one.

On 24 June 2016 at 21:50, Brett Calcott notifications@github.com wrote:

The latest version of raxml appears to fix the outstanding testing problems. Do we have a consensus on what raxml version we compile and distribute (I used the avx.mac version)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/brettc/partitionfinder/issues/98#issuecomment-228326119, or mute the thread https://github.com/notifications/unsubscribe/AA2pE8Jq9FRrKM03Chdihm6gyAL32d1Wks5qO8RqgaJpZM4Idmr8 .

Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

phone: +61 (0)2 9850 8204

www.robertlanfear.com

brettc commented 8 years ago

I vaguely remember discussing this. Did we document it anywhere?

On Sat, Jun 25, 2016 at 1:16 PM roblanf notifications@github.com wrote:

I use the SSE3 makefiles. Remember we need pthreads and not pthreads compiled separately.

About to check the tests. Awesome that the new RAxML seems to fix the broken one.

On 24 June 2016 at 21:50, Brett Calcott notifications@github.com wrote:

The latest version of raxml appears to fix the outstanding testing problems. Do we have a consensus on what raxml version we compile and distribute (I used the avx.mac version)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/brettc/partitionfinder/issues/98#issuecomment-228326119 , or mute the thread < https://github.com/notifications/unsubscribe/AA2pE8Jq9FRrKM03Chdihm6gyAL32d1Wks5qO8RqgaJpZM4Idmr8

.

Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

phone: +61 (0)2 9850 8204

www.robertlanfear.com

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/brettc/partitionfinder/issues/98#issuecomment-228498888, or mute the thread https://github.com/notifications/unsubscribe/AAb6Q9JGltd0_qUOQGb-ahMtANE1wqFBks5qPIF8gaJpZM4Idmr8 .

roblanf commented 8 years ago

didn't you have a build script somewhere? If not, we could/should do that. However, my take is really that we can provide best guesses, and folks can compile their own versions if ours don't work.

On 25 June 2016 at 12:02, Brett Calcott notifications@github.com wrote:

I vaguely remember discussing this. Did we document it anywhere?

On Sat, Jun 25, 2016 at 1:16 PM roblanf notifications@github.com wrote:

I use the SSE3 makefiles. Remember we need pthreads and not pthreads compiled separately.

About to check the tests. Awesome that the new RAxML seems to fix the broken one.

On 24 June 2016 at 21:50, Brett Calcott notifications@github.com wrote:

The latest version of raxml appears to fix the outstanding testing problems. Do we have a consensus on what raxml version we compile and distribute (I used the avx.mac version)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <

https://github.com/brettc/partitionfinder/issues/98#issuecomment-228326119

, or mute the thread <

https://github.com/notifications/unsubscribe/AA2pE8Jq9FRrKM03Chdihm6gyAL32d1Wks5qO8RqgaJpZM4Idmr8

.

Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

phone: +61 (0)2 9850 8204

www.robertlanfear.com

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub < https://github.com/brettc/partitionfinder/issues/98#issuecomment-228498888 , or mute the thread < https://github.com/notifications/unsubscribe/AAb6Q9JGltd0_qUOQGb-ahMtANE1wqFBks5qPIF8gaJpZM4Idmr8

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/brettc/partitionfinder/issues/98#issuecomment-228501193, or mute the thread https://github.com/notifications/unsubscribe/AA2pE4-Lg6LQKYDguU2t9eptz1w-Tb1sks5qPIwbgaJpZM4Idmr8 .

Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

phone: +61 (0)2 9850 8204

www.robertlanfear.com

brettc commented 8 years ago

There is a build script, but it only builds one binary. Let's chat about this and nail down some defaults.

On Sat, Jun 25, 2016 at 2:31 PM roblanf notifications@github.com wrote:

didn't you have a build script somewhere? If not, we could/should do that. However, my take is really that we can provide best guesses, and folks can compile their own versions if ours don't work.

On 25 June 2016 at 12:02, Brett Calcott notifications@github.com wrote:

I vaguely remember discussing this. Did we document it anywhere?

On Sat, Jun 25, 2016 at 1:16 PM roblanf notifications@github.com wrote:

I use the SSE3 makefiles. Remember we need pthreads and not pthreads compiled separately.

About to check the tests. Awesome that the new RAxML seems to fix the broken one.

On 24 June 2016 at 21:50, Brett Calcott notifications@github.com wrote:

The latest version of raxml appears to fix the outstanding testing problems. Do we have a consensus on what raxml version we compile and distribute (I used the avx.mac version)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <

https://github.com/brettc/partitionfinder/issues/98#issuecomment-228326119

, or mute the thread <

https://github.com/notifications/unsubscribe/AA2pE8Jq9FRrKM03Chdihm6gyAL32d1Wks5qO8RqgaJpZM4Idmr8

.

Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

phone: +61 (0)2 9850 8204

www.robertlanfear.com

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub <

https://github.com/brettc/partitionfinder/issues/98#issuecomment-228498888

, or mute the thread <

https://github.com/notifications/unsubscribe/AAb6Q9JGltd0_qUOQGb-ahMtANE1wqFBks5qPIF8gaJpZM4Idmr8

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/brettc/partitionfinder/issues/98#issuecomment-228501193 , or mute the thread < https://github.com/notifications/unsubscribe/AA2pE4-Lg6LQKYDguU2t9eptz1w-Tb1sks5qPIwbgaJpZM4Idmr8

.

Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

phone: +61 (0)2 9850 8204

www.robertlanfear.com

—

You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/brettc/partitionfinder/issues/98#issuecomment-228502900, or mute the thread https://github.com/notifications/unsubscribe/AAb6Q_rwEgzKiMQuu8KnlPueL5cJ8aTUks5qPJMNgaJpZM4Idmr8 .

roblanf commented 8 years ago

I am still getting the same few tests failing. All because of v. minor diffs in AIC/BIC scores.

I decided to increase the MAX_ERROR to 1.0, since all the diffs I get are really very small, and these tests keep taking our time unnecessarily. I'll see how it goes.

Basically, we want to know that the results are close enough. The old threshold (0.1) is small enough that differences in machine, OS, and random number seed can be enough to make the test fail. That is not the intention. The intention is to catch big changes.

roblanf commented 8 years ago

Fixed. In the end I changed the error threshold to 10 units. This is not a big deal. On the scale of measurement, 10 units is still very sensitive to changes in the underlying partitioning scheme and/or models. This should decrease our false positive rate a lot, without (I hope!) giving us too many false negatives.

roblanf commented 8 years ago

ALL TESTS PASSING: https://github.com/brettc/partitionfinder/commit/7d4c115dd5327bb6b2c3299a153d10596eba9ef3

brettc / partitionfinder

failing tests #98