gnu-octave / statistics

The Statistics package for GNU Octave
GNU General Public License v3.0

BISTs in ClassificationPartitionedModel can take forever #160

Closed. svillemot closed this issue 2 months ago

svillemot commented 2 months ago

I uploaded statistics 1.7.0 to Debian and I get build failures caused by BISTs in ClassificationPartitionedModel that run seemingly forever (more than 150 minutes). The failure appears to be random. See for example this build log: https://buildd.debian.org/status/fetch.php?pkg=octave-statistics&arch=amd64&ver=1.7.0-1&stamp=1725455778&raw=0

Could this be related to the use of random data in the BISTs?

pr0m1th3as commented 2 months ago

This should not be a problem. The input validation tests in ClassificationPartitionedModel run really fast.

>> tic; fail ("ClassificationPartitionedModel (RegressionGAM (ones (40,2), randi ([1, 2], 40, 1)), cvpartition (randi ([1, 2], 40, 1), 'Holdout', 0.3))", "ClassificationPartitionedModel: unsupported model type.");toc
Elapsed time is 0.0237939 seconds.
>>

The output tests take some significant amount of time, but certainly not that long. On Ubuntu 20.04 with Octave 9.1, all tests pass.

>> t0 = clock (); test ClassificationPartitionedModel; etime (clock (), t0)
PASSES 19 out of 19 tests
ans = 30.369
>>

The only thing wrong I notice with ClassificationPartitionedModel is that I don't check/validate the input for the cvpartition object as the second argument.

pr0m1th3as commented 2 months ago

After the latest update, the testing time is still relatively short:

>> t0 = clock (); test ClassificationPartitionedModel; etime (clock (), t0)
PASSES 20 out of 20 tests
ans = 33.782

I doubt that the issue you are facing on Debian is related to the BISTs in the ClassificationPartitionedModel class. At least, I cannot find a connection in the build log you referenced.

svillemot commented 2 months ago

I assumed the problem comes from ClassificationPartitionedModel because the timeout message comes in the middle of the BISTs of that function. Of course the root cause could lie somewhere else, for example in Octave core.

pr0m1th3as commented 2 months ago

I am sorry I can't be more helpful at this time. Perhaps opening a thread on Discourse under the maintainers category might help.

svillemot commented 2 months ago

I confirm that by repeatedly running the following command at the Octave prompt, with Netlib BLAS/LAPACK installed, I can reproduce the problem:

ClassificationPartitionedModel (RegressionGAM (ones (40,2), randi ([1, 2], 40, 1)), cvpartition (randi ([1, 2], 40, 1), 'Holdout', 0.3))

i.e. Octave gets stuck with a very high CPU usage.

svillemot commented 2 months ago

More precisely, repeatedly running the following is enough to reproduce the random issue:

RegressionGAM (ones (40,2), randi ([1, 2], 40, 1))

But the problem only seems to manifest with Netlib BLAS/LAPACK installed, not with OpenBLAS.

svillemot commented 2 months ago

Debugging shows that the fitsGAM method of the RegressionGAM class enters an infinite loop. For some reason, the regression never converges on some random data and with Netlib BLAS/LAPACK.
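To illustrate the failure mode described here, the following is a minimal sketch of a convergence loop with an iteration cap. It is not the package's fitting code: the update rule, `tol`, and `max_iter` are hypothetical and chosen only so that the iteration oscillates and never meets the tolerance.

```octave
% Hypothetical example, not taken from RegressionGAM: an iteration that
% oscillates between 1 and -1, so the change never falls below tol.
tol = 1e-6;        % assumed convergence tolerance
max_iter = 1000;   % assumed iteration cap
x = 1;
iter = 0;
converged = false;

% Without the "iter < max_iter" guard this loop would never terminate.
while (! converged && iter < max_iter)
  x_new = -x;                          % non-converging update
  converged = abs (x_new - x) < tol;   % always false for this update
  x = x_new;
  iter++;
endwhile

if (! converged)
  warning ("no convergence after %d iterations", iter);
endif
```

With a cap in place, a non-converging fit ends with a warning instead of blocking the test suite indefinitely.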

pr0m1th3as commented 2 months ago

@svillemot Can you check again with the latest commit?

svillemot commented 2 months ago

Thanks. Your fix doesn’t work because the logic is wrong (should be iter < 1000).

pr0m1th3as commented 2 months ago

My bad. Does it work now?

svillemot commented 2 months ago

The latest commit indeed fixes the issue, but I realize that it introduces another problem: there will always be at least 1000 iterations, even if convergence occurs before.

The test should rather be while (!(converged || iter > 1000)) (note the parentheses). Sorry for suggesting a wrong fix in my previous comment.
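Why the parentheses matter can be checked by evaluating the loop condition directly. The sketch below compares the suggested condition, its De Morgan equivalent, and the same expression without the outer parentheses. The variable names and the sample states are illustrative assumptions, not code from the commit.

```octave
% Each handle returns true when the loop should keep running.
with_parens    = @(converged, iter) ! (converged || iter > 1000);
de_morgan      = @(converged, iter) ! converged && iter <= 1000;
% Without the outer parentheses, "!" applies to "converged" only.
without_parens = @(converged, iter) ! converged || iter > 1000;

% Sample states: {converged, iter}
states = {true, 5; false, 10; false, 1001};

for k = 1:rows (states)
  c  = states{k, 1};
  it = states{k, 2};
  printf ("converged=%d iter=%4d  with_parens=%d  de_morgan=%d  without_parens=%d\n",
          double (c), it, double (with_parens (c, it)),
          double (de_morgan (c, it)), double (without_parens (c, it)));
endfor
```

For the state "not converged, iter = 1001", the parenthesized form stops the loop while the unparenthesized form keeps it running, which would bring back the endless loop for data that never converges. Note that the condition as written allows up to 1001 iterations, since it stops only once iter exceeds 1000.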

pr0m1th3as commented 2 months ago

It should be fixed now.

svillemot commented 2 months ago

Thanks, thus closing.