gnu-octave / statistics

The Statistics package for GNU Octave
GNU General Public License v3.0
24 stars 23 forks source link

Test errors with Octave 10 #150

Closed mmuetzel closed 1 month ago

mmuetzel commented 3 months ago

Running pkg test statistics after installing version 1.6.7 in Octave 10 (hg id 9dd2a8e7e0d5) yields the following summary:

Summary:

  PASS                            10452
  FAIL                               43
  XFAIL (reported bug)                2
  XFAIL (expected failure)            1

Afaict, most of the failing tests involve calling fminsearch in an incompatible way.

fntests.log

pr0m1th3as commented 3 months ago

Did we change something in fminsearch? The failing functions haven't changed since quite a while now.

mmuetzel commented 3 months ago

That might be because of this change: https://hg.savannah.gnu.org/hgweb/octave/file/9dd2a8e7e0d5/etc/NEWS.10.md#l133

  • fminsearch parameter passing: A legacy, undocumented, and only partially supported syntax for passing parameters to the minimized function fcn called by fminsearch by appending them to the input argument list was partly implemented since Octave 4.4.0. Due to conflicts with other input methods, the documentation of this syntax was removed in Octave 5.1.0. The remaining functionality has been completely removed in Octave 10. Attempts to call fminsearch with that syntax will result in an error. The preferred method of passing parameters to any of the minimization functions (including fminsearch, fminbnd, and fminunc) is through the use of anonymous functions. Specific examples of this can be found in the "Minimizers" section of the Octave manual.

Maybe, statistics is using an undocumented feature for which support was finally removed.

pr0m1th3as commented 3 months ago

Great (not great) :smiling_face_with_tear: Thanks for the heads up on this. I assume I will have to build 10 from sources to figure out what is causing the trouble. I hope we are not in a hurry for this.

mmuetzel commented 3 months ago

I assume I will have to build 10 from sources to figure out what is causing the trouble.

Since fminsearch is implemented as a .m file, it might be enough to just replace that file with the newer version: https://hg.savannah.gnu.org/hgweb/octave/raw-file/9dd2a8e7e0d5/scripts/optimization/fminsearch.m

I hope we are not in a hurry for this.

A new major version of Octave is usually released (more or less) early each year since a few years. So, there's probably no need to hurry.

pr0m1th3as commented 1 month ago

I committed a change 335421f156a313d0235603049e38168ece772695 that addresses the changes in fminsearch usage in gpfit. This seems to resolve the issue with the failing tests of the functions related to the Generalized Pareto distributions. However, other failing tests in the fntests.log file you provided have nothing to do with fminsearch. For example, in wblcdf function, it seems that there is some regression related to exp function, which is also used in norminv and other functions that produced failed tests. @mmuetzel Can you test the latest changes in Octave 10 and provide a new fntests.log file?

pr0m1th3as commented 1 month ago

Running pkg test statistics on Octave 9.1 with the latest fminsearch function produces

Summary:

  PASS                            10658
  FAIL                               32
  XFAIL (reported bug)                2
  XFAIL (expected failure)            1

All 32 failed tests relate to the function gpfit using the old syntax in fminsearch (1 test fails in gpfit, 27 tests fail in GeneralizedParetoDistribution, 4 tests fails in fitdist. See attached fntests.log file). Installing statistics from the latest sources (with the changes committed in https://github.com/gnu-octave/statistics/commit/335421f156a313d0235603049e38168ece772695) produces no failed tests.

The remaining failed tests in @mmuetzel fntests.log file are all due to tolerances, which by itself is not a big deal since we are talking machine precision tolerances (usually <1e-16). However, looking a bit deeper into the expcdf function, the failing test due to tolerance

ASSERT errors for:  assert (expcdf (x, 2 * ones (1, 5)),p)

  Location  |  Observed  |  Expected  |  Reason
    (4)        0.39347      0.39347      Abs err 5.5511e-17 exceeds tol 0 by 6e-17

relies in the same mathematical expression used in the function to calculate the expected p values. Thus, the results should be identical. Indeed, in Octave 9.1

>> x = [-1 0 0.5 1 Inf]
x =

  -1.0000        0   0.5000   1.0000      Inf

>> mu = 2
mu = 2
>> z = x ./ mu
z =

  -0.5000        0   0.2500   0.5000      Inf

>> z(z < 0) = 0
z =

        0        0   0.2500   0.5000      Inf

>> p = -expm1 (-z)
p =

        0        0   0.2212   0.3935   1.0000

>> [0, 1 - exp(-x(2:end)/2)]
ans =

        0        0   0.2212   0.3935   1.0000

>> isequal(p, [0, 1 - exp(-x(2:end)/2)])
ans = 1
>>

Can you test the code above in Octave 10?

pr0m1th3as commented 1 month ago

The following tests appear to be affected in Octave 10 on Windows according to @mmuetzel fntests.log file

EDIT: On Windows 10 with Octave 8.4 and Octave 9.2, the aforementioned tests DO NOT fail.

mmuetzel commented 1 month ago

Thank you for fixing the issue with the undocumented syntax of fminsearch.

I no longer recall. But the log from the original post might have been done with a build from the default branch of MXE Octave. That branch is on a newer version of OpenBLAS than the release branch. That might explain slightly different results due to numeric reasons.

I'll try to reproduce with a recent head of the statistics repository. Maybe, at some time later this week.

pr0m1th3as commented 1 month ago

@mmuetzel Can you run a test once more with the latest sources from statistics to see if the problem persists? I intend to make a new release by the end of August and I would like to close this issue. Thanks

dasergatskov commented 1 month ago

WRT openblas. I see in 0.3.28 release notes:

reverted thread management under Windows to its state before 0.3.26
due to signs of race conditions in some circumstances now under study
mmuetzel commented 1 month ago

I rebuilt Octave for Windows from current heads of the Octave and MXE Octave repositories. I then verified that the original statistics version still fails with the same amount of errors as before. (It does.)

After that, I downloaded a .zip archive with the current head of the main branch of the statistics package and installed it in Octave with pkg install statistics-main.zip. When that was done, I re-ran pkg test statistics. It looks like that got stuck indefinitely at the tests for ClassificationPartitionedModel.m:

>> pkg test statistics
Testing functions in package 'statistics':

Integrated test scripts:

                                                                                  [ CPU    /  CLOCK ]
  ..\api-v59\packages\statistics-1.6.8\@cvpartition\cvpartition.m  pass    8/8    [ 0.016s /  0.053s]
  ..tave\api-v59\packages\statistics-1.6.8\@cvpartition\display.m  pass    6/6    [ 0.031s /  0.023s]
  ..g\octave\api-v59\packages\statistics-1.6.8\@cvpartition\get.m  pass    9/9    [ 0.000s /  0.024s]
  ..\api-v59\packages\statistics-1.6.8\@cvpartition\repartition.m  pass    2/2    [ 0.062s /  0.088s]
  ..g\octave\api-v59\packages\statistics-1.6.8\@cvpartition\set.m  pass    5/5    [ 0.000s /  0.015s]
  ..\octave\api-v59\packages\statistics-1.6.8\@cvpartition\test.m  pass    9/9    [ 0.000s /  0.019s]
  ..ave\api-v59\packages\statistics-1.6.8\@cvpartition\training.m  pass    9/9    [ 0.250s /  0.339s]
  ..\statistics-1.6.8\Classification\ClassificationDiscriminant.m  pass   61/61   [ 6.141s /  8.201s]
  ..\packages\statistics-1.6.8\Classification\ClassificationGAM.m  pass   35/35   [ 4.484s / 26.617s]
  ..\packages\statistics-1.6.8\Classification\ClassificationKNN.m  pass   60/158  [ 4.016s /  7.534s]
                                                                   FAIL   98
  ..statistics-1.6.8\Classification\ClassificationNeuralNetwork.m  pass   59/59   [ 5.609s / 25.491s]
  ..tistics-1.6.8\Classification\ClassificationPartitionedModel.m

The last lines in the fntests.log at that point are:

>>>>> processing C:\Users\Markus\AppData\Roaming\octave\api-v59\packages\statistics-1.6.8\Classification\ClassificationPartitionedModel.m
***** test
 x = [1, 2, 3; 4, 5, 6; 7, 8, 9; 3, 2, 1];
 y = ["a"; "a"; "b"; "b"];
 a = fitcknn (x, y);
 partition = cvpartition (y, "KFold", 5);
 cvModel = ClassificationPartitionedModel (a, partition);
 assert (class (cvModel), "ClassificationPartitionedModel");
 assert (cvModel.NumObservations, 4);
 assert (cvModel.ModelParameters.NumNeighbors, 1);
 assert (cvModel.ModelParameters.NSMethod, "kdtree");
 assert (cvModel.ModelParameters.Distance, "euclidean");
 assert (! cvModel.ModelParameters.Standardize);
!!!!! test failed
subsref: unknown method or property: ScoreTransform
***** test
 x = [1, 2, 3; 4, 5, 6; 7, 8, 9; 3, 2, 1];
 y = ["a"; "a"; "b"; "b"];
 a = fitcknn (x, y, "NSMethod", "exhaustive");
 partition = cvpartition (y, "HoldOut", 0.2);
 cvModel = ClassificationPartitionedModel (a, partition);
 assert (class (cvModel), "ClassificationPartitionedModel");
 assert ({cvModel.X, cvModel.Y}, {x, y});
 assert (cvModel.NumObservations, 4);
 assert (cvModel.ModelParameters.NumNeighbors, 1);
 assert (cvModel.ModelParameters.NSMethod, "exhaustive");
 assert (cvModel.ModelParameters.Distance, "euclidean");
 assert (! cvModel.ModelParameters.Standardize);
!!!!! test failed
subsref: unknown method or property: ScoreTransform
***** test
 x = [1, 2, 3; 4, 5, 6; 7, 8, 9; 3, 2, 1];
 y = ["a"; "a"; "b"; "b"];
 k = 3;
 a = fitcknn (x, y, "NumNeighbors" ,k);
 partition = cvpartition (y, "LeaveOut");
 cvModel = ClassificationPartitionedModel (a, partition);
 assert (class (cvModel), "ClassificationPartitionedModel");
 assert ({cvModel.X, cvModel.Y}, {x, y});
 assert (cvModel.NumObservations, 4);
 assert (cvModel.ModelParameters.NumNeighbors, k);
 assert (cvModel.ModelParameters.NSMethod, "kdtree");
 assert (cvModel.ModelParameters.Distance, "euclidean");
 assert (! cvModel.ModelParameters.Standardize);
!!!!! test failed
subsref: unknown method or property: ScoreTransform

Octave shouldn't get stuck in any case. (But Octave's classdef implementation is still in a pretty rough shape. So, I wouldn't be surprised if that happens.) But there still seems to be something wrong with that class. (And maybe that's causing the unexpected behavior of Octave?)

mmuetzel commented 1 month ago

I prior test failed with errors like the following:

>>>>> processing C:\Users\Markus\AppData\Roaming\octave\api-v59\packages\statistics-1.6.8\Classification\ClassificationKNN.m
***** error<ClassificationKNN: 'Standardize' must be either true or false.> ...
 ClassificationKNN (ones (5,3), ones (5,1), "standardize", "a")
!!!!! error failed.
Expected <ClassificationKNN: 'Standardize' must be either true or false.>, but got <ClassificationKNN: Standardize must be either true or false.>

I don't know why the single quotes got missing. Might be an upstream issue in Octave or one of the libraries...

dasergatskov commented 1 month ago

This is bizzare. The test passes on linux on the variety configurations (Fedora, Centos Stream 9, Ubuntu; x_86, aarch; gcc 11.5, gcc 14.2, clang 18.1.6). Octave hgid eb14f7fa391c.

p.s. Since you appear to have two different statistics packages installed -- could it be some mix up between them?

mmuetzel commented 1 month ago

Good point. I did what I described before in the same Octave session. I.e., I didn't close Octave after having run the test suite of the older version of the statistics package. Maybe, that left something loaded that couldn't be properly unloaded. (Maybe, mlock somewhere?)

I re-ran the test suite of the latest head again after restarting Octave with the following results:

                                                        total time (CPU / CLOCK)  [ 136.1s /  305.8s]
Failure Summary:

  ..ng\octave\api-v59\packages\statistics-1.6.8\dist_fun\bvncdf.m  pass    4/5
                                                                   FAIL    1
  ..ng\octave\api-v59\packages\statistics-1.6.8\dist_fun\expcdf.m  pass   17/21
                                                                   FAIL    4
  ..ng\octave\api-v59\packages\statistics-1.6.8\dist_fun\mvncdf.m  pass   11/12
                                                                   FAIL    1
  ..ng\octave\api-v59\packages\statistics-1.6.8\dist_fun\wblcdf.m  pass   24/28
                                                                   FAIL    4
  ..i-v59\packages\statistics-1.6.8\dist_obj\NormalDistribution.m  pass   94/95
                                                                   FAIL    1
  ..aming\octave\api-v59\packages\statistics-1.6.8\shadow9\mean.m  pass   79/80
                                                    (reported bug) XFAIL   1
  ..oaming\octave\api-v59\packages\statistics-1.6.8\fillmissing.m  pass  379/380
                                                    (reported bug) XFAIL   1
  ..ppData\Roaming\octave\api-v59\packages\statistics-1.6.8\pca.m  pass   31/32
                                                (expected failure) XFAIL   1

Summary:

  PASS                            11036
  FAIL                               11
  XFAIL (reported bug)                2
  XFAIL (expected failure)            1

The test log of that run: fntests.log

pr0m1th3as commented 1 month ago

So it is an openblas issue related to windows. Should I just add some tolerance (i.e. 3e-16) to these tests to overcome this issue or should I leave it as it is with the hope that they will fix openblas (upstream) before Octave 10 release? The only thing that worries me is that circumventing such issues by increasing the test tolerance might result in missing such regressions in the future.

dasergatskov commented 1 month ago

Hmm. I do see a couple errors on ubuntu 24.04/aarch64 (Ubuntu 24.04/x86-64 is fine): https://octave.discourse.group/t/possible-regression-in-octave-10/5817/79 But nothing like that.

dasergatskov commented 1 month ago

Do those tests pass with netlib blas?

pr0m1th3as commented 1 month ago

Do those tests pass with netlib blas?

How can I test that?

dasergatskov commented 1 month ago

I do not think we have octave-dev for Windows in distribution, so the question was actually for for Markus @mmuetzel

mmuetzel commented 1 month ago

The tests were run with the reference BLAS implementation.

dasergatskov commented 1 month ago

Are all those due to expcdf() precision issue?

pr0m1th3as commented 1 month ago

AFAICT, the source of the observed discrepancy must be the expm1 and erfc functions and their complements exp and erfcinv`. These are involved in the calculations for the failing tests.

Honestly, I think we spent too much time into this issue for negligible differences. The maximum error we get is less than 3e-16. Unless someone objects, I will just add the required tolerances and commit the changes for further testing.

dasergatskov commented 1 month ago

I do not object, but I am curious why this is happening. The code has not changed for a long time. Is gcc a newer one for Octave 10? It just feels it slips in x87 code or something...

pr0m1th3as commented 1 month ago

I am completely clueless at this point. I will mention it in tonight's meeting just in case anybody else has an idea about it. Not much I can do beyond that.

mmuetzel commented 1 month ago

If I'd have to guess, I'd put the "blame" on different versions of MinGW. IIRC, the MinGW-w64 maintainers (more or less recently) switched more C math functions to their implementation in the Windows CRT (because they are available and standard-conform in UCRT). Before, they had their own replacement implementations (because these functions have been either not available or not standard-conform in MSVCRT).

ISTR that some exponential functions were among those functions that no longer use the replacement implementation. Maybe, these functions are slightly less accurate in their implementation in the UCRT?

Anyway, just increasing the tolerance a bit seems fine to me. 👍 Thank you for looking into this.