Closed by mcepl 2 years ago
We would need to run the tests in a "more verbose" mode to check the values. My guess is that the diff tolerance would need to be adjusted there and those are not real failures. (I may be wrong though, we never tested on 32 bit!)
What do you get if you build the underlying C library (https://github.com/TEOS-10/GSW-C) and run its test function?
Sorry, we don't have that packaged, and I don't have time right now to hunt some proper 32bit VMs to play with it.
> (I may be wrong though, we never tested on 32 bit!)
So would you say that 32bit is not (officially) supported?
Yes, unless someone steps forward to handle both 32-bit testing and solving any problems that arise, it will not be supported. I'm surprised that even one test failure has turned up, given that the calculations are all explicitly double precision, which should not depend on 32 vs 64 bit architecture. I would be happy to see someone track this down and supply a PR to fix it, if it really is in the GSW code (probably in GSW-C, if it is), and not in some other arcane detail of the test VM on which the failure occurred.
I did some digging into this one and would like to summarize what I found with some options for what we might do.
Using the i586 version of openSUSE/Tumbleweed, I was able to reproduce the reported test failure in the Python version of GSW. Compiling the GSW-C version directly also produced a failure with `gsw_check`, specifically in the `pt_second_derivatives` function. I also attempted to reproduce the error with the following:

- the `-m32` flag on the above environments
- the `-m32` flag when compiling on my Mac (10.14.3) using llvm

Neither reproduced the failure. I then started to mess with the compilation flags for GSW-C in the openSUSE VM where I could reproduce the failure. When compiled with `-O0`, all the check functions pass. In an attempt to figure out which optimization was causing the issue, I manually enabled all the flags listed under `-O1` on the Optimize-Options page of GCC, but the built library still passed all the `gsw_check` functions. I couldn't even reproduce the failure when I used GCC itself to output all the optimization flags it was using.

Some googling for how optimization levels might change doubles led me to a Stack Overflow question which suggested the `-ffloat-store` flag. Enabling this on the 32bit openSUSE VM allowed all check functions to pass. Using a command such as `CFLAGS=-ffloat-store pip install .` in the GSW-Python source directory also resulted in an installation which passes all tests.
One more bit of information: at the end of this digging, I finally noticed that the failing function was exactly at the `GSW_ERROR_LIMIT` for this function. If I changed the check function (in GSW-C) to look for `>` instead of `>=`, all the tests pass.
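A minimal sketch of that behavior (the helper name is illustrative, not the actual GSW-C code): a diff that lands exactly on the error limit is flagged by a `>=` comparison but would pass a strict `>`.

```python
# Illustrative sketch only (not the actual GSW-C code): a check that
# flags any diff at or above the tolerance limit.
def check_fails(diff, tolerance):
    return diff >= tolerance

tol = 1.734723475976807e-14  # the tolerance in the failing pt_SA_SA check
diff = tol                   # a computed diff landing exactly on the limit

print(check_fails(diff, tol))  # True: flagged as a failure by >=
print(diff > tol)              # False: passes with a strict >
```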
I see some options:

1. Enable the `-ffloat-store` flag upstream.
2. Have openSUSE enable the `-ffloat-store` flag for their i586 build.

Option 1 has performance implications I haven't measured, but some quick searching shows that it will make things slower, and it is unnecessary for most of the systems I can get my hands on. That said, when I hacked `gsw_check` to always output the diff, having the flag enabled really tightened up the differences (most became 0). But with the option not enabled, the differences were still much less than the precision of the instruments, so I favor speed in this case.
I think no matter what happens, we might just want to say that 32bit is not officially supported.
@DocOtak, that is an impressive investigation. Thank you.
It looks like some (at least one) of the most recent Matlab package check value tolerances differ from what we are using. Three values are used in the failing test. The current Matlab file:

```
In [3]: xx = matfile.loadmatbunch('gsw_data_v3_0.mat')
In [5]: xx.gsw_cv['pt_SA_SA_ca']
3.122502256758253e-14
In [6]: xx.gsw_cv['pt_SA_CT_ca']
1.1991276027689679e-14
In [7]: xx.gsw_cv['pt_CT_CT_ca']
4.440892098500626e-14
```
What we are using:

```
In [19]: cv['pt_SA_SA_ca'].item()
1.734723475976807e-14
In [20]: cv['pt_SA_CT_ca'].item()
1.1991276027689679e-14
In [21]: cv['pt_CT_CT_ca'].item()
4.440892098500626e-14
```
Notice that only one of these values ('SA_SA') has changed, and it is much larger than the one we are using. Evidently we need to update our check values to match. Maybe this is the only one that has changed--I haven't tried to check that.
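Since the reported failing diff sat exactly at the old limit, the updated tolerance would let it pass. A hypothetical illustration using the numbers above:

```python
# Hypothetical illustration: the diff reported by gsw_check sat exactly
# at the old tolerance, while the value in the current gsw_data_v3_0.mat
# is roughly twice as large.
old_tol = 1.734723475976807e-14  # 'pt_SA_SA_ca' currently in GSW-Python
new_tol = 3.122502256758253e-14  # 'pt_SA_SA_ca' in the current mat file

diff = old_tol
print(diff >= old_tol)  # True: fails against the old tolerance
print(diff >= new_tol)  # False: passes against the updated one
```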
Looks like I failed to say that I could only reproduce this problem on the openSUSE system; none of the other 32-bit systems I tried had this issue. Also, I didn't dig into any differences in i586 vs i686 or other arch issues.
@efiring sounds like we might just want to update the check data... the 1.73[...]e-14 value was the one being given by the `gsw_check` tool in GSW-C. My guess would be that this issue would be fixed by doing that.
Unfortunately, it turns out that a wholesale updating of the check values and tolerances breaks 2 other tests.
Does it break them by much? I've been thinking about this and have some questions:

- How are these check values and tolerances made?
- Would this need to be fixed in the upstream GSW-C?
- For this specific issue, is there concern that the library will produce the "wrong answer" for people using the 32bit openSUSE install?

I had some other question... but in the middle of writing this lost track of what it was... I guess a general "what do we do with this information?"
On 2019/02/26 9:41 AM, Andrew Barna wrote:
> Does it break them by much? I've been thinking about this and have some questions:

> - How are these check values and tolerances made?

Paul Barker will have to answer that. It might be documented somewhere. I presume the check values are simply whatever the current Matlab code yields, and the tolerances are the output of some algorithm that estimates how much floating-point slop there might be from one implementation to another.

> - Would this need to be fixed in the upstream GSW-C?

Yes, I think we will find that the tolerance is incorrect in GSW-C, because I don't think the numbers used there are any more recent than the ones we are using in GSW-Python. I think they are identical, but I'm not positive.

> - For this specific issue, is there concern that the library will produce the "wrong answer" for people using the 32bit openSUSE install?

No, the problem appears to be simply a single incorrect tolerance, not an error in the calculation.

But with respect to the wholesale updating of test and check values: if we want to do that, it will probably require changes in 2 routines, almost certainly at the C level. The first steps would be to see what the magnitude and nature of the new "failures" are, and to figure out what has changed in the Matlab version to which the new test and check values apply.

> I had some other question... but in the middle of writing this lost track of what it was... I guess a general "what do we do with this information?"
Nothing here is urgent at all; and dealing with this is much harder than it would need to be.
Eric
It fails the same way on aarch64, but the `-ffloat-store` workaround does not work there.
@ggardet thanks for the report.
I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture. Same with a major cloud provider (AWS Graviton).
> I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture.
They cannot run Fortran on that yet, so most of the scientific stack won't have SciPy and numpy. Numpy can be made Fortran-free by removing `f2py` and linking against a non-Fortran BLAS, but that is not the usual way; most of the numpy variants out there use OpenBLAS or MKL, and both rely on Fortran.
For reference, the aarch64 failure is reproducible in a docker container of the right architecture. I think it uses qemu for emulating arm on intel.
I'm testing both aarch64 and ppc64le in https://github.com/conda-forge/gsw-feedstock/pull/22, they both fail the checks.
BTW, this is a comment from someone in conda-forge who is very knowledgeable on this topic:

> there's a good chance that the tests expect a high tolerance; x86_64 has 80-bit floating point registers and some calculations might benefit from it
I'm not sure what the best solution is here. Maybe we are requiring a precision in the comparisons that does not really make sense for the data, and loosening the tolerance for all tests is fine. Or, we could have a separate set of tolerance values for other architectures.
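A minimal sketch of the second option; the scale factors and the `TOLERANCE_SCALE` table are illustrative placeholders, not anything in GSW-Python:

```python
# Hypothetical sketch of per-architecture tolerances; the factors here
# are illustrative placeholders, not measured values.
import platform

TOLERANCE_SCALE = {
    "x86_64": 1.0,    # 80-bit x87 registers may tighten some results
    "aarch64": 10.0,  # assumed looser bound for plain 64-bit doubles
}

def scaled_tolerance(base_tol):
    # fall back to the looser bound on unlisted architectures
    return base_tol * TOLERANCE_SCALE.get(platform.machine(), 10.0)

print(scaled_tolerance(1.734723475976807e-14))
```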
As noted above (https://github.com/TEOS-10/GSW-Python/issues/40#issuecomment-465413874 and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.
@PaulMBarker, how are the check-value tolerances calculated?
> As noted above (#40 (comment) and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.
Yep. I just wanted to bump this so we can get https://github.com/conda-forge/gsw-feedstock/pull/22 merged without ignoring the failures.
OK, I will see if I can come up with a quick fix.
In my emulated arm environment, #63 moved the failing test to the `CT_second_deriv` test; `pt_second_deriv` seems happy now though.
Interesting: it looks like the C version closely follows the Matlab version, and both make a fundamental mistake in their numerics that needlessly loses accuracy. Here is an example from part of the Matlab version:
```matlab
dpt = 1e-2; % increment of potential temperature is 0.01 degrees C.
pt_l = pt - dpt;
pt_u = pt + dpt;
[CT_SA_l, CT_pt_l] = gsw_CT_first_derivatives(SA,pt_l);
[CT_SA_u, CT_pt_u] = gsw_CT_first_derivatives(SA,pt_u);
CT_SA_pt = (CT_SA_u - CT_SA_l)./(pt_u - pt_l);
CT_pt_pt = (CT_pt_u - CT_pt_l)./(pt_u - pt_l);
```
Notice that those denominators should be just `2 * dpt`, for which `(pt_u - pt_l)` is a less accurate estimate, throwing away 2-4 significant digits of precision. @PaulMBarker, would you like to fix that in the Matlab version and update the test values?
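A quick Python illustration of the precision loss (the value of `pt` is chosen arbitrarily for the demo):

```python
# Demonstration: pt_u and pt_l each carry a rounding error from the
# addition/subtraction, so their difference can differ from 2 * dpt by
# a few ulps at |pt| ~ 25 (order 1e-15 absolute), a relative error far
# above machine epsilon.
pt = 25.0   # arbitrary potential temperature for the demo
dpt = 1e-2
pt_l = pt - dpt
pt_u = pt + dpt

print(pt_u - pt_l)                   # close to, but not exactly, 0.02
print(2 * dpt)                       # the intended exact denominator
print(abs((pt_u - pt_l) - 2 * dpt))  # the discrepancy
```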
Thanks for bringing this to my attention. Fortunately I have been sidetracked, working on combining all of my observed data into one dataset, and have not added the new code to the website.
I have added the 3 programmes that I use to make the values contained in the gsw_check functions to the matlab directory:

- `gsw_check_values_gsw.m` - makes the check values
- `gsw_check_values_ca.m` - makes the tolerances
- `gsw_check_values_datafile.m` - makes the netcdf file

The code is in `GSW-Matlab/checkfunctions_code/`.
Thank you. Now I see how the tolerances are based on perturbations of the check cast values, with the largest resulting perturbation of the function output being used as the tolerance. The perturbations are 1.3e-10 for SA, 6e-10 for t, and 2.3e-8 for p.
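Based on that description, the procedure can be sketched roughly as follows. This is a guess at the structure, not the actual Matlab programmes; the helper name and the stand-in function are illustrative only.

```python
# Rough sketch of the described procedure: perturb each input by the
# stated amounts and take the largest resulting change in the output as
# that function's check tolerance.
def make_tolerance(func, SA, t, p, dSA=1.3e-10, dt=6e-10, dp=2.3e-8):
    base = func(SA, t, p)
    return max(
        abs(func(SA + dSA, t, p) - base),
        abs(func(SA, t + dt, p) - base),
        abs(func(SA, t, p + dp) - base),
    )

# toy stand-in function in place of a real GSW routine
tol = make_tolerance(lambda SA, t, p: SA * t + 0.1 * p, 35.0, 10.0, 100.0)
print(tol)
```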
I think this can be closed. Test casts and check values are all different now, with the move to the v3_05_16 Matlab zip. Maybe new failures will pop up on some architecture, and we will have to investigate again.
When running the test suite (yes, perhaps you are right in #39 and it is a hopeless effort) on a 32bit machine (see build log) I get this one test failing: