Closed by mcepl 2 years ago
We would need to run the tests in a "more verbose" mode to check the values. My guess is that the diff tolerance would need to be adjusted there and those are not real failures. (I may be wrong though, we never tested on 32 bit!)
What do you get if you build the underlying C library (https://github.com/TEOS-10/GSW-C) and run its test function?
Sorry, we don't have that packaged, and I don't have time right now to hunt some proper 32bit VMs to play with it.
> (I may be wrong though, we never tested on 32 bit!)
So would you say that 32bit is not (officially) supported?
Yes, unless someone steps forward to handle both 32-bit testing and solving any problems that arise, it will not be supported. I'm surprised that even one test failure has turned up, given that the calculations are all explicitly double precision, which should not depend on 32 vs 64 bit architecture. I would be happy to see someone track this down and supply a PR to fix it, if it really is in the GSW code (probably in GSW-C, if it is), and not in some other arcane detail of the test VM on which the failure occurred.
I did some digging into this one and would like to summarize what I found with some options for what we might do.
Using the i586 version of openSUSE/Tumbleweed, I was able to reproduce the reported test failure in the Python version of GSW. Compiling the GSW-C version directly also produced a failure with `gsw_check`, specifically in the `pt_second_derivatives` function. I also attempted to reproduce the error with the following:

- the `-m32` flag on the above environments
- the `-m32` flag when compiling on my Mac (10.14.3) using llvm

Neither reproduced the failure. I then started to mess with the compilation flags for GSW-C in the openSUSE VM where I could reproduce the failure. When compiled with `-O0`, all the check functions pass. In an attempt to figure out which optimization was causing the issue, I manually enabled all the flags listed under `-O1` on the Optimize-Options page of GCC, but the built library still passed all the `gsw_check` functions. I couldn't even reproduce the failure when I used GCC itself to output all the optimization flags it was using.

Some googling for how optimization levels might change doubles led me to a Stack Overflow question which suggested the `-ffloat-store` flag. Enabling this on the 32bit openSUSE VM allowed all check functions to pass. Using a command such as `CFLAGS=-ffloat-store pip install .` in the GSW-Python source directory also resulted in an installation which passes all tests.
One more bit of information: at the end of this digging, I finally noticed that the failing function was exactly at the `GSW_ERROR_LIMIT` for this function. If I changed the check function (in GSW-C) to look for `>` instead of `>=`, all the tests pass.
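A minimal sketch of that behavior (the helper name is illustrative, not the actual GSW-C code): a diff that lands exactly on the error limit is flagged by a `>=` comparison but would pass a strict `>`.

```python
# Illustrative sketch only (not the actual GSW-C code): a check that
# flags any diff at or above the tolerance limit.
def check_fails(diff, tolerance):
    return diff >= tolerance

tol = 1.734723475976807e-14  # the tolerance in the failing pt_SA_SA check
diff = tol                   # a computed diff landing exactly on the limit

print(check_fails(diff, tol))  # True: flagged as a failure by >=
print(diff > tol)              # False: passes with a strict >
```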
I see some options:

1. Enable the `-ffloat-store` flag upstream.
2. Have openSUSE enable the `-ffloat-store` flag for their i586 build.

Option 1 has performance implications I haven't measured, but some quick searching shows that it will make things slower, and it is unnecessary for most of the systems I can get my hands on. That said, when I hacked `gsw_check` to always output the diff, having the flag enabled really tightened up the differences (most became 0). But with the option not enabled, the differences were still much less than the precision of the instruments, so I favor speed in this case.
I think no matter what happens, we might just want to say that 32bit is not officially supported.
@DocOtak, that is an impressive investigation. Thank you.
It looks like some (at least one) of the most recent Matlab package check value tolerances differ from what we are using. Three values are used in the failing test. The current Matlab file:

```
In [3]: xx = matfile.loadmatbunch('gsw_data_v3_0.mat')
In [5]: xx.gsw_cv['pt_SA_SA_ca']
3.122502256758253e-14
In [6]: xx.gsw_cv['pt_SA_CT_ca']
1.1991276027689679e-14
In [7]: xx.gsw_cv['pt_CT_CT_ca']
4.440892098500626e-14
```
What we are using:

```
In [19]: cv['pt_SA_SA_ca'].item()
1.734723475976807e-14
In [20]: cv['pt_SA_CT_ca'].item()
1.1991276027689679e-14
In [21]: cv['pt_CT_CT_ca'].item()
4.440892098500626e-14
```
Notice that only one of these values ('SA_SA') has changed, and it is much larger than the one we are using. Evidently we need to update our check values to match. Maybe this is the only one that has changed--I haven't tried to check that.
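Since the reported failing diff sat exactly at the old limit, the updated tolerance would let it pass. A hypothetical illustration using the numbers above:

```python
# Hypothetical illustration: the diff reported by gsw_check sat exactly
# at the old tolerance, while the value in the current gsw_data_v3_0.mat
# is roughly twice as large.
old_tol = 1.734723475976807e-14  # 'pt_SA_SA_ca' currently in GSW-Python
new_tol = 3.122502256758253e-14  # 'pt_SA_SA_ca' in the current mat file

diff = old_tol
print(diff >= old_tol)  # True: fails against the old tolerance
print(diff >= new_tol)  # False: passes against the updated one
```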
Looks like I failed to say that I could only reproduce this problem on the openSUSE system; none of the other 32-bit systems I tried had this issue. Also, I didn't dig into any differences in i586 vs i686 or other arch issues.
@efiring sounds like we might just want to update the check data... the 1.73[...]e-14 value was the one being given by the `gsw_check` tool in GSW-C. My guess would be that this issue would be fixed by doing that.
Unfortunately, it turns out that a wholesale updating of the check values and tolerances breaks 2 other tests.
Does it break them by much? I've been thinking about this and have some questions:

- How are these check values and tolerances made?
- Would this need to be fixed in the upstream GSW-C?
- For this specific issue, is there concern that the library will produce the "wrong answer" for people using the 32bit openSUSE install?

I had some other question... but in the middle of writing this lost track of what it was... I guess a general "what do we do with this information?"
On 2019/02/26 9:41 AM, Andrew Barna wrote:
> Does it break them by much? I've been thinking about this and have some questions:

> - How are these check values and tolerances made?

Paul Barker will have to answer that. It might be documented somewhere. I presume the check values are simply whatever the current Matlab code yields, and the tolerances are the output of some algorithm that estimates how much floating-point slop there might be from one implementation to another.

> - Would this need to be fixed in the upstream GSW-C?

Yes, I think we will find that the tolerance is incorrect in GSW-C, because I don't think the numbers used there are any more recent than the ones we are using in GSW-Python. I think they are identical, but I'm not positive.

> - For this specific issue, is there concern that the library will produce the "wrong answer" for people using the 32bit openSUSE install?

No, the problem appears to be simply a single incorrect tolerance, not an error in the calculation.

But with respect to the wholesale updating of test and check values: if we want to do that, it will probably require changes in 2 routines, almost certainly at the C level. The first steps would be to see what the magnitude and nature of the new "failures" are, and to figure out what has changed in the Matlab version to which the new test and check values apply.

> I had some other question... but in the middle of writing this lost track of what it was... I guess a general "what do we do with this information?"
Nothing here is urgent at all; and dealing with this is much harder than it would need to be.
Eric
It fails the same way on aarch64, but the `-ffloat-store` workaround does not work there.
@ggardet thanks for the report.
I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture. Same with a major cloud provider (AWS Graviton).
> I'm wondering what the implications are for the recently announced Apple transition to an ARM architecture.
They cannot run Fortran on that yet, so most of the scientific stack won't have SciPy and numpy. Numpy can be made Fortran-free by removing `f2py` and linking against a non-Fortran BLAS, but that is not the usual way; most of the numpy variants out there use OpenBLAS or MKL, and both rely on Fortran.
For reference, the aarch64 failure is reproducible in a docker container of the right architecture. I think it uses qemu for emulating arm on intel.
I'm testing both aarch64 and ppc64le in https://github.com/conda-forge/gsw-feedstock/pull/22, they both fail the checks.
BTW, this is a comment from someone in conda-forge who is very knowledgeable on this topic:

> there's a good chance that the tests expect a high tolerance; x86_64 has 80-bit floating point registers and some calculations might benefit from it
I'm not sure what the best solution is here. Maybe we are requiring a precision in the comparisons that does not really make sense for the data, and loosening the tolerance for all tests is fine. Or, we could have a separate set of tolerance values for other architectures.
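A minimal sketch of the second option; the scale factors and the `TOLERANCE_SCALE` table are illustrative placeholders, not anything in GSW-Python:

```python
# Hypothetical sketch of per-architecture tolerances; the factors here
# are illustrative placeholders, not measured values.
import platform

TOLERANCE_SCALE = {
    "x86_64": 1.0,    # 80-bit x87 registers may tighten some results
    "aarch64": 10.0,  # assumed looser bound for plain 64-bit doubles
}

def scaled_tolerance(base_tol):
    # fall back to the looser bound on unlisted architectures
    return base_tol * TOLERANCE_SCALE.get(platform.machine(), 10.0)

print(scaled_tolerance(1.734723475976807e-14))
```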
As noted above (https://github.com/TEOS-10/GSW-Python/issues/40#issuecomment-465413874 and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.
@PaulMBarker, how are the check-value tolerances calculated?
> As noted above (#40 (comment) and subsequent comment), the case that triggered this thread can be closed by updating a tolerance, here and in the C repo, but wholesale updating would require more work.
Yep. I just wanted to bump this so we can get https://github.com/conda-forge/gsw-feedstock/pull/22 merged without ignoring the failures.
OK, I will see if I can come up with a quick fix.
In my emulated arm environment, #63 moved the failing test to the `CT_second_deriv` test; `pt_second_deriv` seems happy now though.
Interesting: it looks like the C version closely follows the Matlab version, and both make a fundamental mistake in their numerics that needlessly loses accuracy. Here is an example from part of the Matlab version:
```matlab
dpt = 1e-2; % increment of potential temperature is 0.01 degrees C.
pt_l = pt - dpt;
pt_u = pt + dpt;
[CT_SA_l, CT_pt_l] = gsw_CT_first_derivatives(SA,pt_l);
[CT_SA_u, CT_pt_u] = gsw_CT_first_derivatives(SA,pt_u);
CT_SA_pt = (CT_SA_u - CT_SA_l)./(pt_u - pt_l);
CT_pt_pt = (CT_pt_u - CT_pt_l)./(pt_u - pt_l);
```
Notice that those denominators should be just `2 * dpt`, for which `(pt_u - pt_l)` is a less accurate estimate, throwing away 2-4 significant digits of precision. @PaulMBarker, would you like to fix that in the Matlab version and update the test values?
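A quick Python illustration of the precision loss (the value of `pt` is chosen arbitrarily for the demo):

```python
# Demonstration: pt_u and pt_l each carry a rounding error from the
# addition/subtraction, so their difference can differ from 2 * dpt by
# a few ulps at |pt| ~ 25 (order 1e-15 absolute), a relative error far
# above machine epsilon.
pt = 25.0   # arbitrary potential temperature for the demo
dpt = 1e-2
pt_l = pt - dpt
pt_u = pt + dpt

print(pt_u - pt_l)                   # close to, but not exactly, 0.02
print(2 * dpt)                       # the intended exact denominator
print(abs((pt_u - pt_l) - 2 * dpt))  # the discrepancy
```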
Thanks for bringing this to my attention. Fortunately I have been sidetracked, working on combining all of my observed data into one dataset, and have not added the new code to the website.
I have added the 3 programmes that I use to make the values contained in the gsw_check functions to the matlab directory:

- `gsw_check_values_gsw.m` - makes the check values
- `gsw_check_values_ca.m` - makes the tolerances
- `gsw_check_values_datafile.m` - makes the netcdf file

The code is in `GSW-Matlab/checkfunctions_code/`.
Thank you. Now I see how the tolerances are based on perturbations of the check cast values, with the largest resulting perturbation of the function output being used as the tolerance. The perturbations are 1.3e-10 for SA, 6e-10 for t, and 2.3e-8 for p.
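Based on that description, the procedure can be sketched roughly as follows. This is a guess at the structure, not the actual Matlab programmes; the helper name and the stand-in function are illustrative only.

```python
# Rough sketch of the described procedure: perturb each input by the
# stated amounts and take the largest resulting change in the output as
# that function's check tolerance.
def make_tolerance(func, SA, t, p, dSA=1.3e-10, dt=6e-10, dp=2.3e-8):
    base = func(SA, t, p)
    return max(
        abs(func(SA + dSA, t, p) - base),
        abs(func(SA, t + dt, p) - base),
        abs(func(SA, t, p + dp) - base),
    )

# toy stand-in function in place of a real GSW routine
tol = make_tolerance(lambda SA, t, p: SA * t + 0.1 * p, 35.0, 10.0, 100.0)
print(tol)
```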
I think this can be closed. Test casts and check values are all different now, with the move to the v3_05_16 Matlab zip. Maybe new failures will pop up on some architecture, and we will have to investigate again.
When running the test suite (yes, perhaps you are right in #39 and it is a hopeless effort) on a 32bit machine (see build log) I get this one test failing: