precision tests too tight for "exotic" platforms

bnavigator commented 4 years ago

Cf. https://github.com/skyfielders/python-skyfield/issues/411

i586


[   37s] ________________ test_all_three_gravity_models_with_twoline2rv _________________

[ 37s] > assert_wgs72old(Satrec.twoline2rv(LINE1, LINE2, WGS72OLD)) [ 37s] E AssertionError: -3754.2514732427567 != -3754.251473242793 within 12 places (3.637978807091713e-11 difference)

[ 37s] _____ test_all_three_gravity_models_with_sgp4init __ [ 37s] sat.sgp4init(WGS72OLD, 'i', VANGUARD_ATTRS['satnum'], VANGUARD_EPOCH, *args) [ 37s] > assert_wgs72old(sat) [ 37s] E AssertionError: -3754.2514732427567 != -3754.251473242793 within 12 places (3.637978807091713e-11 difference)


- aarch64, ppc64, ppc64le, s390x all produce the same numbers for WGS84

[ 50s] > assert_wgs84(Satrec.twoline2rv(LINE1, LINE2, WGS84)) [ 50s] E AssertionError: 4719.227897029575 != 4719.227897029576 within 12 places (9.094947017729282e-13 difference)

[ 50s] sat.sgp4init(WGS84, 'i', VANGUARD_ATTRS['satnum'], VANGUARD_EPOCH, *args) [ 50s] > assert_wgs84(sat) [ 50s] E AssertionError: 4719.227897029575 != 4719.227897029576 within 12 places (9.094947017729282e-13 difference)

brandon-rhodes commented 4 years ago

I have just landed an adjustment of the gravity parameter test precision for those platforms — let me know if it looks like it will solve your use cases.

brandon-rhodes commented 4 years ago

(I tried adding a 32-bit test container in the hopes of simulating your i586 results, the kind of Docker container I use for the same thing in Skyfield, but the tests passed just fine, so I'm not able to reproduce the problem locally.)

bnavigator commented 4 years ago

I applied your patch and the previously reported tests pass.

However, the following ~~new~~ failures emerged (on i586 only):

[   26s] =================================== FAILURES ===================================
[   26s] ________________ test_satrec_against_tcppver_using_julian_dates ________________
[   26s] >       run_satellite_against_tcppver(Satrec.twoline2rv, invoke, [1,1,6,6,4,3,6])

[   26s] E               
[   26s] E               Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
[   26s] E               Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'

[   26s] ___________________ test_satrec_against_tcppver_using_tsince ___________________
[   26s] >       run_satellite_against_tcppver(Satrec.twoline2rv, invoke, [1,1,6,6,4,3,6])

[   26s] E               ValueError: Line 278 of output does not match:
[   26s] E               
[   26s] E               Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
[   26s] E               Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'
[   26s] 
[   26s] ../BUILDROOT/python-sgp4-2.13-7.1.i386/usr/lib/python3.8/site-packages/sgp4/tests.py:520: ValueError
[   26s] =========================== short test summary info ============================
[   26s] FAILED ::test_satrec_against_tcppver_using_julian_dates - ValueError: Line 27...
[   26s] FAILED ::test_satrec_against_tcppver_using_tsince - ValueError: Line 278 of o...
[   26s] ========================= 2 failed, 28 passed in 0.24s =========================
[   26s] error: Bad exit status from /var/tmp/rpm-tmp.JQU3T3 (%check)

~~This is independent of 8d6da33. The repositories got bumped just a few hours ago. I suspect an updated numpy, blas/lapack/openblas library or even glibc.~~ I also suspect that in your Docker container the used library does not have that problem for 32-bit.

Edit: Ah scratch that, in previous package versions I had also changed https://github.com/brandon-rhodes/python-sgp4/blob/8d6da331aace3b9c78e9973273911bc269d21d7e/sgp4/tests.py#L37 to read

error=2e-5

but I accidentally removed that modification when I replaced my own GRAVITY_DIGITS fix with 8d6da33

brandon-rhodes commented 3 years ago

Are there any further tweaks that the test suite needs at this point? Or can this issue be closed for now?

bnavigator commented 3 years ago

nothing has changed since https://github.com/brandon-rhodes/python-sgp4/issues/69#issuecomment-710290510

brandon-rhodes commented 3 years ago

Ah, so “scratch that” did not apply to the whole comment? We can keep the issue open then.

bnavigator commented 3 years ago

Yes I just tested it again, the relax to 2e-5 for 32bit libraries is needed.

mworion commented 3 years ago

just to add: as I currently making the transfer to GitHub Actions I could reproduce it on the 32 bit build but not on the 64 bit build Attached the 32 bit error message:

======================================================================
1242
  ERROR: test_satrec_against_tcppver_using_julian_dates (__main__.TestFunctions)
1243
  ----------------------------------------------------------------------
1244
  Traceback (most recent call last):
1245
    File "/tmp/tmp.V4FRTva0Ra/venv/lib/python3.9/site-packages/sgp4/tests.py", line 498, in test_satrec_against_tcppver_using_julian_dates
1246
      run_satellite_against_tcppver(Satrec.twoline2rv, invoke, [1,1,6,6,4,3,6])
1247
    File "/tmp/tmp.V4FRTva0Ra/venv/lib/python3.9/site-packages/sgp4/tests.py", line 579, in run_satellite_against_tcppver
1248
      raise ValueError(
1249
  ValueError: Line 278 of output does not match:
1250

1251
  Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
1252
  Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'
1253

1254
  ======================================================================
1255
  ERROR: test_satrec_against_tcppver_using_tsince (__main__.TestFunctions)
1256
  ----------------------------------------------------------------------
1257
  Traceback (most recent call last):
1258
    File "/tmp/tmp.V4FRTva0Ra/venv/lib/python3.9/site-packages/sgp4/tests.py", line 506, in test_satrec_against_tcppver_using_tsince
1259
      run_satellite_against_tcppver(Satrec.twoline2rv, invoke, [1,1,6,6,4,3,6])
1260
    File "/tmp/tmp.V4FRTva0Ra/venv/lib/python3.9/site-packages/sgp4/tests.py", line 579, in run_satellite_against_tcppver
1261
      raise ValueError(
1262
  ValueError: Line 278 of output does not match:
1263

1264
  Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
1265
  Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'
1266

brandon-rhodes commented 3 years ago

@mworion — Could you try triple-backticks around that output? It will make it easier to read.

Longer-term, I am interested in what compilation options we can use when building the module that would make it behave the same under these exotic platforms.

mworion commented 3 years ago

@brandon-rhodes : sorry about the formatting. For compiler options I hardly could help.

brandon-rhodes commented 3 years ago

@mworion — I have fixed the formatting for you using "Edit" and adding three back-ticks.

Compiler options will require some research about how to ask for IEEE 64-bit floating point behavior, regardless of the underlying hardware. It's fine if you're not interested in tackling it!

mworion commented 3 years ago

@brandon-rhodes Thanks for the cleanup of my post. Probably good news for this issue. By review the compiler issue the following results: During the GitHub action implementation this is reproducible on 32bit platforms (not 64bit ones). Error occurs when keeping error 2e-7, goes away when lowering the error to 2e-5. I checked some compiler setting in this areas based on some thoughts for accelerated tests for 32bit platform os. Good description of the problem related to IEEE754: (without good solution and from 2008): http://christian-seiler.de/projekte/fpmath/.

Detailed results on automatic tests:

plain 32bit

  ERROR: test_satrec_against_tcppver_using_julian_dates (__main__.TestFunctions)
  ValueError: Line 278 of output does not match:
  Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
  Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'

  ERROR: test_satrec_against_tcppver_using_tsince (__main__.TestFunctions)
  ValueError: Line 278 of output does not match:
  Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
  Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'

32bit compiler option "-O", "-msse2", "-fexcess-precision"

  ERROR: test_satrec_against_tcppver_using_julian_dates (__main__.TestFunctions)
  ValueError: Line 278 of output does not match:
  Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
  Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'

  ERROR: test_satrec_against_tcppver_using_tsince (__main__.TestFunctions)
  ValueError: Line 278 of output does not match:
  Expected: '       0.00000000   -9301.24542292    3326.10200382    2318.36441127 -8.729303005 -0.828225037 -0.122314827\n'
  Got back: '       0.00000000   -9301.24542703    3326.10200352    2318.36441126 -8.729303003 -0.828225037 -0.122314827\n'

32bit compiler option "-ffloat-store"

  no errors so far, seems to work.

brandon-rhodes commented 3 years ago

Wow! Amazing! No errors with that switch set? Does the switch have a serious performance hit, or does the library still run at roughly the same speed? (If it has no performance impact, maybe we could leave it on for all platforms; but if instead it does have a performance impact, I think we will want to only activate it in setup.py when a 32-bit platform is detected.)

mworion commented 3 years ago

To be honest: I don't know. As I compare pure python test runtime with accelerated one, there is 30%-50% speed improvement. For that I normally would not implement a C++ version. Do you have a good performance test routine? I will test it. I will add that split in the setup to countercheck.

brandon-rhodes commented 3 years ago

Do you have a good performance test routine? I will test it. I will add that split in the setup to countercheck.

Let's do something simpler: let's go ahead and land your code with the new option always used, and I can then do a bit of performance testing to see if it should sometimes be turned off.

I had thought I remembered a 10× or 100× speedup for using the C++ code, if we are only getting <2× then I might need to review whether the code is really working (or maybe I just have a bad memory!).

mworion commented 3 years ago

I think the 39 tests are not really a performance benchmark at all. All test results are ready between 200-350ms regardless which os, python, arhcitecture or wheel.

I'll keep it in.

brandon-rhodes commented 3 years ago

Oh! I thought you mean the call to sgp4init() was only 30–50% faster. But, yes, you meant that the whole test run was only a little bit faster. Yes, then I agree that the number makes sense.

brandon-rhodes commented 3 years ago

Following the addition of the extra_compile_args=['-ffloat-store'] option to setup.py, I am seeing SGP4 behavior that is much more numerically stable across platforms. In particular, I see the report:

precision of float in 32 bit tests accelerated -> solution: use compiler option "-ffloat-store", which seems to solve the problem, at least the tests run without errors.

— by @mworion in the https://github.com/brandon-rhodes/python-sgp4/issues/84#issuecomment-763766486 comment.

In the hope that “exotic” platforms will now show the same IEEE floating point behavior expected by the C++ of the underlying SGP4 library, I am going to optimistically close this issue, as the improvement should appear with the next release of SGP4. In the meantime the (hopefully) improved stability may be tested locally with:

pip install -U https://github.com/brandon-rhodes/python-sgp4/archive/master.zip

bnavigator commented 3 years ago

If I understand the manpage of gcc correctly, then you sacrifice possible extra precision in real world applications for reproducibility in the test suite.

       -ffloat-store
           Do not store floating-point variables in registers, and inhibit other options that might
           change whether a floating-point value is taken from a register or memory.

           This option prevents undesirable excess precision on machines such as the 68000 where the
           floating registers (of the 68881) keep more precision than a "double" is supposed to
           have.  Similarly for the x86 architecture.  For most programs, the excess precision does
           only good, but a few programs rely on the precise definition of IEEE floating point.  Use
           -ffloat-store for such programs, after modifying them to store all pertinent intermediate
           computations into variables.

https://gcc.gnu.org/wiki/FloatingPointMath

The extra intermediate precision and range may cause flags not be set or traps not be raised. Also, for double precision, double rounding may affect the final results. Whether or not intermediate results are rounded to double precision or extended precision depends on optimizations being able to keep values in floating-point registers. The option -ffloat-store prevents GCC from storing floating-point results in registers. While this avoids the indeterministic behavior just described (at great cost), it does not prevent accuracy loss due to double rounding.

https://gcc.gnu.org/wiki/x87note

The x86's historical floating point unit (FPU) is the x87 unit. The x87 unit uses IEEE 80 bit arithmetic with 8 80 bit registers organized as a stack. Thus, by default x87 arithmetic is not true 64/32 bit IEEE, but gets extended precision from the x87 unit. However, anytime a value is moved from the registers to an IEEE 64 or 32 bit storage location, this 80 bit value must be rounded down to the appropriate number of bits. Thus computations performed by the x87 unit provide at least the same accuracy as strict 32/64 bit IEEE, with the upper limit being 80 bit IEEE accuracy.

So, the extra accuracy is almost always beneficial, but it can have some strange consequences.

brandon-rhodes commented 3 years ago

If I understand the manpage of gcc correctly, then you sacrifice possible extra precision in real world applications for reproducibility in the test suite.

Yes, that’s correct. And if turning on 64-bit IEEE had suddenly broken our agreement with the official SGP4 test results, then I would worry that the library’s behavior was being compromised. But since the compile option seems to have snapped every architecture — including x86 — into close agreement with the suite, I think that’s evidence that we are now implementing the IEEE behavior that the SGP4 authors were assuming.

Note that with an empirical model like SGP4, extra digits of precision do not provide better results — it’s not as though SGP4 is really a particularly exact model of how satellites really behave (it has several shortcomings), that would give us more accurate real-world satellite positions if only we provided it with more decimal digits of precision. Rather, it’s treated as a curve generator that the satellite authorities play with until they find a set of parameters that produce roughly the upcoming set of positions that are expected for the satellite, and then they publish those as the TLE. (At least that’s my understanding from reading Kelso?) Any tweaks or compile options that make an SGP4 implementation drift from the officially expected output will thereby make it drift from the series of upcoming positions that the TLE publisher is trying to make SGP4 produce on your machine.

So I understand this compile option to be a strict improvement in the Python library’s behavior. If it sounds like there’s a downside I’m missing, comment further, and I can try to track down the Kelso that I’m vaguely remembering and we can double check against that and other authorities.

brandon-rhodes / python-sgp4

precision tests too tight for "exotic" platforms #69