aburrell / apexpy

A Python wrapper for Apex coordinates
MIT License
34 stars 25 forks source link

BUG: Mac memory error #115

Closed ljlamarche closed 1 year ago

ljlamarche commented 1 year ago

Describe the bug

Upon initializing an Apex object, python crashes with a memory error (usually either bus error or a segmentation fault). This may be related to #84 , but it shows up in a slightly different way, and the installation flag that solved that issue doesn't seem to change this one.

To Reproduce

  1. Start python session
  2. Import apexpy
  3. Initialize Apex object
>>> import apexpy
>>> A = apexpy.Apex(2022)
zsh: bus error  python

Expected behavior

Successful creation of Apex object.

Computer

Please provide the following information:

ljlamarche commented 1 year ago

This issue seems to alternate at random between a bus error, segmentation fault, and producing nan output. The following is output from repeatedly running a short test program without any changes.

Test program:

import apexpy

A = apexpy.Apex(2022)
alat, alon = A.geo2apex(60, 15, height=300)
print(alat, alon)

Output:

(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
LINE 1227
LINE 130
/Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/apex.py:556: RuntimeWarning: invalid value encountered in <lambda> (vectorized)
  alat, alon = self._geo2apex(glat, glon, height)
/Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/apex.py:559: UserWarning: Apex latitude set to NaN where undefined (apex height may be < reference height)
  warnings.warn(''.join(['Apex latitude set to NaN where undefined ',
nan nan
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: segmentation fault  python temp_apexpy_test.py
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: bus error  python temp_apexpy_test.py
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: segmentation fault  python temp_apexpy_test.py
ljlamarche commented 1 year ago

To debug, I added the -fcheck=all flag to the make file to compile and run the fortran test program. Compiling generates the following warning:

igrf.f90:51:67:

   51 |     num_epochs = count([(s(i:i + 3), i = 1, len_trim(s))]== 'IGRF')
      |                                                                   ^
Warning: 'i' may be used uninitialized [-Wmaybe-uninitialized]
igrf.f90:31:44:

   31 |     integer(4)                   :: state, i, pos
      |                                            ^
note: 'i' was declared here

Running the resulting program generates the following runtime error:

At line 51 of file igrf.f90
Fortran runtime error: Substring out of bounds: lower bound (0) of 's' is less than one

Error termination. Backtrace:
#0  0x1026e3187
#1  0x1026e3d37
#2  0x1026e40f3
#3  0x102247a67
#4  0x1022487c7
#5  0x102252aeb
#6  0x102249aef
#7  0x10225464f

This refers to https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/igrf.f90#L51

aburrell commented 1 year ago

Perhaps if we address all the warnings upon compilation that will reduce potential memory leaks.

ljlamarche commented 1 year ago

One problem was an indexing error related extrapolating IGRF 5 years into the future. Some arrays were not being allocated large enough to accommodate this extra epoch but then were be indexed for it. This has been addressed by expanding the epoch array.

https://github.com/ljlamarche/apexpy/blob/be70342b66da33e3026c3baf44fdafdc22534d90/fortranapex/igrf.f90#L58

https://github.com/ljlamarche/apexpy/blob/be70342b66da33e3026c3baf44fdafdc22534d90/fortranapex/igrf.f90#L69

ljlamarche commented 1 year ago

Perhaps if we address all the warnings upon compilation that will reduce potential memory leaks.

I agree. I'm also trying to implement some stricter compiler flags in the Makefile at least so it does a better job warning us of anything sloppy that may cause issues down the road.

ljlamarche commented 1 year ago

There is potentially an issue with using apexpy for the future extrapolated epochs that are not part of the normal IGRF set of coefficients (currently anything between 2020-2025). These use fewer basis (8 vs 13), which may cause precision errors that are not handled well. Noteably, running the fortran text program will compute the coefficients for all epochs up to 2020 without issue, but produces a series of warnings about an imprecise fit for 2025.

APEX: Imprecise fit of apex: |Bdown/B| 7.0E-06

If you increase the number of iterations in apex.f90 L577, there are fewer errors but they don't go away completely.

When running the test script that generates NaN output, apexpy throws the warning

UserWarning: Apex latitude set to NaN where undefined (apex height may be < reference height)

only for years in the most recent epoch window. It is possible these two are related and the output NaNs are comming from an imprecise fit such that apexpy thinks the coordinate system is undefined.

ljlamarche commented 1 year ago

I think the original indexing problem for extrapolated years was the main issue. The NaN output went away after I updated apexsh.dat. @aburrell , do you know if there are any unit tests that check that time frame specifically? It looked to me like most of them used an epoch of 2000. If not, I'll add one as part of #116 .

aburrell commented 1 year ago

I recently added a new test for "today", but it'd be a good idea to add a test that uses the end of the current extrapolation era.

ljlamarche commented 1 year ago

Some of my initial speculation about this problem was incorrect. It seems to have boiled down to three main things:

  1. The syntax used in igrf.f90 L51-52 may not be valid for all compilers. https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/igrf.f90#L51-L52 This is addressed by counting the number of epochs more explicitly in a loop.

  2. In igrf.f90 L92, the array g must be allocated to have the number of epochs plus 1 to account for the secular variation (SV) parameters. https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/igrf.f90#L92

  3. In magfld.f90, the method by with the epoch index was found was causing an indexing overflow error beyond the last date. https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/magfld.f90#L175-L179

All these points should be addressed in #117.

ljlamarche commented 1 year ago

I recently added a new test for "today", but it'd be a good idea to add a test that uses the end of the current extrapolation era.

Is this what you're referring to? https://github.com/aburrell/apexpy/blob/8436e3fd2d3a5699c4c9ba3ce94f4fe8d3a74cba/apexpy/tests/test_Apex.py#L111-L116

If so, I believe initialization always worked fine for me, I only ran into issue when you actually try to use the object for conversions, so I'm not surprised that wasn't triggered.

aburrell commented 1 year ago

That's it. Good point about the limits of the test, something to change now.

aburrell commented 1 year ago

@ljlamarche is this good to close now?

ljlamarche commented 1 year ago

Yes, this is fixed with #117.

ljlamarche commented 1 year ago

One further note: This article was helpful for understanding what exactly the secular variation code was trying to do.

https://earth-planets-space.springeropen.com/articles/10.1186/s40623-021-01507-z