Closed ljlamarche closed 1 year ago
This issue seems to alternate at random between a bus error, segmentation fault, and producing nan output. The following is output from repeatedly running a short test program without any changes.
Test program:
import apexpy
A = apexpy.Apex(2022)
alat, alon = A.geo2apex(60, 15, height=300)
print(alat, alon)
Output:
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
LINE 1227
LINE 130
/Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/apex.py:556: RuntimeWarning: invalid value encountered in <lambda> (vectorized)
alat, alon = self._geo2apex(glat, glon, height)
/Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/apex.py:559: UserWarning: Apex latitude set to NaN where undefined (apex height may be < reference height)
warnings.warn(''.join(['Apex latitude set to NaN where undefined ',
nan nan
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: segmentation fault python temp_apexpy_test.py
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: bus error python temp_apexpy_test.py
(apexpy_debug) e30737@Lamarchel-mbp16 Desktop % python temp_apexpy_test.py
LINE 125
LINE 1224 2022.0 /Users/e30737/miniconda3/envs/apexpy_debug/lib/python3.10/site-packages/apexpy/igrf13coeffs.txt
zsh: segmentation fault python temp_apexpy_test.py
To debug, I added the -fcheck=all
flag to the make file to compile and run the fortran test program. Compiling generates the following warning:
igrf.f90:51:67:
51 | num_epochs = count([(s(i:i + 3), i = 1, len_trim(s))]== 'IGRF')
| ^
Warning: 'i' may be used uninitialized [-Wmaybe-uninitialized]
igrf.f90:31:44:
31 | integer(4) :: state, i, pos
| ^
note: 'i' was declared here
Running the resulting program generates the following runtime error:
At line 51 of file igrf.f90
Fortran runtime error: Substring out of bounds: lower bound (0) of 's' is less than one
Error termination. Backtrace:
#0 0x1026e3187
#1 0x1026e3d37
#2 0x1026e40f3
#3 0x102247a67
#4 0x1022487c7
#5 0x102252aeb
#6 0x102249aef
#7 0x10225464f
This refers to https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/igrf.f90#L51
Perhaps if we address all the warnings upon compilation that will reduce potential memory leaks.
One problem was an indexing error related extrapolating IGRF 5 years into the future. Some arrays were not being allocated large enough to accommodate this extra epoch but then were be indexed for it. This has been addressed by expanding the epoch array.
Perhaps if we address all the warnings upon compilation that will reduce potential memory leaks.
I agree. I'm also trying to implement some stricter compiler flags in the Makefile at least so it does a better job warning us of anything sloppy that may cause issues down the road.
There is potentially an issue with using apexpy for the future extrapolated epochs that are not part of the normal IGRF set of coefficients (currently anything between 2020-2025). These use fewer basis (8 vs 13), which may cause precision errors that are not handled well. Noteably, running the fortran text program will compute the coefficients for all epochs up to 2020 without issue, but produces a series of warnings about an imprecise fit for 2025.
APEX: Imprecise fit of apex: |Bdown/B| 7.0E-06
If you increase the number of iterations in apex.f90 L577, there are fewer errors but they don't go away completely.
When running the test script that generates NaN output, apexpy throws the warning
UserWarning: Apex latitude set to NaN where undefined (apex height may be < reference height)
only for years in the most recent epoch window. It is possible these two are related and the output NaNs are comming from an imprecise fit such that apexpy thinks the coordinate system is undefined.
I think the original indexing problem for extrapolated years was the main issue. The NaN output went away after I updated apexsh.dat
. @aburrell , do you know if there are any unit tests that check that time frame specifically? It looked to me like most of them used an epoch of 2000. If not, I'll add one as part of #116 .
I recently added a new test for "today", but it'd be a good idea to add a test that uses the end of the current extrapolation era.
Some of my initial speculation about this problem was incorrect. It seems to have boiled down to three main things:
The syntax used in igrf.f90 L51-52 may not be valid for all compilers. https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/igrf.f90#L51-L52 This is addressed by counting the number of epochs more explicitly in a loop.
In igrf.f90 L92, the array g must be allocated to have the number of epochs plus 1 to account for the secular variation (SV) parameters. https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/igrf.f90#L92
In magfld.f90, the method by with the epoch index was found was causing an indexing overflow error beyond the last date. https://github.com/aburrell/apexpy/blob/a1e2ce62fe9f6607964ab2540ec5e71a6c165612/fortranapex/magfld.f90#L175-L179
All these points should be addressed in #117.
I recently added a new test for "today", but it'd be a good idea to add a test that uses the end of the current extrapolation era.
Is this what you're referring to? https://github.com/aburrell/apexpy/blob/8436e3fd2d3a5699c4c9ba3ce94f4fe8d3a74cba/apexpy/tests/test_Apex.py#L111-L116
If so, I believe initialization always worked fine for me, I only ran into issue when you actually try to use the object for conversions, so I'm not surprised that wasn't triggered.
That's it. Good point about the limits of the test, something to change now.
@ljlamarche is this good to close now?
Yes, this is fixed with #117.
One further note: This article was helpful for understanding what exactly the secular variation code was trying to do.
https://earth-planets-space.springeropen.com/articles/10.1186/s40623-021-01507-z
Describe the bug
Upon initializing an
Apex
object, python crashes with a memory error (usually either bus error or a segmentation fault). This may be related to #84 , but it shows up in a slightly different way, and the installation flag that solved that issue doesn't seem to change this one.To Reproduce
Apex
objectExpected behavior
Successful creation of
Apex
object.Computer
Please provide the following information: