bwinkel / cysgp4

A wrapper around the SGP4 package, for sat TLE calculations
Apache License 2.0
23 stars 2 forks source link

Segmentation fault, and a couple of API questions #8

Open brandon-rhodes opened 4 years ago

brandon-rhodes commented 4 years ago

First: I'm excited about the idea of a machine-speed SGP4 implementation that can be called from Python! This will meet an important need for folks finding the pure-Python sgp4 not fast enough for their use cases, and who are able to install or compile binary packages on their systems.

I would like to add a section to Skyfield's documentation showing how to use cysgp4 to produce the coordinates that Skyfield needs, for users who need to extra performance. My first effort at producing coordinates, alas, crashed the interpreter:

# from skyfield.api import EarthSatellite, Topos, load

# ts = load.timescale()
# t = ts.now()

line1 = '1 25544U 98067A   14020.93268519  .00009878  00000-0  18200-3 0  5082'
line2 = '2 25544  51.6498 109.4756 0003572  55.9686 274.8005 15.49815350868473'

# boston = Topos('42.3583 N', '71.0603 W')

all_lines = ['Foof', line1, line2]
tles = [(line,) for line in all_lines]

from cysgp4 import PyTle, PyObserver, propagate_many
mjds = [58805.5]
result = propagate_many(mjds, tles)
print(result)

The result:

zsh: segmentation fault  python tmp16.py

Additionally, here are a few further thoughts that I'll have once cysgp4 works instead of crashing. I can open issues for them later if it's appropriate:

  1. The documentation on PyPI — thanks for providing a useful page with examples, by the way! — does not include a dedicated link to the repository, which was a slight inconvenience. In the meantime I found it by following the link for Jupyter Notebooks, which is a link to GitHub.
  2. The mighty propagate_many() interface, which will be of the most interest to Skyfield folks, unfortunately does not cache the Satellite objects it creates but takes raw TLE lines that must be re-created each time propagate_many() is called. This could be a noticeable cost to folks needing to get the same array of satellites and do many propagations on them (for, say, animating a live globe). Could the library surface a native Satellite array-of-structs?
  3. All Skyfield needs is ECI coordinates. It would be magnificent if there were a way, after turning a list of TLE lines into a native array of Satellite structs, to run a high-performance loop that produced ECI coordiantes from those structs without any of the if-else logic surrounding topocentric computations.
  4. I wonder if the expensive struct copies inside the propagation could be turned off? I am not imagining that Skyfield users will be concerned about their native array-of-structs, if they had one, being modified in-place during a computation, if doing so produced the fastest possible computation. Maybe the entire array could be duplicated n times for n processing threads, but not copied any further times than parallelism made necessary? (I've not studied the parallelism primitives in use here so this suggestion might not make sense.)
  5. Finally, now that I look again at the code, the Satellite object itself looks very heavyweight for what Skyfield needs. It would be faster to inline the SGP4 *thisptr instead of having separate malloc'd memory, for one thing; and for another, Skyfield would need none of the other attributes or properties of the class. The ideal interface would be (1) a function that takes TLE lines and returns an array of raw inlined SGP4 structs, and (2) a function that takes the structs and an array of MJD's and produces an array of positions. That would, with the least overhead and levels of indirection, allow massive arrays of satellites to produce coordinates ready for Skyfield. Would you accept a pull request, if I had time to work on one, that provided a simple low-level API of this kind for the use of other astronomy libraries?

I hope it's not too inconvenient for me to have attached my first-impression questions to a substantive issue! Again, let me know if you'd like separate issues spun up for them, and thanks for your work on this library!

bwinkel commented 4 years ago

Dear @brandon-rhodes,

thanks so much for your detailed thoughts. It's great that you're interested in the package!

In your example, the problem is with the interface. The propagate_many function needs a list of PyTle objects. The following works without a seg-fault:

from cysgp4 import PyTle, propagate_many

line1 = '1 25544U 98067A   14020.93268519  .00009878  00000-0  18200-3 0  5082'
line2 = '2 25544  51.6498 109.4756 0003572  55.9686 274.8005 15.49815350868473'
tle_strings = ('Foof', line1, line2)
tles = [PyTle(*tle_strings)]  # Note: could also be a scalar, propagate_many
doesn't care
result = propagate_many(mjds, tles)

# ---------------------------------------------------------------------------
# RuntimeError                              Traceback (most recent call last)
# <ipython-input-7-d395a60f817b> in <module>
# ----> 1 result = propagate_many(mjds, tles)
# cysgp4/cysgp4.pyx in cysgp4.cysgp4.propagate_many()
# 
# RuntimeError: Error: Satellite decayed

The exception is correct, because the TLE is quite outdated for the MJD that was provided:

tles[0].epoch
# <PyDateTime: 2014-01-20 22:23:04.000416 UTC>
tles[0].epoch.mjd
# 56677.93268519

Nevertheless, you hit a weak spot. I'll need to add some sanity checks and not just blindly passing the arrays into the C++ constructors. My apologies for the inconvenience.

The other questions will need some further thoughts, especially when it comes to optimizing the OpenMP loops. I'll get back to you.

Best, Benjamin

bwinkel commented 4 years ago

The documentation on PyPI thanks for providing a useful page with examples, by the way! does not include a dedicated link to the repository, which was a slight inconvenience. In the meantime I found it by following the link for Jupyter Notebooks, which is a link to GitHub.

There is a link to the "Homepage" (which is the repository) on PyPI. I may add some more direct links to the manual, GH issues etc. in the next release.

bwinkel commented 4 years ago

The mighty propagate_many() interface, which will be of the most interest to Skyfield folks, unfortunately does not cache the Satellite objects it creates but takes raw TLE lines that must be re-created each time propagate_many() is called. This could be a noticeable cost to folks needing to get the same array of satellites and do many propagations on them (for, say, animating a live globe).

Indeed, it could be worth-wile to try to further optimize the propagate_many. Originally, I reused a single Tle (C++) object as can be seen here, but I ran into serious issues. The problem is that in the underlying C++ code the SGP4 class needs to be instantiated (lots of private class members!) with the correct TLE, before one can safely use the find_position routine. When one does parallelization with arbitrary numpy broadcasting, it is hard to have different OpenMP workers figure out which is which. I'm not even sure, how much computing time is wasted by creating a new SGP4 object for each calculation. I'll try to benchmark this.

Note also, that propagate_many expects a list of PyTle objects, and if one uses numpy broadcasting in a clever way, one can already safe a lot of extra computations. For example, calculating the satellite positions of all active satellites over the course of a day (with high time resolution) only requires 2000 and some PyTle objects. Have a look at the notebooks, for this.

Could the library surface a native Satellite array-of-structs?

If it's technically possible, that'd be fine with me. Not sure, how to do it. Internally, in the C++ library, SGP4 is a Class and most of all the variables are in private structs. Will try, if this could be exposed via Cython.

bwinkel commented 4 years ago

All Skyfield needs is ECI coordinates. It would be magnificent if there were a way, after turning a list of TLE lines into a native array of Satellite structs, to run a high-performance loop that produced ECI coordiantes from those structs without any of the if-else logic surrounding topocentric computations.

I don't think, it would be worth the effort. Compared to the number-crunching in the SGP4lib, the few if statements are almost a no-op. (This is compiled code!)

bwinkel commented 4 years ago

The ideal interface would be (1) a function that takes TLE lines and returns an array of raw inlined SGP4 structs, and (2) a function that takes the structs and an array of MJD's and produces an array of positions.

This is exactly the reason, why I invented the propagate_many routine, which does almost that (I think), only that it's somewhat more high-level than raw C++ struct accesses. The only difference (as discussed above) is that an SGP4 instance is created for each position calculation. Do you have an idea how much processing time is involved in both steps? Naively, I've always assumed the position finding would be the bottleneck, but perhaps this was premature.

Again, in my original solution, I had the SGP4 instances created for each TLE before starting the loops, but that failed horribly in my multi-threaded approach, as different threads could potentially interfere with each other, if they operated on the same SGP4 instance. One could perhaps add a single-threaded function, that does this (would also be interesting for benchmarking), but on modern machines a lot of potential to improve the real runtimes lies in the parallelization, doesn't it?

brandon-rhodes commented 4 years ago

I'll need to add some sanity checks and not just blindly passing the arrays into the C++ constructors. My apologies for the inconvenience.

Thanks for looking at my example and figuring out what went wrong! Manual type checks are always a bit tedious, without any guarantee that every possible bad value has been checked.

There is a link to the "Homepage" (which is the repository) on PyPI.

Ah, understood. I was looking for something named "GitHub" or "Repository"; "Homepage" sounded like it went to a website or readthedocs page.

lots of private class members!

Yes, it's a bit startling how large the "satrec" struct is. I have not yet worked out why all those values need to survive between one call to propagate and the next, when it looks like whole stacks of values are also passed in the very long parameter lists.

if one uses numpy broadcasting in a clever way, one can already safe a lot of extra computations. For example, calculating the satellite positions of all active satellites over the course of a day (with high time resolution) only requires 2000 and some PyTle objects. Have a look at the notebooks, for this.

Wow, that's a lot of points to produce with only 2k objects! I'll take a look at those notebooks when I have time.

If it's technically possible, that'd be fine with me. Not sure, how to do it. … Will try, if this could be exposed via Cython.

Having spent an hour yesterday learning more about Cython, I recommend against trying — it looks like there's no native mechanism in Cython for "expose this struct via a wrapper", and instead it either has to be done by hand, or else another tool has to be used to generate Cython code. For your use case, it's likely that your current approach is just fine.

I don't think, it would be worth the effort. Compared to the number-crunching in the SGP4lib, the few if statements are almost a no-op.

You're right, the expense is a very small one. I think I was probably reacting to the aesthetics of calling a big function with lots of arguments that use if to turn n features on and off, instead of thinking accurately about how long the C language if gates would really take.

Do you have an idea how much processing time is involved in both steps?

No. I probably often respond to software design purely on appearance — "but this is extra work, could it possibly be avoided?" — instead of only reacting to things that when benchmarked truly take lots of extra time. For some reason, I like avoiding extra work even if it doesn't matter. :)

but on modern machines a lot of potential to improve the real runtimes lies in the parallelization, doesn't it?

Probably! I haven't had the chance to benchmark it both ways on my little laptop's processor to see what the difference is.

Just for fun, I'm going to look today at other options for a thin wrapper around SGP4 that might serve Skyfield without requiring you to shoe-horn extra calling modes or approaches into your nice library — which is an admirable one-step solution for folks wanting to observe satellites from a location! I expect that folks wanting to predict satellite passes will enjoy the fact that you're able to do all the steps of producing an altaz position from inside of a single C routine!

bwinkel commented 4 years ago

Just for fun, I'm going to look today at other options for a thin wrapper around SGP4 that might serve Skyfield without requiring you to shoe-horn extra calling modes or approaches into your nice library...

Feel free to contact me if you need help - at least in the case when you opt into Cython. It can be a bit tenacious, especially, when it comes to wheel building on all three major platforms. I'm also happy to add features into cysgp4 if there is demand.

brandon-rhodes commented 4 years ago

Well, I had somehow imaged this would take much less code, but I've just produced a 250-line wrapper around Vallado’s raw unaltered SGP4 C++ code that lets me do simple propagation with one satellite, or propagation of a whole array of satellites and times:

https://github.com/brandon-rhodes/python-sgp4/blob/master/extension/wrapper.cpp

To keep the C code as simple as possible, it relies on the calling Python code to build the NumPy arrays into which it will dump its output.

The result, alas, is slower than your code because it cannot take advantage of multiple cores! On my 4-core laptop, running:

wget http://celestrak.com/NORAD/elements/science.txt
pip install numpy cysgp4 . && python tmp.py

— shows your library taking only 0.012 seconds while mine takes an entire 0.015 seconds. Obviously, making the operation parallel is a big deal. But can I do it without taking on a huge and complicated dependency like Cython?

I'm going to step away from the problem for a day or so and ponder. I like the lightweight approach in my little wrapper module, for getting the numbers another library like Skyfield would need. But it does penalize users who would prefer a hefty parallel solution!

I'll have to decide whether to make mine more complicated, or stop here and recommend mine for accelerated single-core operation, and yours for situations where they have the capacity to install Cython on their system and want to use multiple cores.

bwinkel commented 4 years ago

Wow, that is amazing. For the record, I also ran the cysgp4 on one core (using the set_num_threads function), and on my machine the results are 14.7 ms vs 8.6 ms. So your implementation is indeed quite a bit faster. I'm not very familiar with SGP4 models - cysgp4 was just a little side project for me - how does the Vallado software compare to the sgp4lib by Daniel Warner? Perhaps the speed boost comes from the difference in the wrapped libraries? I will have a look into this. Perhaps, I can add a cythonized wrapper around Vallado, as well.

Cython's parallelization is not much else than providing a prange function, which can be used to replace the normal Python range. Internally this will be translated to an OpenMP loop. It should not be difficult to add OpenMP to your C extension, although it may produce some head-ache with building on MacOS. I'm also not sure about the GIL. Perhaps, one could also make use of numpys broadcast iterator, to be more flexible with the inputs. A while ago, I created a template package, which demonstrates how to use the new array iterator framework from numpy to combine broadcasting with the prange: https://github.com/bwinkel/cynpyiter/blob/master/cynpyiter/iterfunc.pyx. It should be possible to translate this into a C extension without too much work.

If you don't mind, my last 2 cents: In my experience, Cython is producing very little overhead. I also don't think of it as a heavy dependency. Of all packages that I know, which involve some kind of compiled code or need a compiler (e.g., numpy, scipy, numba), Cython is the only one, which I never had installation issues with. For this kind of software, one better provides wheels or conda packages anyway. Also, the wrapper code is much easier to understand if one is not very experienced with C extensions, like me (I admire people who are able to write these!).

bwinkel commented 4 years ago

PS: If one wanted to parallelize the Vallado code, a deep copy of the satrec (elsetrec) struct will be necessary in each thread. The sgp4 propagator function makes modifications to it during the calculation. Since it is a very simple C struct, a memcopy should suffice, which is reasonably fast.

brandon-rhodes commented 4 years ago

I'm not very familiar with SGP4 models - cysgp4 was just a little side project for me - how does the Vallado software compare to the sgp4lib by Daniel Warner?

Ah, good question! Vallado is the author of the official SGP4 implementation at:

https://www.celestrak.com/software/vallado-sw.php

I had not seen Wagner's implementation, and do not know its history. Reading through it, the variable names and so forth are different enough that it may be an independent implementation rather than just a reformatting of Vallado’s code — does Wagner provide a history of any sort?

Vallado updates his code every few years, so here's a repository where one of his fans has created a git history of the versions from the celestrak site:

https://github.com/rirze/sgp4-cpp/tree/master

It should not be difficult to add OpenMP to your C extension, although it may produce some head-ache with building on MacOS. I'm also not sure about the GIL. Perhaps, one could also make use of numpys broadcast iterator, to be more flexible with the inputs.

Thanks for the warning about MacOS! I'll also take a look at the numpy broadcast iterator; maybe I could provide something that numpy will accept as a ufunc and let it do the parallel work.

If you don't mind, my last 2 cents:

I don't mind at all, thanks for sharing so much of your experience with this code!

In my experience, Cython is producing very little overhead. I also don't think of it as a heavy dependency.

Yes, I should have been more precise with my wording: I meant simply that it’s an extra several megabytes on disk once the wheel is uncompressed, versus about 8k for the little wrapper I wrote yesterday, and since an unusual number of satellite tracking folks seem to do doing so on Raspberry Pi devices and other small computers, I have been trying to keep sgp4 as sleek as possible.

For this kind of software, one better provides wheels or conda packages anyway.

Agreed. Alas that I won't be able to distribute sgp4 any more as only a plain .tar.gz!

Also, the wrapper code is much easier to understand if one is not very experienced with C extensions, like me (I admire people who are able to write these!).

Good point, cython is easier to understand! But in this case I could find no equivalent for PyMemberDef, where a simple table of types and offsets automatically makes Python provide getters and (if desired) setters for every member of a struct. With Cython it looked like getters would all have to be written separately?

bwinkel commented 4 years ago

I had not seen Wagner's implementation, and do not know its history. Reading through it, the variable names and so forth are different enough that it may be an independent implementation rather than just a reformatting of Vallado’s code — does Wagner provide a history of any sort?

It seems that @dnwrnr is actively maintaining his sgp4 implementation. Perhaps, he could comment?

bwinkel commented 4 years ago

Good point, cython is easier to understand! But in this case I could find no equivalent for PyMemberDef, where a simple table of types and offsets automatically makes Python provide getters and (if desired) setters for every member of a struct. With Cython it looked like getters would all have to be written separately?

It should be possible, I think. After all, one can use all the CPython machinery in Cython, if one is willing to add some boilerplate code (if needed). See e.g. https://gist.github.com/jnothman/9170380, which does something similar?

bwinkel commented 4 years ago

Slightly off-topic: is there a license attached to Vallado's C++ library? I couldn't find one in the zip-file, nor in the repository.

brandon-rhodes commented 4 years ago

See e.g. https://gist.github.com/jnothman/9170380, which does something similar?

Interesting, I had not seen that gist. It seems to take a type that already exists and is defined, and loops over its members inspecting their names, types, and values. I think that what I need for this "elsetrec" struct, though, is the opposite: the ability to define a Python built-in type and create its "members" array? (The array that, once it existed, this gist would then iterate over.)

Slightly off-topic: is there a license attached to Vallado's C++ library? I couldn't find one in the zip-file, nor in the repository.

The only statement I have found is in the Frequently Asked Questions list at:

https://www.celestrak.com/publications/AIAA/2006-6753/faq.php

“There is no license associated with the code and you may use it for any purpose—personal or commercial—as you wish. We ask only that you include citations in your documentation and source code to show the source of the code and provide links to the main page, to facilitate communications regarding any questions on the theory or source code.”

dnwrnr commented 4 years ago

Most code is a copy of the original fortran code that is then modified into whatever language you are using. Some corrections are then applied to fix issues with the original code