AndrewAnnex / SpiceyPy

SpiceyPy: a Pythonic Wrapper for the SPICE Toolkit.
MIT License
384 stars 84 forks source link

[WIP] Cython spiceypy prototype #464

Open AndrewAnnex opened 1 year ago

AndrewAnnex commented 1 year ago

Prototype of Cython based interface to CSPICE. Links to the existing CSPICE shared library so is able to work along side the existing spiceypy interface allowing gradual development of features. Current focus will be to provide faster vectorized functions first and then consider what makes sense to cythonize from that point.

This is very much a work in process, as I am still learning cython best practices, and changes to python packaging may be outpacing cython's development and documentation.

Current speedups are modest for vectorized functions, about 1.5-1.7 times faster than pure python versions. This may be due to unavoidable overhead in certain portions such as dealing with strings.

This PR will probably be open for a while until enough of the issues around cross platform compilation, docstrings, and cython usage improvements are made.

suggestions are welcome!

codecov[bot] commented 1 year ago

Codecov Report

Merging #464 (e62a539) into main (bcf2a5a) will decrease coverage by 0.03%. The diff coverage is 98.16%.

@@            Coverage Diff             @@
##             main     #464      +/-   ##
==========================================
- Coverage   99.86%   99.84%   -0.03%     
==========================================
  Files          13       14       +1     
  Lines       15633    15836     +203     
==========================================
+ Hits        15612    15811     +199     
- Misses         21       25       +4     
Impacted Files Coverage Δ
src/spiceypy/tests/test_cyice.py 98.02% <98.02%> (ø)
src/spiceypy/__init__.py 100.00% <100.00%> (ø)
src/spiceypy/spiceypy.py 99.68% <100.00%> (ø)
src/spiceypy/tests/test_wrapper.py 100.00% <100.00%> (ø)
alfonsoSR commented 1 year ago

Hi Andrew!

I saw that you were working on a Cython version of SpiceyPy and I could not resist to try to play a little bit with it!

I had some installation issues and saw that you were struggling to make .pxd files work, so I created a new branch (cyice felt like the "main" branch of the Cython version, so I decided to avoid making direct changes to it) called cy-spice and tried to fix them.

Here are some of the modifications I made:

Shared library and header files

When I first tried to install the package with pip install -e . I realized that the header files of CSPICE were missing. Since these files are required by cyice.pyx and cyice.pxd, I modified the get_spice.py script so that it creates a new directory called cspice in src/spiceypy with the following structure:

cspice
    lib
        libcspice.so
    include
        f2c.h
        f2cMang.h
        .
        .
        .

Since the header files were already part of the temporary directory to which the script downloads CSPICE, the only thing I had to do was to copy CSPICE's include directory to src/spiceypy/cspice/include and update the setup.py and libspicehelper.py (which you might not need if you port the package to Cython) with the new location of the shared library.

NOTE: I realized that the script was failing to copy some of CSPICE's header files to our local directory. The issue seems to be caused by the fact that gunzip reads the content of the tar.Z file directly from stdin. I fixed this problem by creating a temporary tar.Z file and using it as input for gunzip when the operating system is macOS. It might be worth checking if this issue also happens in Linux systems.

PXD file and vectorized versions of functions

I rewrote the cyice.pyx file to allow both single-value and array inputs for the functions you had implemented and moved the declarations to a .pxd file.

I also made some small modifications to cyice's tests to pass the input using the parametrize decorator and ensured that all the regular SpiceyPy's tests are still passing.

NOTE: I've been reading Cython's documentation regarding conversion between C and Python strings and it seems that, with the c_string_type = unicode and c_string_encoding = utf-8 compiler directives, Python 3 strings are automatically casted to char * and vice-versa. There is a link to the documentation page in the cyice.pyx script.

Conclusion

I have some small improvements in mind and ideas to avoid potential issues, but they are easier to explain having the code at hand and I don't want to overwhelm you coming out of nowhere with an infinite list of suggestions.

I have some experience with Cython and I see this as a great chance to keep learning about it and open-source projects so, if you want, I would be happy to help you with cyice.

AndrewAnnex commented 1 year ago

Hey @alfonsoSR thanks for your interest and I appreciate the feedback. As this will be a long PR, I think it's best to just comment on the code inline in the PR with suggested changes for now. If I do merge it I don't plan to expand it much beyond the current functionality, so prs will be acceptable then to expand the number of supported functions. Currently really need to solve the packaging issues and ensuring I am writing the most performant reasonable code.

As for the various items you mentioned

  1. I could not reproduce the missing header issue on my system, on macos the header files are present and in the correct folder. Maybe post the details about your system (os version, cpu architecture, etc)? But in any case the cythonized extension wasn't able to work with the pxd file in an early attempt. I should try again now that more of the compilation side is working.
  2. I am aware of the unicode string conversion for python 3 in cython, earlier versions of the branch used that directive. However, cspice itself does not support unicode, so I elect for the moment to use bytes that cspice expects. I don't think either way it's a big issue yet and something I can change back to once I'm happier with the overal code
  3. I wasn't aiming to implement the single-function call versions just yet, I do think it's something to add eventually but I'd organize it differently than you have them to define the private functions first and then declare the single call. I am also curious to see how the cython complier is able to optimize these functions vs just providing the single call and vectorized call individually without the wrapper.
  4. I really am not a fan of the pytest parameterize syntax in general as a stylistic choice. I think you were missing the pytest-benchmark dependency also, but thats just an opinion really and not consequential to making progress here yet. The benchmark part is something I want to keep for the moment to help demonstrate the value of the changes but will probably be moved to an external test script later.

For now I think keep commenting on my code as I develop it, I should be able to add credit to you if/when this gets merged to recognize your contributions. Currently there are still many issues with the compilation side and changes introduced by recent peps for pyproject.toml and using pypa build.

One question you could answer for me though is that I saw usage like this in your branch:

    cdef cnp.ndarray[dtype=cnp.double_t, ndim=2] ptarg_v = np.zeros((n, 3))
    cdef double[:, ::1] _ptarg_v = ptarg_v

when from what I understand about memoryviews in cython this should be equivalent:

    cdef double[:,:] ptargs = np.zeros((n,3), dtype=np.double)

as I've already cimport-ed numpy

alfonsoSR commented 1 year ago

Hi @AndrewAnnex, thanks for the answer!

I agree with what you say about inline comments. I created a new branch because I wanted to understand how the installation scripts work and to test my ideas before making any suggestions. Now that I am up-to-date with what you've done, I'll follow your commits and add comments with suggestions.

Regarding the rest of your comments:

Header files

I am working with Python 3.11 and macOS Ventura 13.3.1 on a 2020 MacBook Pro with Intel processor, so Darwin system with x86_64 architecture.

The issue with missing headers does not affect your branch nor mine, so I am not sure about how did you try to reproduce it. Just to be sure, you can reproduce the issue as follows:

  1. In my branch, replace the _unpack() method of the GetCSPICE class with its original version --which I include here.
    def _unpack(self):
        """Unpacks the CSPICE package on the given root directory. Note that
        Package could either be the zipfile.ZipFile class for Windows platforms
        or tarfile.TarFile for other platforms.
        """

        if self._ext == "zip":
            with ZipFile(self._local, "r") as archive:
                archive.extractall(self._root)
        else:
            cmd = (
                f"gunzip | tar {'xfC -' if host_OS == 'FreeBSD' else 'xC'}"
                f" {self._root}"
            )
            proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE)
            proc.stdin.write(self._local.read())
        self._local.close()
  1. Delete the src/spiceypy/cspice directory if it exists and execute the get_spice.py script, which should not give any problems.
  2. Now execute python setup.py build_ext -if to compile cyice.pyx. You should get a fatal error because the header file SpiceFrm.h is missing from src/spiceypy/cspice/include.

PXD file

I first tried to just copy-paste the declarations of the original cyice.pyx script to cyice.pxd and the program did not work. I then decided to rewrite the functions one by one and to add the declarations to a fresh .pxd file as they were required. I did not face any issues during the process.

Memory view

You are right about the fact that you can directly assign the NumPy array to the memview, but since you want to return ptargs, you will need to use np.asarray to turn it into a NumPy array.

From my personal experience, binding the memview to the buffer of a pre-existing array is more efficient than using np.asarray, which can become pretty slow as the size of the array increases.

On a different note, just in case you don't know it, by using cdef double[:, ::1] instead of cdef double[:, :], you are specifying that the memview is C-contiguous. From O'Reilly's book: Cython, a guide for Python programmers,

When possible, it is advantageous from a performance standpoint to declare arrays as C or Fortran contiguous, as this enables Cython to generate faster code that does not have to take strided access into account.

Since NumPy arrays are C-contiguous by default, declaring multi-dimensional memviews as C-contiguous can make the code more efficient.

Rest

I wasn't aware that cspice does not support unicode. I didn't have any issues by directly passing Python strings to cspice functions, but I agree that it is not particularly important.

The rest of the things you mention are a matter of style and the code is yours, so I've got nothing to say about that :).

AndrewAnnex commented 1 year ago

@alfonsoSR okay so the missing header files are because you are not specifying an environment variable check I added to get_spice.py. Try setting "export CSPICE_NO_TEMP='True'" in the terminal first and you should see it build fine. This is because by default get_spice creates a temporary directory to build cspice and in order for it to work correctly currently, as you noticed, you need to have a src/cspice directory for cython/build_ext to have the headers in the correct directory.

Most cython projects assume for the most part that the library source in included in the python project or available system-wide so get_spice does a lot of work to make a temporary directory to do the compilation.