AndrewAnnex / SpiceyPy

SpiceyPy: a Pythonic Wrapper for the SPICE Toolkit.
MIT License

Handle Multithread Requests #410

Open TommasoPino opened 3 years ago

TommasoPino commented 3 years ago

To use the SPICE library safely from a pool of worker threads, access to the SPICE resources must be locked.

This has been tested with 10,000 calls across 16 concurrent threads.
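Conceptually, the locking amounts to something like the following minimal sketch (not the PR's actual code; the kernel path, time string, and worker count are illustrative):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import spiceypy as spice

spice_lock = threading.Lock()  # a single lock guarding all CSPICE access

def utc_to_et(utc: str) -> float:
    # Serialize entry into CSPICE: only one thread at a time.
    with spice_lock:
        return spice.str2et(utc)

spice.furnsh("naif0012.tls")  # hypothetical leapseconds kernel, loaded once

with ThreadPoolExecutor(max_workers=16) as pool:
    ets = list(pool.map(utc_to_et, ["2021-05-06T12:00:00"] * 10_000))
```

Every wrapped SpiceyPy call acquires the same global lock, so CSPICE is only ever entered by one thread at a time.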

TommasoPino commented 3 years ago

First of all, thanks for the fast reply and the suggestions. I will produce an example file to test the current version, as you requested. I prefer using a standard-library package over an external one, to keep dependencies to a minimum. But I will study it too.

Thanks for the suggestions. I will come back when the modifications are in a more mature state. Thanks for your time.

jdiazdelrio commented 3 years ago

Hi @TommasoPino and @AndrewAnnex. I'm curious about the outcome of this activity :)

The CSPICE library is not thread-safe in itself. For example, CSPICE would not work on a multi-threaded server where the CSPICE library is dynamically loaded and shared among all threads. Imagine a use case where two users are loading/unloading kernels in different threads started at exactly the same time (T), following a sequence along these lines:

- T: user 1 and user 2 each load the kernels they need (furnsh);
- some time before T+20s: user 2 clears the kernel pool (kclear);
- T+20s: user 1 calls str2et to convert a UTC time string to ET.

If this is implemented with the CSPICE library using the standard SPICE error handling settings, thread 1 will produce an error at T+20s upon calling str2et, since there will be no kernels loaded at the time user 1 is trying to convert from UTC to ET, and it will terminate the complete application (both threads and main).
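A minimal reproduction of that failure mode might look like the sketch below (the sleep intervals and kernel path are assumptions; note that in SpiceyPy the error surfaces as an exception raised in thread 1 rather than terminating the whole process):

```python
import threading
import time

import spiceypy as spice

def user1():
    spice.furnsh("naif0012.tls")          # T: load a leapseconds kernel
    time.sleep(20)
    spice.str2et("2021-05-06T12:00:00")   # T+20s: fails, the pool was cleared

def user2():
    spice.furnsh("naif0012.tls")          # T: load kernels
    time.sleep(10)
    spice.kclear()                        # T+10s: clear the shared kernel pool

threads = [threading.Thread(target=user1), threading.Thread(target=user2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```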

TommasoPino commented 3 years ago

Hi @jdiazdelrio, the usage I have in mind for this modification is not a server-like environment, but the generation of a guidance database, which requires querying the kernels for different geometries. The guidance cases are independent of each other, so they can be computed concurrently. Spawning separate processes, each loading a new instance of the SPICE library, could be a solution, but it costs a lot in performance, and retrieving the separate results is a pain. I found multithreading a good compromise. Obviously, the example you pointed out would require a completely different approach.
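For comparison, the process-based alternative mentioned above could be sketched as follows (the meta-kernel path and the spkezr arguments are illustrative); each worker process gets its own copy of the CSPICE state, at the cost of loading kernels once per process:

```python
from multiprocessing import Pool

import spiceypy as spice

def init_worker():
    # Runs once per worker process; each process loads its own kernel pool.
    spice.furnsh("guidance_meta_kernel.tm")

def mars_state(et: float):
    # Each process queries its own, independent CSPICE instance.
    state, light_time = spice.spkezr("MARS BARYCENTER", et, "J2000", "NONE", "SUN")
    return state

if __name__ == "__main__":
    with Pool(processes=4, initializer=init_worker) as pool:
        states = pool.map(mars_state, [0.0, 3600.0, 7200.0])
```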

To answer @AndrewAnnex's questions, I prepared an example file that uses the current 4.0.0 version and tests the multithreading behaviour, showing the issues in the spkezr routine.

For b1900 the issue is not evident, because it reads a variable without modifying it.

I found in this link an appropriate way to disable the decorator, so that users who have their own solution for multithreaded calls can turn it off in their environments (or turn it on, if we decide to disable it by default).
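A common pattern for such a switchable decorator (not necessarily the one from the link; the flag name is hypothetical) is:

```python
import functools
import threading

SPICE_LOCK_ENABLED = True  # could be driven by an env var or spiceypy config
_spice_lock = threading.RLock()

def spice_lock(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not SPICE_LOCK_ENABLED:
            # User has their own multithreading solution: skip locking.
            return func(*args, **kwargs)
        with _spice_lock:
            return func(*args, **kwargs)
    return wrapper
```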

pep8speaks commented 3 years ago

Hello @TommasoPino! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

- Line 236:1: E302 expected 2 blank lines, found 1
- Line 203:74: W291 trailing whitespace
- Line 201:80: W291 trailing whitespace
- Line 189:84: W291 trailing whitespace
- Line 187:81: W291 trailing whitespace
- Line 184:1: E302 expected 2 blank lines, found 1
- Line 99:1: W293 blank line contains whitespace
- Line 83:1: E302 expected 2 blank lines, found 1

- Line 5979:1: W293 blank line contains whitespace
- Line 89:1: E302 expected 2 blank lines, found 1
- Line 87:20: E712 comparison to True should be 'if cond is True:' or 'if cond:'
- Line 86:21: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
- Line 81:1: E302 expected 2 blank lines, found 1
- Line 72:29: E226 missing whitespace around arithmetic operator
- Line 70:24: E226 missing whitespace around arithmetic operator
- Line 65:1: E302 expected 2 blank lines, found 1
- Line 60:93: E231 missing whitespace after ','
- Line 58:32: E251 unexpected spaces around keyword / parameter equals
- Line 58:30: E251 unexpected spaces around keyword / parameter equals
- Line 57:29: E226 missing whitespace around arithmetic operator
- Line 55:24: E226 missing whitespace around arithmetic operator
- Line 45:1: E302 expected 2 blank lines, found 1

Comment last updated at 2021-11-27 10:44:23 UTC

AndrewAnnex commented 3 years ago

@jdiazdelrio Good points on what SPICE can and can't do. My understanding was that this PR wouldn't actually solve the problem of that exact scenario, but would make it safer to, say, have multiple threads reading from the kernel pool, same as your example minus user 2 clearing the kernel pool. For multiple users, or multiple processes reading and writing, they would need to maintain separate kernel pools, as I understand it. However, maybe there is nothing unsafe about multithreaded reads from the kernel pool (i.e., multiple threads calling spkezr, with a main thread maintaining the kernel pool), in which case that may negate the need for this PR.

Do you have more thoughts on whether this would be useful?

jessemapel commented 3 years ago

We've seen SPICE errors from concurrent kernel pool reads using bodvar_c, so I don't think it's safe to just lock kernel loading and unloading.

TommasoPino commented 3 years ago

Could you provide a working example to test the case and adjust the PR? Thanks in advance.

Tommaso

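A stress test along the lines of that request might look like the following sketch, using bodvrd, a kernel-pool read closely related to bodvar_c (the PCK path is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

import spiceypy as spice

spice.furnsh("pck00010.tpc")  # planetary constants kernel with body radii

def read_radii(_):
    # Concurrent, unlocked kernel-pool reads; per the report above,
    # calls like this can fail sporadically without a lock.
    dim, radii = spice.bodvrd("EARTH", "RADII", 3)
    return radii

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(read_radii, range(10_000)))
```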

jdiazdelrio commented 3 years ago

@AndrewAnnex, when there is concurrent access to the kernel pool, it's important to know the state of the kernel pool itself. There are some parameters and features of the kernel pool that will be shared among all threads and that may have an impact on the status of the loaded data, the status of the kernel pool itself, and even performance.

As an example using spkezr: the SPICE SPK system (actually, the underlying DAF subsystem) does buffering in order to improve performance. As records are read from the DAF files, they are saved in an internal buffer maintained by the DAF subsystem. If any part of that record is needed in order to compute a state, the record is returned without accessing the file. If two concurrent threads "compete" for data in the same file but in different records of that file, the final outcome might be far slower than accessing the data sequentially. Note that due to the nature of the SPK subsystem, two calls that may look independent, e.g. the state of Phobos w.r.t. Mars and the position of the Earth w.r.t. the Sun, may both require knowing the location of the Solar System Barycenter (e.g. if aberration corrections are applied). Similar issues may happen when reading CK, binary PCK or DSK files. Another case is accessing the DLA or DAF subsystems directly.

Kernel priority is important as well. Since the kernel pool is shared among all threads, any modification to the kernel pool (adding a new kernel, or manually adding new data using "put-pool" routines pcpool, pipool or pdpool) may lead to unexpected results for those threads that do not expect such modifications.

The error subsystem is something to look into as well. If I'm not mistaken, SpiceyPy uses the underlying CSPICE error subsystem in "RETURN" mode, which means that until reset is called, all non-"Error free." calls to CSPICE will return immediately, with undefined results, regardless of which thread is calling them.
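Under the locking approach, serialized error handling could be sketched like this (the helper is illustrative; the exception import path matches recent SpiceyPy releases):

```python
import threading

import spiceypy as spice
from spiceypy.utils.exceptions import SpiceyError

_spice_lock = threading.Lock()

def locked_call(func, *args, **kwargs):
    # Holding the lock across both the call and the reset ensures no
    # other thread ever sees CSPICE in the RETURN-mode error state.
    with _spice_lock:
        try:
            return func(*args, **kwargs)
        except SpiceyError:
            spice.reset()  # harmless if SpiceyPy already reset the state
            raise
```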

Any call to dskstl will have an impact on all threads, no matter whether it is locked or not, as it changes the DSK tolerance for all subsequent calls to the DSK subsystem.

AndrewAnnex commented 3 years ago

@jdiazdelrio thanks for the details. I think, unless I am mistaken, that this addition would actually help address some, but not all, of the situations you describe, although it obviously does not and cannot address the underlying issues with CSPICE. As I wrote the following, I realized there is a bit of confusion between multi-threading and multi-processing in the comments above (including mine). In short, this PR could help some situations relating to multi-threading, but does not and cannot address the larger issue of side-effect-free, deterministic use of SPICE across multiple processes or threads.

For example, with the error subsystem, only a single thread at a time would be allowed to access SPICE, so if an error occurs in one thread, the call to reset would occur and safely return, because the other threads are blocked. Same with the spkezr example: each thread would be blocked from calling spkezr until the lock is lifted. This should in effect enforce serial access, as only one thread at a time would be able to interact with SPICE. It could be that a lock is not actually appropriate to ensure this, and that a Semaphore of 1 (equivalent to a mutex, IIRC) should be used instead, but I will need to think about it more and get some examples coded.
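For reference, the Semaphore-of-1 variant mentioned above would differ only in the synchronization primitive (names are illustrative):

```python
import threading

# A semaphore initialized to 1 admits at most one thread at a time,
# behaving like a mutex for this purpose.
spice_guard = threading.BoundedSemaphore(1)

def guarded_call(func, *args, **kwargs):
    with spice_guard:
        return func(*args, **kwargs)
```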

I think the kernel priority/kernel pool modification issue remains unsolved and cannot be addressed by any small code contribution. If I write a variable to the pool from one thread, delete it in another, and expect to read it from a third, that wouldn't work even with the locks. Currently, however, spiceypy provides no guard rails of any kind, so perhaps this PR should continue forward, but with a different "scope": just addressing the one-at-a-time aspect of this question.

For dskstl, yes, I think this would be the case for that and other functions that influence global state, but solving that problem would require using multiple processes in some way to maintain independent SPICE libraries. That is a bigger/different problem than the multithreading question this PR is trying to help with.

Is that how you understand it?

jdiazdelrio commented 3 years ago

@AndrewAnnex, your answer is in line with my understanding.

But my feeling, for routines like spkezr, is that a user solution based on multiple threads to retrieve state data will be slower than a single-threaded one, because of the underlying buffering it will likely rely on. Note that some high-level SPICE APIs feed from similar data: SPK data is needed for some frame transformations if these are to have aberration corrections; SCLK data is needed for CK access; CK and PCK data might be needed to look up states in body-fixed or spacecraft-based reference frames. This means that, for SPICE-intensive applications, this solution might slow down the overall process.

Regarding kernel pool modification, users should think about modification in the sense of reassigning a variable to a new value, or extending/reducing the length of an array within the kernel pool. Therefore, this solution does not help with kernel-pool or CSPICE global state modifications, as you point out.

My guess is that this solution is a good safeguard for non-SPICE-intensive multi-thread applications, where the core of the application can be divided among multiple threads, and each of them can occasionally perform read operations on the SPICE system.

As I said, I'm really curious about the outcome of this activity. If it works it'd be a great contribution to SpiceyPy, even if it's only for some uses (in which case, I'd document very well what can and cannot be done when using SpiceyPy in a multi-threaded environment).

jdiazdelrio commented 3 years ago

@TommasoPino, is there a way to prohibit certain functions, e.g. furnsh, to be called from within a multi-thread environment?

TommasoPino commented 3 years ago

@TommasoPino, is there a way to prohibit certain functions, e.g. furnsh, to be called from within a multi-thread environment?

Hi @jdiazdelrio, the only way to prevent child threads from accessing furnsh or other functions that modify the kernel pool is to remember the parent ("father") thread and grant access to those specific functions only to it.
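That idea could be sketched roughly like this (names are illustrative, not the PR's actual code):

```python
import functools
import threading

_PARENT_THREAD = threading.main_thread()  # remembered at import time

def parent_thread_only(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if threading.current_thread() is not _PARENT_THREAD:
            raise RuntimeError(
                f"{func.__name__} may only be called from the parent thread"
            )
        return func(*args, **kwargs)
    return wrapper
```

Kernel-pool writers such as furnsh or kclear would then be wrapped with this decorator, while read-only routines would remain available to all threads.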

TommasoPino commented 3 years ago

Hi @AndrewAnnex, I added some modifications and the test case. I am not familiar with pytest, so I am not sure about the test itself; please check it. I hope the PR is now mature enough, but any other comments are welcome.

TommasoPino commented 3 years ago

Hi @AndrewAnnex, I committed the required modifications for testing and the corrections for CI. Could you please approve the running workflow? Thanks

codecov[bot] commented 3 years ago

Codecov Report

Merging #410 (7aeee8d) into main (dca3b8a) will decrease coverage by 0.04%. The diff coverage is 100.00%.

:exclamation: Current head 7aeee8d differs from pull request most recent head 3654fad. Consider uploading reports for the commit 3654fad to get more accurate results.

@@            Coverage Diff             @@
##             main     #410      +/-   ##
==========================================
- Coverage   99.88%   99.84%   -0.04%     
==========================================
  Files          12       12              
  Lines       15206    15835     +629     
==========================================
+ Hits        15188    15810     +622     
- Misses         18       25       +7     
| Impacted Files | Coverage Δ |
|---|---|
| spiceypy/spiceypy.py | 99.60% <ø> (-0.09%) :arrow_down: |
| spiceypy/config.py | 100.00% <100.00%> (ø) |
| spiceypy/tests/test_wrapper.py | 100.00% <100.00%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update dca3b8a...3654fad.

AndrewAnnex commented 3 years ago

Apologies @TommasoPino, I've been preoccupied. This PR has improved greatly since its initial posting! However, there appear to be some test failures that need to be investigated. I would also like to see short tests that demonstrate the usage of the context managers, to provide code coverage of that code (see the Codecov CI report). See the other context manager tests for an example of this.

I also think that this feature needs a dedicated documentation page, as @jdiazdelrio suggests, to clearly explain the situations this can and cannot be used for and to demonstrate usage of the context managers.

In any case, I plan to do a release of spiceypy soon to capture small changes since the last release; this PR won't be merged until after that release, to keep this change isolated given its breadth.

AndrewAnnex commented 3 years ago

@TommasoPino I think that the build failures were due to changes I made to the test code, and that if we re-run the tests they should all pass, although I can't seem to restart them from within GitHub's UI. If you push any commit (it can be empty) it should trigger a rebuild.

TommasoPino commented 3 years ago

@AndrewAnnex a new commit has been pushed. Waiting for approval of the running workflow. Thanks

AndrewAnnex commented 3 years ago

@TommasoPino yay, the tests pass. I left some additional comments that are all pretty simple additions/edits to improve the code coverage. I still think a short documentation section is needed, something like https://github.com/AndrewAnnex/SpiceyPy/blob/main/docs/exceptions.rst#not-found-errors, as a new '.rst' file dedicated to this addition. A short explanation derived from the discussions about thread vs. process safety and SPICE would also go a long way. I think you could just contribute a first-pass version of this and I could improve it as needed in later commits.

michaelaye commented 2 years ago

As an example using spkezr, the SPICE SPK system (actually, the underlying DAF subsystem) does buffering in order to improve performance: as records are read from the DAF files, they are saved in an internal buffer maintained by the DAF subsystem.

@jdiazdelrio This sounds like, even when using Python's multiprocessing (which doesn't use threads), it's still impossible to isolate the kernel pool properly, because everything still uses the same installed CSPICE library underneath, correct? So the only way to make this safe is to actually use multiple CSPICE installations, IIUC?