Tudat / tudat

NOTE: This Tudat version is no longer supported. See https://docs.tudat.space/en/stable/ and https://github.com/tudat-team/tudat-bundle for the new version
BSD 3-Clause "New" or "Revised" License

SPICE(NOMOREROOM) errors #635

Closed haji-ali closed 4 years ago

haji-ali commented 4 years ago

Hello,

I am trying to run a Monte Carlo simulation of SingleSatellitePropagator. I modified the code a little to run a completely independent computation for different initial conditions of position and velocity (I know I can reuse parts of the computations, but I want to start simple). After every single computation, I make sure that all objects that were allocated by tudat are freed (as they should be when objects and shared_pointers are deleted from the stack).

After running more than 5300 calculations, I get the error


Toolkit version: N0066                                                                                                                                                                                            

SPICE(NOMOREROOM) --                                                                                                                                                                                              

There is no room left in KEEPER to load another SPICE kernel. The current limit                                                                                                                                   
on the number of files that can be loaded is 5300. If you really need more than                                                                                                                                   
this many files, you should increase the parameter MAXFIL in the subroutine                                                                                                                                       
KEEPER.                                                                                                                                                                                                           

A traceback follows.  The name of the highest level module is first.                                                                                                                                              
furnsh_c --> FURNSH                                                                                                                                                                                               

Oh, by the way:  The SPICELIB error handling actions are USER-TAILORABLE.  You                                                                                                                                    
can choose whether the Toolkit aborts or continues when errors occur, which                                                                                                                                       
error messages to output, and where to send the output.  Please read the ERROR                                                                                                                                    
"Required Reading" file, or see the routines ERRACT, ERRDEV, and ERRPRT.  

I am not sure why files are lingering in SPICE after all tudat objects are freed.

Possibly related, running valgrind on application_SingleSatellitePropagator, I get the following leak

==9959== HEAP SUMMARY:
==9959==     in use at exit: 340,662 bytes in 63 blocks
==9959==   total heap usage: 514,909 allocs, 514,846 frees, 28,302,468 bytes allocated
==9959==
==9959== 339,975 (912 direct, 339,063 indirect) bytes in 1 blocks are definitely lost in loss record 61 of 61                                                                                                     
==9959==    at 0x70AC593: operator new(unsigned long) (vg_replace_malloc.c:344)
==9959==    by 0x6E3FF1: tudat::simulation_setup::createBodies(std::map<std::string, std::shared_ptr<tudat::simulation_setup::BodySettings>, std::less<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<tudat::simulation_setup::BodySettings> > > > const&) (in /home/ah180/Work/Projects/tudatBundle/tudatExampleApplications/satellitePropagatorExamples/bin/applications/application_SingleSatellitePropagator)
==9959==    by 0x6015A6: main (in /home/ah180/Work/Projects/tudatBundle/tudatExampleApplications/satellitePropagatorExamples/bin/applications/application_SingleSatellitePropagator)                              
==9959==
==9959== LEAK SUMMARY:
==9959==    definitely lost: 912 bytes in 1 blocks
==9959==    indirectly lost: 339,063 bytes in 60 blocks
==9959==      possibly lost: 0 bytes in 0 blocks
==9959==    still reachable: 687 bytes in 2 blocks
==9959==         suppressed: 0 bytes in 0 blocks
==9959== Reachable blocks (those to which a pointer was found) are not shown.
==9959== To see them, rerun with: --leak-check=full --show-leak-kinds=all

Both of these problems are preventing me from running many Monte Carlo simulations.

Any help is appreciated.

PS: Here is some simple code that shows this issue

```
// The two header names were lost from the original post; these are assumed.
#include <sstream>
#include <Tudat/SimulationSetup/tudatSimulationHeader.h>

extern "C" unsigned int GetOrbit( double* init, unsigned int bodies, double simulationEndEpoch,
                                  unsigned int N, double* output, bool debug )
{
    using namespace tudat;
    using namespace tudat::simulation_setup;
    using namespace tudat::propagators;
    using namespace tudat::numerical_integrators;
    using namespace tudat::orbital_element_conversions;
    using namespace tudat::basic_mathematics;
    using namespace tudat::unit_conversions;

    // Load Spice kernels.
    spice_interface::loadStandardSpiceKernels( );

    // Create body objects.
    std::vector< std::string > bodiesToCreate;
    bodiesToCreate.push_back( "Earth" );
    std::map< std::string, std::shared_ptr< BodySettings > > bodySettings =
            getDefaultBodySettings( bodiesToCreate );
    bodySettings[ "Earth" ]->ephemerisSettings =
            std::make_shared< ConstantEphemerisSettings >( Eigen::Vector6d::Zero( ) );

    // Create Earth object
    NamedBodyMap bodyMap = createBodies( bodySettings );

    std::vector< std::string > bodiesToPropagate;
    std::vector< std::string > centralBodies;
    SelectedAccelerationMap accelerationMap;

    // Define propagation settings.
    std::map< std::string, std::vector< std::shared_ptr< AccelerationSettings > > > accelerationsOfAsterix;
    accelerationsOfAsterix[ "Earth" ].push_back(
                std::make_shared< AccelerationSettings >( basic_astrodynamics::central_gravity ) );

    Eigen::VectorXd systemInitialState( 6 * bodies );

    // Create spacecraft object.
    // The loop header and the statements up to make_shared were garbled in the original post.
    // The loop bound, the "Asterix" name, the Body type and the centralBodies entry are
    // reconstructed assumptions.
    for ( unsigned int i = 0; i < bodies; i++ )
    {
        std::stringstream ss;
        ss << "Asterix" << i;
        bodyMap[ ss.str( ) ] = std::make_shared< Body >( );
        bodiesToPropagate.push_back( ss.str( ) );
        centralBodies.push_back( "Earth" );
        accelerationMap[ ss.str( ) ] = accelerationsOfAsterix;

        int j = 6 * i;
        systemInitialState( j + xCartesianPositionIndex ) = init[ j ];
        systemInitialState( j + yCartesianPositionIndex ) = init[ j + 1 ];
        systemInitialState( j + zCartesianPositionIndex ) = init[ j + 2 ];
        systemInitialState( j + xCartesianVelocityIndex ) = init[ j + 3 ];
        systemInitialState( j + yCartesianVelocityIndex ) = init[ j + 4 ];
        systemInitialState( j + zCartesianVelocityIndex ) = init[ j + 5 ];
    }

    // Finalize body creation.
    setGlobalFrameBodyEphemerides( bodyMap, "SSB", "ECLIPJ2000" );

    // accelerationModelMap is used below but never defined in the post as extracted;
    // this line is an assumption based on the SingleSatellitePropagator example.
    basic_astrodynamics::AccelerationMap accelerationModelMap = createAccelerationModelsMap(
                bodyMap, accelerationMap, bodiesToPropagate, centralBodies );

    // Create propagator settings.
    std::shared_ptr< TranslationalStatePropagatorSettings< double > > propagatorSettings =
            std::make_shared< TranslationalStatePropagatorSettings< double > >(
                centralBodies, accelerationModelMap, bodiesToPropagate, systemInitialState,
                simulationEndEpoch );

    // Create numerical integrator settings.
    double simulationStartEpoch = 0.0;
    const double fixedStepSize = simulationEndEpoch / N;
    std::shared_ptr< IntegratorSettings< > > integratorSettings =
            std::make_shared< IntegratorSettings< > >(
                rungeKutta4, simulationStartEpoch, fixedStepSize );

    // Create simulation object and propagate dynamics.
    SingleArcDynamicsSimulator< > dynamicsSimulator( bodyMap, integratorSettings, propagatorSettings );
    std::map< double, Eigen::VectorXd > integrationResult =
            dynamicsSimulator.getEquationsOfMotionNumericalSolution( );

    // Copy the state history into the flat output buffer (if one was given).
    unsigned int count = 0;
    const unsigned int total_count = bodies * ( N + 1 ) * 6;
    for ( auto itr = integrationResult.begin( ); itr != integrationResult.end( ); ++itr )
    {
        const Eigen::VectorXd &res = itr->second;
        for ( int i = 0; i < res.size( ); ++i, ++count )
            if ( output && count < total_count )
                output[ count ] = res[ i ];
    }
    return count;
}

int main( )
{
    double init[] = { -33552459.274056, -23728303.048015, 0.0,
                      -1828.997179397, 2534.1074695609, 0.0 };
    for ( int i = 0; i < 2000; i++ )
    {
        GetOrbit( init, 1, 20, 1, NULL, false );
    }
}
```

This crashes at object 1324 on my machine.

DominicDirkx commented 4 years ago

Hi Abdul,

The problem is that you are calling loadStandardSpiceKernels inside your loop. The Spice kernels that are loaded when calling this function are not unloaded at the end of the GetOrbit function, and at some point Spice complains that too many kernels are loaded. If you move the loadStandardSpiceKernels function call to your main function, before the for loop, this issue should go away.
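
To illustrate the difference, here is a minimal sketch that does not use Tudat or SPICE at all. `MockKernelPool`, `loadStandardKernels`, `runsLoadingEveryTime` and `runsLoadingOnce` are hypothetical stand-ins invented for this example, and the four file names are placeholders. The mock only mimics the behaviour described above: every load adds entries to a fixed-size table, and nothing removes them.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the SPICE kernel table: it only counts loaded
// files and fails once a fixed limit is reached (like NOMOREROOM).
class MockKernelPool {
public:
    explicit MockKernelPool(std::size_t maxFiles) : maxFiles_(maxFiles) {}

    void load(const std::string& file) {
        if (loaded_.size() >= maxFiles_)
            throw std::runtime_error("NOMOREROOM: cannot load " + file);
        loaded_.push_back(file);
    }

    std::size_t size() const { return loaded_.size(); }

private:
    std::size_t maxFiles_;
    std::vector<std::string> loaded_;
};

// Stand-in for loadStandardSpiceKernels(): loads a fixed set of placeholder files.
void loadStandardKernels(MockKernelPool& pool) {
    for (const char* file : {"a.tpc", "b.bsp", "c.tls", "d.tf"})
        pool.load(file);
}

// Pattern from the original post: kernels are loaded again on every run.
// Returns the number of runs completed before the pool overflows.
int runsLoadingEveryTime(MockKernelPool& pool, int runs) {
    int done = 0;
    try {
        for (int i = 0; i < runs; ++i) {
            loadStandardKernels(pool);
            ++done;  // the propagation would happen here
        }
    } catch (const std::runtime_error&) {
        // Pool is full: stop counting.
    }
    return done;
}

// Suggested pattern: load once before the loop, then run the loop.
int runsLoadingOnce(MockKernelPool& pool, int runs) {
    loadStandardKernels(pool);
    int done = 0;
    for (int i = 0; i < runs; ++i)
        ++done;  // the propagation would happen here
    return done;
}
```

With a pool limited to 8 entries, `runsLoadingEveryTime` stops after 2 runs, while `runsLoadingOnce` completes every requested run and leaves only 4 entries in the pool.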

Best,

Dominic

haji-ali commented 4 years ago

Hello Dominic,

Thanks! In my actual code, I do the computations in separate threads. I was trying to avoid the thread safety issues of #541 by loading separate kernels for every computation. I am worried that loading a single kernel for all computations (done in parallel) could lead to problems.

As far as I can see, there is no way to "unload" the spice kernels in tudat to avoid NOMOREROOM issues, is that correct?

Also, I now realize this is a separate issue: the valgrind output shows a memory leak from an allocation in tudat::simulation_setup::createBodies.

DominicDirkx commented 4 years ago

Hi Abdul,

Indeed, thread safety is an issue. Spice is inherently thread-unsafe, and there is a small (?) memory leak from 'createBodies'.

The best way to parallelize is:

Best,

Dominic

haji-ali commented 4 years ago

I see. Thanks! I am guessing that the names in bodyMap have to be unique across all threads.

I also found clearSpiceKernels which frees up the spice kernels.
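
For sequential runs, one way to keep loading and clearing paired is a small scope guard. This is only a sketch: `ScopedKernels` is a hypothetical helper, not part of Tudat, and it takes the load and clear actions as callables so that it does not depend on any particular function signature. Because the SPICE kernel pool is shared by the whole process, clearing it from one thread would also affect every other thread, so this is not a fix for the multithreaded case.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: runs `load` on construction and `clear` on destruction,
// so the kernels are released even if the code in between throws.
class ScopedKernels {
public:
    ScopedKernels(const std::function<void()>& load, std::function<void()> clear)
        : clear_(std::move(clear)) {
        load();
    }

    ~ScopedKernels() {
        if (clear_) clear_();
    }

    // Non-copyable: exactly one clear per load.
    ScopedKernels(const ScopedKernels&) = delete;
    ScopedKernels& operator=(const ScopedKernels&) = delete;

private:
    std::function<void()> clear_;
};
```

With Tudat, the two callables would presumably be lambdas wrapping `spice_interface::loadStandardSpiceKernels( )` and `spice_interface::clearSpiceKernels( )`, with the guard declared at the top of each run.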

DominicDirkx commented 4 years ago

What do you mean exactly by:

> I am guessing that the names in bodyMap have to be unique across all threads.

You can have a separate "Earth" body in each thread (and probably will)

haji-ali commented 4 years ago

I meant the bodies inside bodyMap like Asterix. I am not sure what the internal implementation is, but the name of the object seems to be used as a way to reference that object. I assume if I have several threads doing similar computation, I would have to have an Earth body in each, but the names of the bodies to propagate should be different.

DominicDirkx commented 4 years ago

If you're multithreading, you're going to have to create a completely separate and independent body map for each thread. So, you must have N body maps, each of which will probably start out identical (with an Earth, Asterix, etc. entry)
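
The structure described above can be sketched without Tudat as follows. `Body`, `BodyMap`, `makeBodyMap` and `runInParallel` are hypothetical stand-ins, and the "propagation" is a placeholder multiplication. The point is only that each thread builds and owns its own map, so the same names ("Earth", "Asterix") can be reused in every thread without clashing. It says nothing about SPICE calls themselves, which are not thread-safe as noted earlier in this thread.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a simulation body.
struct Body {
    double state = 0.0;
};

using BodyMap = std::map<std::string, std::shared_ptr<Body>>;

// Every call returns a fresh map with newly allocated bodies,
// so no two maps share any Body object.
BodyMap makeBodyMap() {
    BodyMap bodies;
    bodies["Earth"] = std::make_shared<Body>();
    bodies["Asterix"] = std::make_shared<Body>();
    return bodies;
}

// Runs one independent "simulation" per initial state, each in its own thread.
std::vector<double> runInParallel(const std::vector<double>& initialStates) {
    std::vector<double> results(initialStates.size());
    std::vector<std::thread> workers;

    for (std::size_t i = 0; i < initialStates.size(); ++i) {
        workers.emplace_back([&results, &initialStates, i] {
            BodyMap bodies = makeBodyMap();  // owned by this thread only
            bodies["Asterix"]->state = initialStates[i];
            // Placeholder for the actual propagation.
            results[i] = bodies["Asterix"]->state * 2.0;
        });
    }

    for (auto& worker : workers)
        worker.join();
    return results;
}
```

Each thread writes only to its own slot of `results`, so no locking is needed for the output in this sketch.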