amusecode / amuse

Astrophysical Multipurpose Software Environment. This is the main repository for AMUSE.
http://www.amusecode.org
Apache License 2.0

Brutus worker dies #579

Open GFTwrt opened 4 years ago

GFTwrt commented 4 years ago

Hello,

I get the following message:

    amuse.support.exceptions.CodeException: Exception when calling function 'evolve_model', of code 'BrutusInterface', exception was 'Error in code: no error message - code probably died, sorry.'

Condition:

Thank you for the support.

rieder commented 4 years ago

@tjardaboekholt could you have a look at this?

tjardaboekholt commented 4 years ago

Hi, thanks for the error message concerning Brutus.

I managed to do some runs with e=1e-20, and say, Lw=128 and dt_param=0.10, and the code did not crash. If I reduce Lw to 40 bits, then it crashes and gives the same error message you quoted. This is because the number of bits was too low to reach convergence of e=1e-20. If Brutus fails to reach a converged solution within the maximum number of iterations, it will give up and this causes the code to stop. However, for suitable combinations of (e, Lw, dt_param), the code should in principle work fine, i.e. make sure you have enough bits to resolve e.
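As a rough illustration of "enough bits to resolve e", the sketch below estimates a minimum word length from the tolerance. It assumes that resolving a relative tolerance of 10^-d needs about d·log2(10) ≈ 3.32·d mantissa bits, plus a hypothetical safety margin (`margin_bits`, chosen arbitrarily here). `min_word_length_bits` is an invented helper, not part of Brutus or AMUSE, and it is not Brutus's own convergence criterion.

```python
import math


def min_word_length_bits(tolerance: float, margin_bits: int = 32) -> int:
    """Rough lower bound on the word length Lw (in bits) needed to resolve `tolerance`.

    Assumption: resolving a relative tolerance of 10**-d needs about
    d * log2(10) ~ 3.32 * d mantissa bits. `margin_bits` is a hypothetical
    safety margin for round-off, not a value taken from Brutus.
    """
    decimal_digits = -math.log10(tolerance)
    return math.ceil(decimal_digits * math.log2(10)) + margin_bits


for tol, lw in [(1e-20, 128), (1e-20, 40)]:
    need = min_word_length_bits(tol)
    verdict = "enough" if lw >= need else "too few bits"
    print(f"e={tol:g}, Lw={lw}: need about {need} bits -> {verdict}")
```

Under these assumptions a 40-bit word carries only about 12 decimal digits, well short of the 20 needed for e=1e-20, whereas 128 bits leaves room to spare.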

Cheers!

GFTwrt commented 4 years ago

Hello,

I identified the values you've written as:

    gravity.set_bs_tolerance_string("1e-20")  # as your "e"
    gravity.set_word_length(130)  # as your Lw
    gravity.set_eta(0.01)  # as your dt_param

Are these the wrong parameters? It is not working.

GFTwrt commented 4 years ago

Hello,

Here is the build.log. Maybe something is missing; see the warnings:

    Building code: brutus, target: all, in directory: src/amuse/community/brutus

    make[1]: Entering directory '/home/pi/amuse/src/amuse/community/brutus'
    mpicxx -g -O2 -fPIC -std=c++0x -I../mpfrc++ -I/home/pi/amuse/lib/stopcond -Impfrc++ -I./src -c -o interface.o interface.cc
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h: In function ‘const mpfr::mpreal mpfr::root(const mpfr::mpreal&, long unsigned int, mpfr_rnd_t)’:
    mpfrc++/mpreal.h:2201:50: warning: ‘int mpfr_root(mpfr_ptr, mpfr_srcptr, long unsigned int, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_root(y.mpfr_ptr(), x.mpfr_srcptr(), k, r);
                                                      ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:693:21: note: declared here
     __MPFR_DECLSPEC int mpfr_root (mpfr_ptr, mpfr_srcptr, unsigned long,
                         ^~~~~
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h:2201:50: warning: ‘int mpfr_root(mpfr_ptr, mpfr_srcptr, long unsigned int, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_root(y.mpfr_ptr(), x.mpfr_srcptr(), k, r);
                                                      ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:693:21: note: declared here
     __MPFR_DECLSPEC int mpfr_root (mpfr_ptr, mpfr_srcptr, unsigned long,
                         ^~~~~
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h: In function ‘const mpfr::mpreal mpfr::grandom(gmp_randstate_struct (&)[1], mpfr_rnd_t)’:
    mpfrc++/mpreal.h:2646:53: warning: ‘int mpfr_grandom(mpfr_ptr, mpfr_ptr, __gmp_randstate_struct*, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_grandom(x.mpfr_ptr(), NULL, state, rnd_mode);
                                                         ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:502:21: note: declared here
     __MPFR_DECLSPEC int mpfr_grandom (mpfr_ptr, mpfr_ptr, gmp_randstate_t,
                         ^~~~
    In file included from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    mpfrc++/mpreal.h:2646:53: warning: ‘int mpfr_grandom(mpfr_ptr, mpfr_ptr, __gmp_randstate_struct*, mpfr_rnd_t)’ is deprecated [-Wdeprecated-declarations]
         mpfr_grandom(x.mpfr_ptr(), NULL, state, rnd_mode);
                                                         ^
    In file included from mpfrc++/mpreal.h:121,
                     from ./src/Star.h:6,
                     from ./src/Brutus.h:1,
                     from interface.cc:12:
    /usr/include/mpfr.h:502:21: note: declared here
     __MPFR_DECLSPEC int mpfr_grandom (mpfr_ptr, mpfr_ptr, gmp_randstate_t,
                         ^~~~
    mpicxx -g -O2 -fPIC -std=c++0x -I../mpfrc++ -I/home/pi/amuse/lib/stopcond -I./src worker_code.cc src/libbrutus.a interface.o -o brutus_worker -L./src -lbrutus -L/home/pi/amuse/lib/stopcond -lstopcond -lmpfr -lgmp -lgmp
    make[1]: Leaving directory '/home/pi/amuse/src/amuse/community/brutus'

GFTwrt commented 4 years ago

Hello, according to http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Deprecated-Features.html, the code uses functionality that is no longer supported. Could you update the code to current syntax?

tjardaboekholt commented 4 years ago

Hello,

I identified the values you've written as:

    gravity.set_bs_tolerance_string("1e-20")  # as your "e"
    gravity.set_word_length(130)  # as your Lw
    gravity.set_eta(0.01)  # as your dt_param

Are these the wrong parameters? It is not working.

Yes, that is correct. Another way to set the parameters is:

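    from amuse.community.brutus.interface import Brutus

    code = Brutus()
    code.parameters.bs_tolerance = "1e-20"  # Bulirsch-Stoer tolerance ("e")
    code.parameters.word_length = 128       # word length Lw in bits
    code.parameters.dt_param = 0.10         # time-step parameter
    print(code.parameters)  # to check that the values are correctly set
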
rieder commented 4 years ago

Just to add to that: the latter way @tjardaboekholt mentioned is the preferred method.

GFTwrt commented 4 years ago

Hello,

Thank you for showing me the preferred usage, but it has no effect. The problem is the "deprecated" warnings in the build.log. Here is the result with your preferred method:

    begin_time: 0.0 s default: 0.0 s
    brutus_output_directory: /home/tst/amuse/data/brutus/output/ default: ./
    bs_tolerance: 1e-20 default: 1e-08
    dt_param: 0.1 default: 0.24
    stopping_condition_maximum_density: 2.55293255306e+306 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_maximum_internal_energy: inf m**2 * s**-2 default: -2558461176.91 m**2 * s**-2
    stopping_condition_minimum_density: -0.0142011587158 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_minimum_internal_energy: -2558461176.91 m**2 * s**-2 default: -2558461176.91 m**2 * s**-2
    stopping_conditions_number_of_steps: 1 default: 1.0
    stopping_conditions_out_of_box_size: 0.0 m default: 0.0 m
    stopping_conditions_out_of_box_use_center_of_mass: 0 default: False
    stopping_conditions_timeout: 4.0 s default: 4.0 s
    timestep: 102715.479587 s default: 719008.357111 s
    word_length: 128 default: 72

    0.0 s
    /home/pi/amuse/src/amuse/units/generic_unit_converter.py:189: RuntimeWarning: overflow encountered in double_scalars
      return new_quantity(number * factor, new_unit)
    Traceback (most recent call last):

      File "", line 1, in
        runfile('/home/pi/tests/solar1.py', wdir='/home/pi/tests')

      File "/usr/lib/python3/dist-packages/spyder_kernels/customize/spydercustomize.py", line 678, in runfile
        execfile(filename, namespace)

      File "/usr/lib/python3/dist-packages/spyder_kernels/customize/spydercustomize.py", line 106, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)

      File "/home/pi/tests/solar1.py", line 124, in
        gravity_minimal(t_end)

      File "/home/pi/tests/solar1.py", line 103, in gravity_minimal
        gravity.evolve_model(gravity.model_time + (10| units.day))

      File "/home/pi/amuse/src/amuse/support/methods.py", line 167, in __call__
        result = self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/support/methods.py", line 167, in __call__
        result = self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/support/methods.py", line 167, in __call__
        result = self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/support/methods.py", line 266, in __call__
        return self.method(*list_arguments, **keyword_arguments)

      File "/home/pi/amuse/src/amuse/rfi/core.py", line 123, in __call__
        raise exceptions.CodeException("Exception when calling function '{0}', of code '{1}', exception was '{2}'".format(self.specification.name, type(self.interface).__name__, ex))

    CodeException: Exception when calling function 'evolve_model', of code 'BrutusInterface', exception was 'lost connection to code'

GFTwrt commented 4 years ago

For contrast, here is a working example:

    begin_time: 0.0 s default: 0.0 s
    brutus_output_directory: /home/pi/amuse/data/brutus/output/ default: ./
    bs_tolerance: 1e-19 default: 1e-08
    dt_param: 0.01 default: 0.24
    stopping_condition_maximum_density: 2.55293255306e+306 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_maximum_internal_energy: inf m**2 * s**-2 default: -2558461176.91 m**2 * s**-2
    stopping_condition_minimum_density: -0.0142011587158 m**-3 * kg default: -0.0142011587158 m**-3 * kg
    stopping_condition_minimum_internal_energy: -2558461176.91 m**2 * s**-2 default: -2558461176.91 m**2 * s**-2
    stopping_conditions_number_of_steps: 1 default: 1.0
    stopping_conditions_out_of_box_size: 0.0 m default: 0.0 m
    stopping_conditions_out_of_box_use_center_of_mass: 0 default: False
    stopping_conditions_timeout: 4.0 s default: 4.0 s
    timestep: 10271.5479587 s default: 719008.357111 s
    word_length: 128 default: 72

    0.0 s
    /home/pi/amuse/src/amuse/units/generic_unit_converter.py:189: RuntimeWarning: overflow encountered in double_scalars
      return new_quantity(number * factor, new_unit)
    864000.0 s

GFTwrt commented 4 years ago

Hello, can you confirm that the issue is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91226? See also https://github.com/BrianGladman/mpfr/blob/master/tests/tget_set_d64.c, which contains this comment:

    /* The volatile below avoids _Decimal64 constant propagation, which is
       buggy for non-canonical encoding in various GCC versions on the x86 and
       x86_64 targets: failure with gcc (Debian 20190719-1) 10.0.0 20190718
       (experimental) [trunk revision 273586]; the MPFR test was not failing
       with previous GCC versions, but GCC versions 5 to 9 are also affected
       on the simple testcase at:
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91226 */

ipelupessy commented 4 years ago

I have updated MPFR C++ to 3.6.6; this should get rid of the deprecation warnings. Can you try it out? (It is only updated in the GitHub master; there is no release on PyPI yet.)

If you still get the error, can you post a minimal example that triggers it?

ipelupessy commented 4 years ago

the above comment was for @GFTwrt ;-)

ipelupessy commented 4 years ago

By the way, thanks for bringing this up (I had not noticed that MPFR C++ had been updated; the website still lists the 2015 release as the latest).

GFTwrt commented 4 years ago

Hello @ipelupessy,

Thank you for the update. Unfortunately it was not the solution :-( Please have a look at my last comment (the GCC bug).

Attached are the simple model file "solar1" (planet configuration from the AMUSE book) and the build log. With 1e-19 the model runs; with 1e-20 it fails.

    Building code: brutus, target: all, in directory: src/amuse/community/brutus

    make[1]: Entering directory '/home/tom/testam/src/amuse/community/brutus'
    /home/tom/testam/build.py --type=c interface.py BrutusInterface -o worker_code.cc
    /home/tom/testam/build.py --type=H -i amuse.support.codes.stopping_conditions.StoppingConditionInterface interface.py BrutusInterface -o worker_code.h
    make -C src all CXXFLAGS="-g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++"
    make[2]: Entering directory '/home/tom/testam/src/amuse/community/brutus/src'
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Star.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Cluster.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Bulirsch_Stoer.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c Brutus.cpp
    g++ -O1 -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -c main.cpp
    rm -f libbrutus.a
    ar crs libbrutus.a main.o Brutus.o Bulirsch_Stoer.o Cluster.o Star.o
    ranlib libbrutus.a
    make[2]: Leaving directory '/home/tom/testam/src/amuse/community/brutus/src'
    mpicxx -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -I./mpfrc++ -I/home/tom/testam/lib/stopcond -Impfrc++ -I./src -c -o interface.o interface.cc
    mpicxx -g -O2 -fPIC -I./mpfrc++ -std=c++0x -I../mpfrc++ -I./mpfrc++ -I/home/tom/testam/lib/stopcond -I./src worker_code.cc src/libbrutus.a interface.o -o brutus_worker -L./src -lbrutus -L/home/tom/testam/lib/stopcond -lstopcond -L/usr/lib/x86_64-linux-gnu/ -lmpfr -L/usr/lib/x86_64-linux-gnu/ -lgmp
    make[1]: Leaving directory '/home/tom/testam/src/amuse/community/brutus'

solar1.txt

ipelupessy commented 4 years ago

There is an error in the state model for Brutus. The script works if the parameters are set before the particles are added. I think that if you do it the other way round, the change in word_length is not propagated to the integrator and the derived eta is no longer updated, hence the failure to converge. So the script can be made to work as follows:

    ...
    gravity = Brutus(convert_nbody,number_of_workers=1)

    # workaround: set all parameters before adding the particles
    gravity.parameters.bs_tolerance = 1e-20
    gravity.parameters.word_length = 128
    gravity.parameters.dt_param = 0.010
    print(gravity.parameters) # to check values are correctly set
    gravity.particles.add_particles(bodies)
    ...

But I will try to fix the state model, because the ordering should not matter.

ipelupessy commented 4 years ago

Hmm, my explanation above was not entirely correct.

ipelupessy commented 4 years ago

@tjardaboekholt I think the problem is in set_eta(tolerance): it is called in setup, which is called in commit_particles. According to the state model of gravitational_dynamics, commit_particles is also triggered when the parameters are changed after particles have been added. We could fix this by moving the setup, or by adding a set_eta call to the setter of the tolerance?
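The ordering problem and the proposed setter fix can be illustrated with a toy model. This is a hypothetical sketch, not the AMUSE or Brutus code: `ToyBrutus`, `derived_eta` and its formula are invented for illustration. It assumes a simplified picture in which eta is derived from the tolerance only in a setup step run at `commit_particles`, and that this step is not rerun when the tolerance changes later; the real AMUSE state model is more involved.

```python
def derived_eta(tolerance: float) -> float:
    """Hypothetical placeholder for deriving eta from the tolerance."""
    return 0.24 * tolerance ** 0.05  # illustration only, not Brutus's formula


class ToyBrutus:
    """Toy model of the parameter-ordering bug (not the real interface)."""

    def __init__(self, rederive_on_set: bool) -> None:
        self.rederive_on_set = rederive_on_set  # True models the proposed fix
        self.tolerance = 1e-8
        self.eta = None
        self.committed = False

    def _setup(self) -> None:
        # eta is only derived from the tolerance in this setup step
        self.eta = derived_eta(self.tolerance)

    def commit_particles(self) -> None:
        self._setup()
        self.committed = True

    def set_tolerance(self, value: float) -> None:
        self.tolerance = value
        # Proposed fix: also re-derive eta when the tolerance changes later
        if self.rederive_on_set and self.committed:
            self._setup()


for fix in (False, True):
    code = ToyBrutus(rederive_on_set=fix)
    code.commit_particles()    # particles committed first ...
    code.set_tolerance(1e-20)  # ... tolerance changed afterwards
    stale = code.eta != derived_eta(code.tolerance)
    print(f"fix={fix}: eta is {'stale' if stale else 'up to date'}")
```

Without the fix, eta keeps the value derived from the old tolerance; setting the tolerance before committing (the workaround in this thread) or re-deriving eta in the setter both avoid that.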

tjardaboekholt commented 4 years ago

I can confirm that setting the parameters before giving the particles to Brutus makes the script run, so please use this temporary fix for now. Also, the current version of Brutus adapts eta to the value of epsilon that is given. In principle this should be fine, since you then only need to focus on two parameters (epsilon and word length). Meanwhile, I plan to update the Brutus version in AMUSE soon, together with a fix for this issue. Many thanks for pointing this out to us.

GFTwrt commented 4 years ago

Thank you for your support. The model is evolving now. Let's have a look at the results.

GFTwrt commented 4 years ago

@tjardaboekholt: May I add a request, in case you update Brutus? As I mentioned in my first post to AMUSE on GitHub, I want to simulate our Solar System including the solar wind. I expect to need an energy-conservation resolution better than 1 mW (milliwatt). At the moment the interface between the code and Python cannot transport this accuracy. Could you add a string-based interface that provides a number (the energy difference between two freely chosen time steps divided by the time difference) as well as the particle data? Such an interface would be very useful.

spzwart commented 4 years ago

Dear GFT,

In principle, you can do that already by converting your mW (which is basically an energy-conservation quantity) into a tolerance. The tolerance is then the inverse of the fraction of the total binding energy of the Solar System in terms of 1 mW. It sounds like you are performing an interesting experiment.

Simon

On Tue, Feb 11, 2020, 21:49 GFTwrt notifications@github.com wrote:


GFTwrt commented 4 years ago

@spzwart: Thank you, Simon. I was not sure how to interpret eta (tolerance/bs_tolerance) from your paper, in particular whether it refers to the potential or to the energy. So the unit of the tolerance (eta) is 1/(energy/power), and therefore time?
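A minimal sketch of Simon's suggestion of turning the 1 mW target into a tolerance, under several assumptions: the total energy of the Solar System is approximated by the Sun-Jupiter orbital energy -G·M_sun·M_J/(2a); the 1 mW budget is accumulated over one year; and the tolerance is treated as the dimensionless relative energy error |ΔE/E| (this computes the fraction ΔE/|E| itself). Whether this maps directly onto Brutus's `bs_tolerance` is itself an assumption, and the helper names are hypothetical.

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # kg
M_JUP = 1.898e27   # kg
A_JUP = 7.785e11   # Jupiter's semi-major axis, m
YEAR = 3.156e7     # s


def orbital_energy(m1: float, m2: float, a: float) -> float:
    """Energy -G m1 m2 / (2 a) of a bound Keplerian pair, in J."""
    return -G * m1 * m2 / (2.0 * a)


def relative_energy_tolerance(power_w: float, interval_s: float, total_energy_j: float) -> float:
    """Allowed |dE/E| if at most power_w may be gained or lost over interval_s."""
    return power_w * interval_s / abs(total_energy_j)


e_tot = orbital_energy(M_SUN, M_JUP, A_JUP)            # assumed to dominate |E|
tol = relative_energy_tolerance(1e-3, YEAR, e_tot)      # 1 mW over one year
print(f"|E| ~ {abs(e_tot):.3e} J, required |dE/E| ~ {tol:.1e} per year")
```

With these numbers |E| is about 1.6e35 J and the allowed relative error is about 2e-31 per year, far tighter than the 1e-20 used earlier in this thread.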

ipelupessy commented 4 years ago

@tjardaboekholt: if you add the get_*_string versions (and maybe setters) of the particle attributes, that would be a good start (issue #155 suggests adding some code to automatically get e.g. gmpy attributes).
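Why string-based getters matter can be shown with Python's standard `decimal` module. This is only an illustration with hypothetical values; it does not use the AMUSE or Brutus string functions. Two total energies of about 1.6e35 J that differ by 100 J are passed as strings; parsing them at full precision keeps the difference, while converting them to 64-bit doubles (whose spacing near 1.6e35 is 2**64, about 1.8e19 J) cannot represent it.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough significant digits for this illustration

# Hypothetical total energies (J) at two times, as a string-based getter might
# return them: they differ by 100 J on top of about 1.6e35 J.
e_start = Decimal("-1.618e35")
e_end = e_start + Decimal(100)
s_start, s_end = str(e_start), str(e_end)

exact_diff = Decimal(s_end) - Decimal(s_start)  # parse strings at full precision
double_diff = float(s_end) - float(s_start)     # parse through 64-bit doubles

print("exact difference: ", exact_diff)
print("double difference:", double_diff)
```

The same reasoning applies to positions and velocities: once a high-precision value passes through a double, anything beyond about 16 significant digits is lost.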

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.

ipelupessy commented 2 years ago

The original issue seems solved, but I am not sure whether the later Brutus fixes proposed here have been implemented. @tjardaboekholt, there is mention of a new Brutus version: has that been merged? Also, the full set of string interface functions should be checked.

tjardaboekholt commented 2 years ago

Hi Inti, thanks for the reminder. The student Arend Moerman has implemented PN terms into Brutus. I will also check his Amuse interface/string treatment. I will work on merging this into Amuse as soon as I have some time!

GFTwrt commented 2 years ago

Hello, maybe you should have a look at https://github.com/GFTwrt/amuse/tree/master/src/amuse/community/gpuhermite8 too.

Thomas

P.S. The interface is the same as in https://github.com/GFTwrt/amuse/tree/master/src/amuse/community/brutus

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 28 days if no further activity occurs. Thank you for your contributions.