MP-Gadget / MP-Gadget

massively-parallel cosmology simulator
Other
59 stars 33 forks source link

Hang with LOW_PRECISION=float #184

Closed sbird closed 6 years ago

sbird commented 6 years ago

If you run the 'small' example with LOW_PRECISION=float it will hang at the first tree build, after detecting a (non-existent) close particle pair. I think the tree is fine but that reading the snapshot is breaking somehow.

Here is a breaking Options.mk: DESTDIR = build MPICC = mpicc MPICXX = mpic++ OPTIMIZE = -fopenmp -O2 -ggdb -Wall -std=gnu89 GSL_INCL = $(shell pkg-config --cflags gsl) GSL_LIBS = $(shell pkg-config --libs gsl) OPT += -DOPENMP_USE_SPINLOCK OPT += -DSFR OPT += -DBLACK_HOLES OPT += -DDENSITY_INDEPENDENT_SPH LOW_PRECISION=float

rainwoodman commented 6 years ago

This reminds me the IO module needs a major revision. It uses too many hard coded logic and macros. Shall this be my goal of the holiday?

sbird commented 6 years ago

If you would like - rewrite will probably fix this. Maybe add some tests at the same time?

rainwoodman commented 6 years ago

Ok. I'll think about this while driving and put something in at night. Need to do this better than last time.

Y

On Dec 29, 2017 9:32 AM, "Simeon Bird" notifications@github.com wrote:

If you would like - rewrite will probably fix this. Maybe add some tests at the same time?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rainwoodman/MP-Gadget/issues/184#issuecomment-354467990, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIbTJib4HyBtvehPpUxUeMBxM5J-WWvks5tFRQcgaJpZM4ROyT1 .

sbird commented 6 years ago

The current code is not too bad. Where are you driving to?

On Sat, Dec 30, 2017 at 11:05 AM, Yu Feng notifications@github.com wrote:

Ok. I'll think about this while driving and put something in at night. Need to do this better than last time.

Y

On Dec 29, 2017 9:32 AM, "Simeon Bird" notifications@github.com wrote:

If you would like - rewrite will probably fix this. Maybe add some tests at the same time?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rainwoodman/MP-Gadget/issues/ 184#issuecomment-354467990, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAIbTJib4HyBtvehPpUxUeMBxM5J-WWvks5tFRQcgaJpZM4ROyT1 .

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/rainwoodman/MP-Gadget/issues/184#issuecomment-354553708, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQ7fPVImBnjMLl4mKqEjH0uUL-biidAks5tFl9FgaJpZM4ROyT1 .

rainwoodman commented 6 years ago

The part that made me uncomfortable is that the schema sort of need to be defined twice, and if they are inconsistent the snapshot will be ill-formed. The macro depends on some of these consistencies, and that's likely why the single precision run crashes.

The missing piece is an abstract and efficient interface to obtain a column of several particles as a tightly packed array with a 'internal' precision (double / int64 I think) and the reverse operation. This can then be cast to the desired precision for the snapshots and / or for communication in the future.

The FOF halos also call for a better, systematic handling. For example, I wonder if we can reserve a particle type for FOF halos (7).

The schema of the snapshots are currently scattered for FOF and particles. I'll see if I can do a simple fix for this crash. The new IO module can be a parallel work.

On Sat, Dec 30, 2017 at 12:53 PM, Simeon Bird notifications@github.com wrote:

The current code is not too bad. Where are you driving to?

On Sat, Dec 30, 2017 at 11:05 AM, Yu Feng notifications@github.com wrote:

Ok. I'll think about this while driving and put something in at night. Need to do this better than last time.

Y

On Dec 29, 2017 9:32 AM, "Simeon Bird" notifications@github.com wrote:

If you would like - rewrite will probably fix this. Maybe add some tests at the same time?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rainwoodman/MP-Gadget/issues/ 184#issuecomment-354467990, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAIbTJib4HyBtvehPpUxUeMBxM5J-WWvks5tFRQcgaJpZM4ROyT1 .

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/rainwoodman/MP-Gadget/issues/ 184#issuecomment-354553708, or mute the thread https://github.com/notifications/unsubscribe-auth/ ABQ7fPVImBnjMLl4mKqEjH0uUL-biidAks5tFl9FgaJpZM4ROyT1 .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rainwoodman/MP-Gadget/issues/184#issuecomment-354567725, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIbTMBmcFM6giA5Gg_DPp6-N-HshooZks5tFqKvgaJpZM4ROyT1 .