esa / LADDS

Large-scale Deterministic Debris Simulation - Codebase for the ARIADNA Study between TU Munich and ESA's Advanced Concepts Team.
GNU General Public License v3.0

Binary Output #68

Closed · FG-TUM closed this 2 years ago

FG-TUM commented 2 years ago

Description

The final implementation: see details in the next section.

Things implemented and tested throughout this PR

Tests are for 17378 particles.

Further Information

Legacy VTK: ASCII vs Binary

ASCII vs. binary makes only a minor difference in file size for legacy VTK: it uses 6 significant digits, which, together with the decimal point and the trailing space, makes 8 characters of 8 bit each, i.e. 64 bit per number. A `double` is also 64 bit, so :shrug: However, printing the `double` results in higher precision, and on the other hand the ASCII output is far more compressible. But since the precision of our previous output was already similar to `float`, storing binary `float`s saves 50% of the memory.
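Spelling out the accounting:

$$
\underbrace{(6 + 1 + 1)}_{\text{digits + point + separator}}\ \text{chars} \times 8\ \tfrac{\text{bit}}{\text{char}} = 64\ \text{bit} = \operatorname{sizeof}(\texttt{double}), \qquad \operatorname{sizeof}(\texttt{float}) = 32\ \text{bit} \;\Rightarrow\; \approx 50\,\%\ \text{saved}
$$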

HDF5 Compression

For the split layout (separate datasets for pos, vel, id) the different compression levels yield the following file sizes for 100 write iterations. These values are without conjunctions to make them easier to compare to the expectation; however, adding the conjunctions doesn't add much data. (A sketch of how such a comparison could be reproduced with h5py follows the table.)

Expected binary size: 100 * 17378 * 7 * 32 bit = 48.66 MB

| File size [Byte] | ≈ Size [MiB] | Compression level |
|-----------------:|-------------:|------------------:|
| 66485924 | 64 | 0 |
| 20346064 | 20 | 1 |
| 20336665 | 20 | 2 |
| 20381012 | 20 | 3 |
| 19094759 | 19 | 4 |
| 19116855 | 19 | 5 |
| 19203395 | 19 | 6 |
| 19229063 | 19 | 7 |
| 19226644 | 19 | 8 |
| 19224946 | 19 | 9 |
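For reference, a minimal h5py sketch of how such a size comparison could be reproduced. The group/dataset names, shapes, and random fill data are assumptions for illustration, not the actual LADDS layout, and random data compresses far worse than real trajectories, so the absolute numbers will differ:

```python
import os
import numpy as np
import h5py

n_particles = 17378
n_iterations = 100  # as in the table above; reduce for a quick test
rng = np.random.default_rng(42)

for level in range(10):
    path = f"compression_test_{level}.h5"
    with h5py.File(path, "w") as f:
        for it in range(n_iterations):
            # split layout: separate datasets for positions, velocities, ids
            grp = f.create_group(f"ParticleData/{it}/Particles")
            kwargs = {"compression": "gzip", "compression_opts": level} if level > 0 else {}
            grp.create_dataset("Positions", data=rng.random((n_particles, 3), dtype=np.float32), **kwargs)
            grp.create_dataset("Velocities", data=rng.random((n_particles, 3), dtype=np.float32), **kwargs)
            grp.create_dataset("IDs", data=np.arange(n_particles, dtype=np.uint32), **kwargs)
    print(f"compression level {level}: {os.path.getsize(path)} bytes")
```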

HDF5 Write Speed

When only measuring the time it takes to write the collision data for the first 100 iterations, the HDF5 writer takes ~20x longer than the csv collision logger. Reasons behind this are probably that HDF5 is a more complex data format and that our old code used the highly efficient async spd logger. Wrapping the call to the HDF5 writer in `std::async` doesn't help, as the individual writes are probably too small. If this becomes a problem at some point, we could introduce some buffered solution and then take a look at asynchronous writing again.

~Related Pull Requests~

Resolved Issues

How Has This Been Tested?

FG-TUM commented 2 years ago

FYI: I will fix the tests only when we agree on a layout of the output file ;)

gomezzz commented 2 years ago

@FG-TUM So I am struggling to build this. How do I need to link hdf5? I have it installed in my `/lib/` directory and I configure with `HDF5_DIR=/lib/` but still get `Failed to find HDF5 (missing HDF5_INCLUDE_DIR HDF5_TARGETS C HL`.

What do I need to do? :) Also, couldn't this be useful? https://cmake.org/cmake/help/latest/module/FindHDF5.html

FG-TUM commented 2 years ago

Initially I thought h5pp already takes care of importing hdf5, because it worked on my workstation :D I could reproduce the problem on my laptop and fixed it in e51306e. Mind you, this only fixes the main ladds target. Will fix the tests asap.

FG-TUM commented 2 years ago

@gomezzz The containers now manage to build everything, so surely you can do it too :P

gomezzz commented 2 years ago

@FG-TUM So, compiling and running works now but I am having some trouble with reading the created file.

I tried two modules for it, deepdish and h5py. The former doesn't work at all; for the latter I am missing the conjunctions in the file.

Looks like this

[screenshot]

So deepdish doesn't load the particles, but yea, h5py would be fine as long as I can iterate over it sensibly. But I do need the conjunctions (ideally in the same format as in the previous csv, i.e. `iteration,p1,p2,squaredDistance` or similar). :)
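A minimal h5py sketch of how iterating over such a file could look, assuming a layout along the lines of what is proposed later in this thread; all file, group, and dataset names here are placeholders, not the final format:

```python
import h5py

with h5py.File("simulation_output.h5", "r") as f:  # placeholder filename
    # particle data: one group per written iteration
    for iteration, group in f["ParticleData"].items():
        pos = group["Particles/Positions"][...]  # e.g. (N, 3) float32
        ids = group["Particles/IDs"][...]        # e.g. (N,)   uint32
        print(iteration, pos.shape, ids.shape)

    # conjunctions, e.g. one row per event: iteration, p1, p2, squaredDistance
    if "CollisionData" in f:
        for row in f["CollisionData"][...]:
            print(row)
```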

gomezzz commented 2 years ago

@FG-TUM One more thing: I ran it for 32k iterations writing a total of 800 timesteps to the HDF5. The created file is 383.424MB but 800 * 6 * 32bit * 17378 = 333.6576 MB. So plain binary would be a better compression than what we have now. Are you sure the compression works correctly with this group thing?

FG-TUM commented 2 years ago

> @FG-TUM One more thing: I ran it for 32k iterations writing a total of 800 timesteps to the HDF5. The created file is 383.424MB but 800 * 6 * 32bit * 17378 = 333.6576 MB. So plain binary would be a better compression than what we have now. Are you sure the compression works correctly with this group thing?

You forgot that we also store the particle ID as a 32 bit unsigned int hence we have 800 * 7 * 32 bit * 17378 = 389.3 MB of information.

The compression is an input parameter that defaults to 0 (=no compression). Did you set it in the yaml?

gomezzz commented 2 years ago

> You forgot that we also store the particle ID as a 32 bit unsigned int hence we have 800 * 7 * 32 bit * 17378 = 389.3 MB of information.
>
> The compression is an input parameter that defaults to 0 (=no compression). Did you set it in the yaml?

Nope, I didn't, that explains it 👍

EDIT: But the default right now is 9 I think

FG-TUM commented 2 years ago

> The former doesn't work at all; for the latter I am missing the conjunctions in the file

Interesting that some work and some don't. Yes, the conjunctions are not yet in the file, but that was my next step after settling on the format. During development I used the tools HDFCompass and h5dump to check the content of the files, which was quite handy.
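If HDFCompass or h5dump are not at hand, a few lines of h5py can also dump the structure; this is just a generic sketch with a placeholder filename:

```python
import h5py

def print_entry(name, obj):
    # print every group and dataset with shape/dtype information
    if isinstance(obj, h5py.Dataset):
        print(f"{name}  shape={obj.shape}  dtype={obj.dtype}")
    else:
        print(f"{name}/")

with h5py.File("simulation_output.h5", "r") as f:  # placeholder filename
    f.visititems(print_entry)
```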

FG-TUM commented 2 years ago

> EDIT: But the default right now is 9 I think

No, the default when you specify nothing is 0 (see here). I might have left the 9 in the yaml as an example, so I don't know what you used ;)

gomezzz commented 2 years ago

I think we should write down the structure of the output file somewhere, especially since deepdish didn't work and h5py is ugly for printing these things.

Something like the following in the readme maybe?

The generated HDF5 output file will have the following structure:

| Collisions Data
| - ....
| Particle Data
| - Iteration
| - | - Particles
....