esa / LADDS

Large-scale Deterministic Debris Simulation - Codebase for the ARIADNA Study between TU Munich and ESA's Advanced Concepts Team.
GNU General Public License v3.0

Binary Output #68

Closed · FG-TUM closed this 2 years ago

FG-TUM commented 2 years ago

Description

The final implementation: see details in the next section.

Things implemented and tested throughout this PR

Tests are for 17378 particles.

Further Information

Legacy VTK: ASCII vs Binary

ASCII vs. binary makes only a minor difference in file size for legacy VTK: it uses 6 significant digits, which, together with the decimal point and the trailing space, makes 8 characters of 8 bit each, i.e. 64 bit per number. A `double` is also 64 bit, so :shrug: However, printing the `double` results in higher precision, and on the other hand the ASCII output is far more compressible. But since the precision of our previous output was already similar to `float`, storing binary `float`s saves 50% of the memory.
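Spelling out the accounting:

$$
\underbrace{(6 + 1 + 1)}_{\text{digits + point + separator}}\ \text{chars} \times 8\ \tfrac{\text{bit}}{\text{char}} = 64\ \text{bit} = \operatorname{sizeof}(\texttt{double}), \qquad \operatorname{sizeof}(\texttt{float}) = 32\ \text{bit} \;\Rightarrow\; \approx 50\,\%\ \text{saved}
$$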

HDF5 Compression

For the split layout (separate datasets for pos, vel, id) the different compression levels yield the following file sizes for 100 write iterations. These values are without conjunctions to make them easier to compare to the expectation; however, adding the conjunctions doesn't add much data. (A sketch of how such a comparison could be reproduced with h5py follows the table.)

Expected binary size: 100 * 17378 * 7 * 32 bit = 48.66 MB

| File size [Byte] | ≈ Size [MiB] | Compression level |
|-----------------:|-------------:|------------------:|
| 66485924 | 64 | 0 |
| 20346064 | 20 | 1 |
| 20336665 | 20 | 2 |
| 20381012 | 20 | 3 |
| 19094759 | 19 | 4 |
| 19116855 | 19 | 5 |
| 19203395 | 19 | 6 |
| 19229063 | 19 | 7 |
| 19226644 | 19 | 8 |
| 19224946 | 19 | 9 |
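For reference, a minimal h5py sketch of how such a size comparison could be reproduced. The group/dataset names, shapes, and random fill data are assumptions for illustration, not the actual LADDS layout, and random data compresses far worse than real trajectories, so the absolute numbers will differ:

```python
import os
import numpy as np
import h5py

n_particles = 17378
n_iterations = 100  # as in the table above; reduce for a quick test
rng = np.random.default_rng(42)

for level in range(10):
    path = f"compression_test_{level}.h5"
    with h5py.File(path, "w") as f:
        for it in range(n_iterations):
            # split layout: separate datasets for positions, velocities, ids
            grp = f.create_group(f"ParticleData/{it}/Particles")
            kwargs = {"compression": "gzip", "compression_opts": level} if level > 0 else {}
            grp.create_dataset("Positions", data=rng.random((n_particles, 3), dtype=np.float32), **kwargs)
            grp.create_dataset("Velocities", data=rng.random((n_particles, 3), dtype=np.float32), **kwargs)
            grp.create_dataset("IDs", data=np.arange(n_particles, dtype=np.uint32), **kwargs)
    print(f"compression level {level}: {os.path.getsize(path)} bytes")
```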

HDF5 Write Speed

When only measuring the time it takes to write the collision data for the first 100 iterations, the HDF5 writer takes ~20x longer than the csv collision logger. Reasons behind this are probably that HDF5 is a more complex data format and that our old code used the highly efficient async spd logger. Wrapping the call to the HDF5 writer in `std::async` doesn't help, as the individual writes are probably too small. If this becomes a problem at some point, we could introduce some buffered solution and then take a look at asynchronous writing again.

~Related Pull Requests~

Resolved Issues

How Has This Been Tested?

FG-TUM commented 2 years ago

FYI: I will fix the tests only when we agree on a layout of the output file ;)

gomezzz commented 2 years ago

@FG-TUM So I am struggling to build this. How do I need to link hdf5? I have it installed in my `/lib/` directory and I configure with `HDF5_DIR=/lib/` but still get `Failed to find HDF5 (missing HDF5_INCLUDE_DIR HDF5_TARGETS C HL`.

What do I need to do? :) Also, couldn't this be useful? https://cmake.org/cmake/help/latest/module/FindHDF5.html

FG-TUM commented 2 years ago

Initially I thought h5pp already takes care of importing hdf5, because it worked on my workstation :D I could reproduce the problem on my laptop and fixed it in e51306e. Mind you, this only fixes the main ladds target. Will fix the tests asap.

FG-TUM commented 2 years ago

@gomezzz The containers now manage to build everything, so surely you can do it too :P

gomezzz commented 2 years ago

@FG-TUM So, compiling and running works now but I am having some trouble with reading the created file.

I tried two modules for it, deepdish and h5py. The former doesn't work at all; for the latter I am missing the conjunctions in the file.

Looks like this

[screenshot]

So deepdish doesn't load the particles, but yea, h5py would be fine as long as I can iterate over it sensibly. But I do need the conjunctions (ideally in the same format as in the previous csv, i.e. `iteration,p1,p2,squaredDistance` or similar). :)
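A minimal h5py sketch of how iterating over such a file could look, assuming a layout along the lines of what is proposed later in this thread; all file, group, and dataset names here are placeholders, not the final format:

```python
import h5py

with h5py.File("simulation_output.h5", "r") as f:  # placeholder filename
    # particle data: one group per written iteration
    for iteration, group in f["ParticleData"].items():
        pos = group["Particles/Positions"][...]  # e.g. (N, 3) float32
        ids = group["Particles/IDs"][...]        # e.g. (N,)   uint32
        print(iteration, pos.shape, ids.shape)

    # conjunctions, e.g. one row per event: iteration, p1, p2, squaredDistance
    if "CollisionData" in f:
        for row in f["CollisionData"][...]:
            print(row)
```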

gomezzz commented 2 years ago

@FG-TUM One more thing: I ran it for 32k iterations writing a total of 800 timesteps to the HDF5. The created file is 383.424MB but 800 * 6 * 32bit * 17378 = 333.6576 MB. So plain binary would be a better compression than what we have now. Are you sure the compression works correctly with this group thing?

FG-TUM commented 2 years ago

> @FG-TUM One more thing: I ran it for 32k iterations writing a total of 800 timesteps to the HDF5. The created file is 383.424MB but 800 * 6 * 32bit * 17378 = 333.6576 MB. So plain binary would be a better compression than what we have now. Are you sure the compression works correctly with this group thing?

You forgot that we also store the particle ID as a 32 bit unsigned int hence we have 800 * 7 * 32 bit * 17378 = 389.3 MB of information.

The compression is an input parameter that defaults to 0 (=no compression). Did you set it in the yaml?

gomezzz commented 2 years ago

> You forgot that we also store the particle ID as a 32 bit unsigned int hence we have 800 * 7 * 32 bit * 17378 = 389.3 MB of information.
>
> The compression is an input parameter that defaults to 0 (=no compression). Did you set it in the yaml?

Nope, I didn't, that explains it 👍

EDIT: But the default right now is 9 I think

FG-TUM commented 2 years ago

> The former doesn't work at all; for the latter I am missing the conjunctions in the file

Interesting that some work and some don't. Yes, the conjunctions are not yet in the file, but that was my next step after settling on the format. During development I used the tools HDFCompass and h5dump to check the content of the files, which was quite handy.
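If HDFCompass or h5dump are not at hand, a few lines of h5py can also dump the structure; this is just a generic sketch with a placeholder filename:

```python
import h5py

def print_entry(name, obj):
    # print every group and dataset with shape/dtype information
    if isinstance(obj, h5py.Dataset):
        print(f"{name}  shape={obj.shape}  dtype={obj.dtype}")
    else:
        print(f"{name}/")

with h5py.File("simulation_output.h5", "r") as f:  # placeholder filename
    f.visititems(print_entry)
```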

FG-TUM commented 2 years ago

> EDIT: But the default right now is 9 I think

No, the default when you specify nothing is 0 (see here). I might have left the 9 in the yaml as an example, so I don't know what you used ;)

gomezzz commented 2 years ago

I think we should write down the structure of the output file somewhere, especially since deepdish didn't work and h5py is ugly for printing these things.

Something like the following in the readme maybe?

The generated HDF5 output file will have the following structure:

| Collisions Data
| - ....
| Particle Data
| - Iteration
| - | - Particles
....