FYI: I will fix the tests only when we agree on a layout of the output file ;)
@FG-TUM So I am struggling to build this. How do I need to link hdf5? I have it installed in my /lib/ directory and I configure with HDF5_DIR=/lib/, but I still get `Failed to find HDF5 (missing HDF5_INCLUDE_DIR HDF5_TARGETS C HL)`.
What do I need to do? :) Also, couldn't this be useful? https://cmake.org/cmake/help/latest/module/FindHDF5.html
Initially I thought h5pp already takes care of importing hdf5, because it worked on my workstation :D I could reproduce the problem on my laptop and fixed it in e51306e. Mind you, this only fixes the main ladds target. Will fix the tests asap.
@gomezzz The containers now manage to build everything, so surely you can do it too :P
@FG-TUM So, compiling and running works now but I am having some trouble with reading the created file.
I tried two modules for it, deepdish and h5py. The former doesn't work at all, and for the latter I am missing the conjunctions in the file.
Looks like this
So deepdish doesn't load the particles, but yeah, h5py would be fine as long as I can iterate over it sensibly. But I do need the conjunctions, ideally in the same format as in the previous csv with `iteration,p1,p2,squaredDistance` or similar. :)
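For reference, iterating over such a layout with h5py could look roughly like the sketch below. The file name and the group/dataset names (`CollisionData`, `p1`, `p2`, `squaredDistance`) are placeholders for whatever layout we end up agreeing on, not what the file currently contains.

```python
import h5py

# Assumed (placeholder) layout: one group per write iteration under /CollisionData,
# each holding the columns p1, p2 and squaredDistance as separate datasets.
with h5py.File("ladds_output.h5", "r") as f:
    for iteration, group in f["CollisionData"].items():
        p1 = group["p1"][()]                   # ids of the first conjunction partner
        p2 = group["p2"][()]                   # ids of the second conjunction partner
        dist2 = group["squaredDistance"][()]   # squared distance of the conjunction
        for a, b, d in zip(p1, p2, dist2):
            print(f"{iteration},{a},{b},{d}")  # csv-like: iteration,p1,p2,squaredDistance
```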
@FG-TUM One more thing: I ran it for 32k iterations, writing a total of 800 timesteps to the HDF5. The created file is 383.424 MB, but 800 * 6 * 32 Bit * 17378 = 333.6576 MB. So plain binary would already be smaller than what we have now. Are you sure the compression works correctly with this group thing?
You forgot that we also store the particle ID as a 32 bit unsigned int, hence we have 800 * 7 * 32 Bit * 17378 = 389.3 MB of information.
The compression is an input parameter that defaults to 0 (=no compression). Did you set it in the yaml?
Nope, I didn't, that explains it 👍
EDIT: But the default rn is 9 I think
> The former doesn't work at all, and for the latter I am missing the conjunctions in the file.
Interesting that some work and some don't. Yes, the conjunctions are not yet in the file, but that was my next step after settling on the format. During development I used the tools HDFCompass and h5dump to check the content of the files, which was quite handy.
> EDIT: But the default rn is 9 I think
No, the default when you specify nothing is 0 (see here). I might have left the 9 in the yaml as an example, so I don't know what you used ;)
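Independent of the ladds writer (which goes through h5pp), the effect of the 0-9 levels can be checked with a quick h5py experiment like the sketch below; level 0 effectively stores the raw chunks. The data here is random, which is close to incompressible, so real trajectories will behave differently.

```python
import os
import h5py
import numpy as np

# Roughly one write iteration: 17378 particles with 6 float32 values each.
data = np.random.rand(17378, 6).astype(np.float32)

for level in (0, 4, 9):
    fname = f"compression_level_{level}.h5"
    with h5py.File(fname, "w") as f:
        # gzip/deflate is the standard HDF5 filter; compression_opts is the 0-9 level.
        f.create_dataset("particles", data=data, compression="gzip", compression_opts=level)
    print(f"level {level}: {os.path.getsize(fname)} bytes")
```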
I think we should write down the structure of the output file somewhere, especially since deepdish didn't work and h5py is ugly for printing these things.
Something like the following in the readme maybe?
The generated HDF5 output file will have the following structure:
| Collisions Data
| - ...
| Particle Data
| - Iteration
| - | - Particles
| - | - ...
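Until such a readme section exists, a tiny h5py helper along these lines can dump the tree of an output file, which is less painful than poking around interactively (the filename is a placeholder):

```python
import h5py

def print_tree(filename):
    """Print every group and dataset path in an HDF5 file, with shape and dtype for datasets."""
    with h5py.File(filename, "r") as f:
        def visitor(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(f"{name}  shape={obj.shape} dtype={obj.dtype}")
            else:
                print(f"{name}/")
        f.visititems(visitor)

print_tree("ladds_output.h5")  # placeholder filename
```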
Description
The final implementation. See details in the next section.

Things implemented and tested throughout this PR
Tests are for 17378 particles. Per write iteration the raw particle data amounts to #Particles * 7 * 64 Bit; it is stored with 32 bit types (`float` and `int`) to save disk space. ecf7ea6

Further Information
Legacy VTK: ASCII vs Binary
ASCII vs binary makes only a minor difference in file size for legacy VTK: it uses 6 significant digits. This plus the `.` and the trailing space makes 8 characters with 8 bit each, resulting in 64 Bit per number. A `double` is also 64 Bit, so :shrug: However, printing the double results in higher precision. On the other hand, the compressibility of the ASCII is way higher. But since the precision of our previous output is already similar to `float`, saving binary floats saves 50% of memory.

HDF5 Compression
For the split layout (separate datasets for pos, vel, id) the different compression levels yield the following file sizes for 100 write iterations. These values are without conjunctions to make them easier to compare with the expectation. However, adding the conjunctions doesn't add very much data.
Expected binary size: 100 * 17378 * 7 * 32 Bit = 48.66 MB
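The expected number follows directly from the counts above; as a sanity check (pure arithmetic, no HDF5 involved):

```python
# 100 write iterations, 17378 particles, 7 values per particle (pos, vel, id), 32 bit each.
iterations = 100
particles = 17378
values_per_particle = 7
bytes_per_value = 4  # 32 bit

expected_bytes = iterations * particles * values_per_particle * bytes_per_value
print(expected_bytes / 1e6, "MB")  # -> 48.6584 MB
```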
HDF5 Write Speed
When only measuring the time it takes to write the collision data for the first 100 iterations, the HDF5 writer takes ~20x longer than the csv collision logger. Reasons behind this are probably that HDF5 is a more complex data format and that our old code used the highly efficient async spd logger. Wrapping the call to the HDF5 writer in `std::async` doesn't help, as the individual writes are probably too small. If this becomes a problem at some point, we could maybe introduce some buffered solution and then take a look at asynchronous writing again.

~Related Pull Requests~
Resolved Issues
How Has This Been Tested?