abria / TeraStitcher

A tool for fast automatic 3D-stitching of teravoxel-sized microscopy images
http://abria.github.io/TeraStitcher/

teraconverter - Option to set chunk size in HDF5 #35

Open · jflat06 opened this issue 5 years ago

jflat06 commented 5 years ago

We have run into an issue with HDF5 files generated by teraconverter.

The chunk size is hard-coded to 16 in imagemanager/HDF5Mngr.cpp. For large datasets, this causes a number of things to fail.

First and foremost, our server dies sporadically when attempting to load these datasets. A simple check with h5stat also fails on them.

The Vaa3D plugin version of teraconverter used a default of 256 and produced datasets that did not have this problem.

I have built from source with the 256 value, and the issue has gone away.

It would be very nice to be able to set this as an option when calling teraconverter, instead of having it hard-coded.
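
For context, a minimal h5py sketch (not teraconverter's actual C++ code) of what such an option would control: in HDF5 the chunk shape is fixed when a dataset is created. The file name, dataset name, and sizes below are illustrative only.

```python
# Illustration of HDF5 chunking with h5py: the chunk shape is chosen once,
# at dataset-creation time, and cannot be changed afterwards.
import h5py
import numpy as np

chunk_edge = 256  # the value a hypothetical teraconverter option would expose

with h5py.File("example.h5", "w") as f:  # illustrative file name
    f.create_dataset(
        "volume",
        shape=(1024, 1024, 1024),  # Z, Y, X (example size; chunks are allocated lazily)
        dtype=np.uint16,
        chunks=(chunk_edge,) * 3,  # e.g. (256, 256, 256) instead of the hard-coded 16
    )
```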

iannellog commented 5 years ago

Dear Jeff, I will definitely make the chunk size settable through a command-line option, but I would like to better understand the problem. On our Linux machine we observed no problems with files up to 23 GBytes. What is the size beyond which you observe problems?

Another point: the chunk size currently hard-coded is 16x256x256. I understand, then, that you changed the size in the Z dimension (the first one in the HDF5 convention), so you are using chunks of size 256x256x256. Is that right?

I take this opportunity to inform you, in case you are not yet aware of it, that we have recently updated the TeraStitcher site with new documentation and features that greatly speed up the most intensive parts of the stitching pipeline. Best regards.
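
To make the shapes under discussion concrete, here is rough chunk-count arithmetic for an example volume; the 1024^3 size is an arbitrary stand-in, not a measured dataset:

```python
# Chunk-count arithmetic for an example 1024 x 1024 x 1024 voxel volume:
# HDF5 allocates ceil(extent / chunk_extent) chunks along each axis.
import math

shape = (1024, 1024, 1024)  # Z, Y, X (arbitrary example size)

def n_chunks(shape, chunk):
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunk))

for chunk in [(16, 256, 256), (256, 256, 256), (16, 16, 16)]:
    print(chunk, "->", n_chunks(shape, chunk), "chunks")
# (16, 256, 256)  ->   1024 chunks
# (256, 256, 256) ->     64 chunks
# (16, 16, 16)    -> 262144 chunks, i.e. 256^3 / 16^3 = 4096x more than 256-cubed
```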

-- Giulio

jflat06 commented 5 years ago

Hi Giulio,

The issue seems to be caused by the sheer number of chunks generated when using 16. I will also note that the chunks do not appear to be 16x256x256: printing the chunk size with the h5py library shows them as 16x16x16.
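
For reference, a minimal sketch of the h5py check described above; "converted.h5" is a placeholder for a teraconverter output file:

```python
# Walk an HDF5 file and print the shape and chunk layout of every dataset.
import h5py

def show_chunks(name, obj):
    # visititems passes every group and dataset; report datasets only.
    if isinstance(obj, h5py.Dataset):
        print(name, "shape:", obj.shape, "chunks:", obj.chunks)

with h5py.File("converted.h5", "r") as f:  # placeholder for a teraconverter output
    f.visititems(show_chunks)
```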

Comparing the number of chunks in two datasets of roughly comparable size:

- Vaa3D plugin: 162032 chunks
- Command-line binary built from source: 391274544 chunks

Both datasets are functional, but the sheer number of chunks seems to cause performance issues when some programs parse them.

Can you try a simple h5stat on the 23 GB file and time the result? With 256^3 chunks it should be nearly instant, while with 16^3 it takes many minutes.
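
A minimal way to script that timing comparison, assuming h5stat is on the PATH; the file names are placeholders:

```python
# Time `h5stat` (the standard HDF5 statistics tool) on two files whose only
# intended difference is the chunk shape; runtime grows with the chunk count.
import subprocess
import time

# Placeholder file names: one converted with 16^3 chunks, one with 256^3.
for path in ["bdv_16cubed.h5", "bdv_256cubed.h5"]:
    t0 = time.perf_counter()
    subprocess.run(["h5stat", path], stdout=subprocess.DEVNULL, check=True)
    print(f"{path}: {time.perf_counter() - t0:.1f} s")
```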

Thank you for the help!

iannellog commented 5 years ago

Jeff, many thanks for your reply. I did not realize that you were writing about the BigDataViewer (BDV) format. In recent years we have had a number of requests concerning the other supported HDF5 format, i.e. the Imaris format, and I assumed that you too were interested in Imaris files.

You are right: the chunk size hard-coded for BDV is 16x16x16, which is too small. Thanks to your mail I realize that I have to revise the code for the BDV format. I will inform you as soon as I update the GitHub site. Best.

-- Giulio

jflat06 commented 5 years ago

Great! Sorry, I should have specified BDV.