Image fusion very slow for multi terabyte dataset

dpshepherd commented 4 years ago

Hi all,

Thanks again for all the hard work on BigStitcher. We are working with data from a new light sheet instrument in our lab and have run into an issue I am hoping for advice on.

We have a dataset that consists of 56 tiles, each with 4 channels. Each tile is ~900 x 48,000 x 75 pixels for each of the 4 channels. We create the BDV HDF5 and XML in Python with three levels of (1,1,1), (4,8,4), and (8,16,8) downsampling. We are able to load and stitch the data and run ICP for chromatic aberrations very quickly in BigStitcher. The results look correct in BigDataViewer and it is very responsive when exploring the data in all three dimensions.
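
As a rough sanity check on the scale described above, the following sketch estimates the uncompressed data volume from the numbers in this post. The 16-bit pixel depth is an assumption (the post does not state it), and all variable names are illustrative.

```python
from math import prod

# Figures quoted in the post above.
TILE_SHAPE = (900, 48_000, 75)
N_TILES = 56
N_CHANNELS = 4
DOWNSAMPLING = [(1, 1, 1), (4, 8, 4), (8, 16, 8)]

# Assumption: 16-bit (2-byte) pixels; the post does not state the bit depth.
BYTES_PER_VOXEL = 2

voxels_per_volume = prod(TILE_SHAPE)
full_res_bytes = voxels_per_volume * BYTES_PER_VOXEL * N_TILES * N_CHANNELS

# Each pyramid level holds 1 / (fx * fy * fz) of the full-resolution voxels.
pyramid_factor = sum(1 / prod(f) for f in DOWNSAMPLING)

print(f"voxels per tile and channel: {voxels_per_volume:,}")
print(f"full-resolution data: {full_res_bytes / 1e12:.2f} TB")
print(f"with pyramid levels:  {full_res_bytes * pyramid_factor / 1e12:.2f} TB")
```

Under these assumptions the two downsampled levels add less than 1% on top of the full-resolution data. The real file may differ, since the exact dimensions and any preprocessing are not fully specified here, so this is only an order-of-magnitude estimate.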

When we attempt to fuse the tiles and resave as a new HDF5 or TIFF stacks, it is extremely slow compared to smaller datasets that we have previously used BigStitcher for. We have tried cropping in Z using a Bounding Box, but really do need the entire XY plane to be stitched together. For example, after roughly 24 hours, there is barely any progress visible on the progress bar and roughly 39 GB of the new HDF5 exists on the drive.

The computer we are using is an i7-5820K with 6 cores/12 threads, 128 GB of memory, and a 12 TB SSD RAID0 running Windows 10 Pro 64-bit. The CPU usage is maxed out across all of the threads and there is roughly 100 GB of RAM being used. The disk I/O averages about 15 MB/s, so it does not seem to be I/O limited.

Are there any best practices or speed ups that we can explore?

Thanks, Doug

StephanPreibisch commented 4 years ago

Hi Doug, I am not sure what the reason for that behavior is. It could be that we hit some ImageJ limit there, not sure really.

Definitely fuse with "preserve original anisotropy" & disable non-rigid, just to be sure.

Is there any chance you can get the data to me for debugging? Guess not or?

StephanPreibisch commented 4 years ago

Could you send me a screenshot of the GUI when you run it?

StephanPreibisch commented 4 years ago

Also, what are the blocksizes at full resolution and how many threads did you set under Fiji > Edit > Options > Memory & Threads?

It could be that you run out of memory because you try to load too many blocks at once due to multi-threading, so blocks are dropped and reloaded continuously.

dpshepherd commented 4 years ago

Hi Stephan,

Thanks for the rapid response! I have a feeling there is user error here, since we haven't seen this behavior before.

> Is there any chance you can get the data to me for debugging? Guess not or?

Yes, if needed we can mail a hard drive.

> Could you send me a screenshot of the GUI when you run it?

Yes, next time we run it. Interestingly, we didn't see the option to "preserve original anisotropy" come up and the console shows: AnisotropyFactor: NaN

> Also, what are the blocksizes at full resolution and how many threads did you set under Fiji > Edit > Options > Memory & Threads?

The block sizes are (64, 128, 64) as we created them in Python. We are using Nikita's npy2bdv library (https://github.com/nvladimus/npy2bdv). This is something we are not so sure about, since we haven't previously generated data that is this much longer in one dimension. Any suggestions on best practice here? Also, should we keep the same block size for the different downsampling levels that we create when writing the HDF5?

Threads are set to 12.

StephanPreibisch commented 4 years ago

> Yes, next time we run it. Interestingly, we didn't see the option to "preserve original anisotropy" come up and the console shows: AnisotropyFactor: NaN

That means something is wrong with the XML. Can you send it to me or post it here?
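
One place to start looking is the voxel size stored for each view setup. The sketch below is only a diagnostic aid: it assumes the usual BigDataViewer XML layout (`ViewSetup` elements containing `voxelSize/size`), and the thread does not establish that the voxel size is what causes the NaN. The function name `check_voxel_sizes` and the sample XML are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def check_voxel_sizes(xml_text):
    """Return {setup_id: (sizes_or_None, ok)} for every ViewSetup in a BDV-style XML string."""
    root = ET.fromstring(xml_text)
    report = {}
    for setup in root.iter("ViewSetup"):
        setup_id = setup.findtext("id", default="?").strip()
        size_text = setup.findtext("voxelSize/size")
        try:
            sizes = tuple(float(v) for v in size_text.split())
        except (AttributeError, ValueError):
            # Missing <size> element or entries that are not numbers.
            sizes = None
        ok = (
            sizes is not None
            and len(sizes) == 3
            and all(math.isfinite(v) and v > 0 for v in sizes)
        )
        report[setup_id] = (sizes, ok)
    return report

# Hypothetical sample: setup 0 has a valid voxel size, setup 1 has none.
SAMPLE = """<SpimData version="0.2">
  <SequenceDescription>
    <ViewSetups>
      <ViewSetup><id>0</id><voxelSize><unit>um</unit><size>0.1 0.1 0.4</size></voxelSize></ViewSetup>
      <ViewSetup><id>1</id></ViewSetup>
    </ViewSetups>
  </SequenceDescription>
</SpimData>"""

for setup_id, (sizes, ok) in check_voxel_sizes(SAMPLE).items():
    print(setup_id, sizes, "ok" if ok else "CHECK")
```

To run it on a real file, read the XML into a string first and pass that to `check_voxel_sizes`.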

StephanPreibisch commented 4 years ago

> The block sizes are (64, 128, 64) as we created them in Python. We are using Nikita's npy2bdv library (https://github.com/nvladimus/npy2bdv). This is something we are not so sure about, since we haven't previously generated data that is this much longer in one dimension. Any suggestions on best practice here? Also, should we keep the same block size for the different downsampling levels that we create when writing the HDF5?

I would make them smaller, like 16x32x16 in HDF5 ...
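
To put the two block sizes side by side, here is a small sketch of how much data one chunk holds and how many chunks cover one tile. It assumes 16-bit pixels and that the block axes are listed in the same order as the tile dimensions from the first post; neither is confirmed in this thread, and `chunk_stats` is a hypothetical helper.

```python
from math import ceil, prod

TILE_SHAPE = (900, 48_000, 75)  # from the first post; axis order is an assumption
BYTES_PER_VOXEL = 2             # assumed uint16

def chunk_stats(shape, block, bytes_per_voxel=BYTES_PER_VOXEL):
    """Return (bytes in one full chunk, number of chunks needed to cover `shape`)."""
    chunk_bytes = prod(block) * bytes_per_voxel
    n_chunks = prod(ceil(s / b) for s, b in zip(shape, block))
    return chunk_bytes, n_chunks

for block in [(64, 128, 64), (16, 32, 16)]:
    chunk_bytes, n_chunks = chunk_stats(TILE_SHAPE, block)
    print(f"{block}: {chunk_bytes / 1024:.0f} KiB per chunk, "
          f"{n_chunks:,} chunks per tile and channel")
```

Under these assumptions a (64, 128, 64) chunk holds 1 MiB and a (16, 32, 16) chunk holds 16 KiB, so each block access pulls in far less data with the smaller size, at the cost of many more chunks per tile.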

dpshepherd commented 4 years ago

> That means something is wrong with the XML. Can you send it to me or post it here?

Here it is. deskewed.zip

StephanPreibisch commented 4 years ago

Let's see after you send me the GUI screenshot and the XML. It might just be a combination of things for this massive dataset. How big is the HDF5?

dpshepherd commented 4 years ago

> I would make them smaller, like 16x32x16 in HDF5 ...

Will do. That is easy.

> How big is the HDF5?

2.4 TB

dpshepherd commented 4 years ago

I just looked at the Python code settings that were used for the deskewing of the light sheet data. Someone was exploring anisotropic block sizes and not using all of the data in the longest dimension. So the number of pixels and block sizes in that XML might be slightly different from what I posted in the first post. Should be fairly close though.

Part of the reason the HDF5 is that small is that we are downsampling the Z dimension by 2x in Python before writing the HDF5.

dpshepherd commented 4 years ago

I'm going to rerun our deskewing code with HDF5 settings of downsampling (1,1,1), (4,8,4), (8,16,8) with blocksize (16,32,16). We can then reload it and try again. Should be ready later today to test in BigStitcher.

StephanPreibisch commented 4 years ago

Great, please let me know how it goes and please send me the screenshot from the fusion GUI as well if you manage.

dpshepherd commented 4 years ago

Here is a zip file with the requested screenshot plus copies of the XML:

- as created by npy2bdv before doing anything in BigStitcher.
- after positioning the tiles and running stitching.
- after running ICP for chromatic correction.

I started the image fusion using the settings shown in GUI and accepted the default settings for the HDF5 as suggested by BigStitcher.

20200515.zip

dpshepherd commented 4 years ago

Memory usage is lower so far (up to 50 GB), CPU usage remains high (~85-90%) across all threads, and I/O is between 5-10 MB/s.

After 3 hours there is 2.3 GB on disk in the HDF5 and nothing visible yet on the progress bar.
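
For comparison with the disk I/O figures reported in this thread, the average rate at which the output file grew can be worked out from the two runs (39 GB after roughly 24 hours, 2.3 GB after 3 hours). This is simple arithmetic on the reported numbers, using decimal units; `write_rate_mb_per_s` is an illustrative helper.

```python
def write_rate_mb_per_s(gigabytes_written, hours_elapsed):
    """Average output growth in MB/s (decimal units: 1 GB = 1000 MB)."""
    return gigabytes_written * 1000 / (hours_elapsed * 3600)

first_run = write_rate_mb_per_s(39, 24)    # figures from the first post
second_run = write_rate_mb_per_s(2.3, 3)   # figures from the run above

print(f"first run:  {first_run:.2f} MB/s")
print(f"second run: {second_run:.2f} MB/s")
```

Both rates come out well below 1 MB/s, far lower than the observed disk I/O, which suggests that most of the disk activity is not ending up as new data in the output file.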

StephanPreibisch commented 4 years ago

Hi, I think I see now. You run out of memory while saving the blocks, as each image is so large. And then the threads keep stealing blocks from each other, making everything very inefficient.

To confirm that this is the case, could you please try the following?

1) Image: Virtual
2) Fused Image: Display using ImageJ (should be just under the limit of what ImageJ can show)

Finally: File > Save As > Image Sequence

This should work reasonably fast I hope. Please let me know ...

dpshepherd commented 4 years ago

> Hi, I think I see now. You run out of memory while saving the blocks, as each image is so large. And then the threads keep stealing blocks from each other, making everything very inefficient.

Do you think this is a function of the final image size or the individual tile size? We can write smaller tiles to the HDF5 by splitting the data up.

> should be just under the limit of what ImageJ can show

We are collecting even more data (yikes!) and will give this a shot afterward. That said, our experience with trying to look at max projections is that images don't display in ImageJ. The max projection is generated, but no image data is shown and the Fiji GUI freezes up.

Also - any insight into what is wrong with the XML that is leading to the "preserve original anisotropy" option not showing up? We can make the needed changes to the XML generation in the npy2bdv code and let Nikita know.

StephanPreibisch commented 4 years ago

Hi, is there any way you can send me one HDD? We will need specific code for rewriting the fused result to HDF5; right now it assumes it can load one timepoint into RAM. We could also have a small chat this week to discuss?
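
The limitation described above (the export assuming a whole timepoint fits in RAM) is the kind of problem usually handled by processing a volume block by block. The following is a minimal Python sketch of that general idea only, not BigStitcher's implementation; `iter_blocks` is a hypothetical helper, and the actual reading, fusing, and writing of each block is left out.

```python
from itertools import product

def iter_blocks(shape, block):
    """Yield tuples of slices that tile an array of `shape` in pieces of at most `block`."""
    ranges = [range(0, s, b) for s, b in zip(shape, block)]
    for starts in product(*ranges):
        # Clip each block at the array boundary so edge blocks are smaller.
        yield tuple(
            slice(start, min(start + b, s))
            for start, b, s in zip(starts, block, shape)
        )

# Illustration only: count the blocks for the tile size quoted in the first post.
shape = (900, 48_000, 75)
n_blocks = sum(1 for _ in iter_blocks(shape, (64, 128, 64)))
print(n_blocks)
```

With such an iterator, only one block (or a small batch of blocks) needs to be held in memory at a time, so peak memory depends on the block size rather than on the size of the fused image.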

dpshepherd commented 4 years ago

Yes, we can mail a disk no problem. Let's coordinate mailing and a quick chat via email. Best for me is: douglas.shepherd (at) asu.edu.

PreibischLab / BigStitcher

Image fusion very slow for multi terabyte dataset #72