Deep-MI / FastSurfer

PyTorch implementation of FastSurferCNN
Apache License 2.0

Generating HDF5 is too slow + conforming the images. #67

Open isukrit opened 2 years ago

isukrit commented 2 years ago

Hi team, I have been using FastSurfer for a while now and recently decided to train it on my own dataset. I have a couple of quick suggestions regarding the FastSurfer/FastSurferCNN/generate_hdf5.py file:

  1. The script sometimes fails because the base image/segmentation is not conformed to the 256x256x256 space at 1 mm^3 resolution. Wouldn't it be better to conform the input image and segmentation using the load_and_conform_image function from the data_loader/load_neuroimaging_data file? I also experimented with modified code and generated HDF5 files for other resolutions, but training fails in that case because of an error with the downsampled sizes.
  2. The script is simply too slow. I am running it on the HCP data and it takes hours to produce the HDF5 files. Would you consider using the multiprocessing library to parallelize the for loop? I have done the parallelization with pool.map and it runs much faster now!
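
To illustrate point 2, here is a minimal sketch of the pool.map idea. It is not the attached code: process_subject and process_all are hypothetical names, and the per-subject work is a placeholder for loading, conforming and slicing one subject. It assumes the subjects can be processed independently of each other.

```python
from multiprocessing import Pool


def process_subject(subject_id):
    """Hypothetical stand-in for the per-subject work (load, conform, slice)."""
    # Placeholder computation so the sketch runs without any imaging data.
    return subject_id, len(subject_id)


def process_all(subject_ids, workers=4):
    """Apply process_subject to every subject and keep the input order."""
    if workers <= 1:
        # Serial fallback, handy for debugging a single subject.
        return [process_subject(s) for s in subject_ids]
    with Pool(processes=workers) as pool:
        # pool.map blocks until all subjects are done and returns an ordered list.
        return pool.map(process_subject, subject_ids)


if __name__ == "__main__":
    subjects = ["sub-01", "sub-02", "sub-03"]
    print(process_all(subjects, workers=2))
```

Note that pool.map collects every result in memory before returning, so with large volumes the results may need to be written out incrementally instead.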

I have attached the file for your reference.

Cheers, Sukrit

PS: Hoping to see the team at DZNE some time next summer! @m-reuter

generate_hdf5.zip

isukrit commented 2 years ago

I still had issues with the HDF5 file generation. The HCP data is huge, and my workstation runs out of RAM (it has a pretty decent ~128 GB) before the output can be generated. I dug deeper and fixed this by writing the HDF5 file in chunks. I have rewritten the code to handle this (and also plugged a memory leak in the pool.map call), so please consider the updated code for FastSurfer users.
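
For readers who want the gist of chunked writing without opening the zip, here is a rough sketch using h5py's resizable datasets. It is not the attached code: the dataset name "images", the function names and the slice shape are assumptions made for illustration. The idea is that only one block of slices is held in memory at a time.

```python
import h5py
import numpy as np


def append_rows(dataset, block):
    """Grow a resizable dataset along axis 0 and write one block at the end."""
    start = dataset.shape[0]
    dataset.resize(start + block.shape[0], axis=0)
    dataset[start:] = block


def write_in_chunks(path, blocks, slice_shape=(256, 256), dtype="uint8"):
    """Write an iterable of (n, H, W) arrays into one HDF5 dataset, block by block."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "images",                       # hypothetical dataset name
            shape=(0, *slice_shape),        # start empty
            maxshape=(None, *slice_shape),  # unlimited along axis 0
            chunks=(1, *slice_shape),       # one slice per HDF5 chunk
            dtype=dtype,
        )
        for block in blocks:
            append_rows(dset, block)
        return dset.shape


if __name__ == "__main__":
    # Three small dummy blocks stand in for per-subject slice stacks.
    dummy = (np.zeros((2, 256, 256), dtype="uint8") for _ in range(3))
    print(write_in_chunks("example.hdf5", dummy))
```

Passing a generator as blocks means each block can be garbage-collected after it has been written, which keeps peak memory roughly at the size of one block.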

Cheers, Sukrit

generate_hdf5.zip