ankilab / DeepD3

Apache License 2.0

Problems when training with 2D dataset #7

Closed dcupolillo closed 5 months ago

dcupolillo commented 5 months ago

Description

I am attempting to train a model on my own 2D dataset with DeepD3, but the training process hangs when I follow the instructions in Training DeepD3 model.ipynb, and it does not seem to engage the GPU as expected.

Images are signed int16 grayscale TIFF images; the dendrite and spine masks are binary TIFF images, as shown in the examples below:

Example images: 4_3_1 (raw image), 4_3_1_dendrite (dendrite mask), 4_3_1_spines (spine mask)
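
For context, here is a quick way to confirm the dtypes and shapes of the image and masks before building the training data. The filenames mirror the example names above and are purely illustrative:

```python
import tifffile

# Hypothetical filenames matching the example images above
image    = tifffile.imread("4_3_1.tif")           # raw image, expected dtype int16
dendrite = tifffile.imread("4_3_1_dendrite.tif")  # dendrite mask, expected binary
spines   = tifffile.imread("4_3_1_spines.tif")    # spine mask, expected binary

for name, arr in [("image", image), ("dendrite", dendrite), ("spines", spines)]:
    print(name, arr.shape, arr.dtype, arr.min(), arr.max())
```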

The creation of training data using the deepd3-training GUI and the generation of .d3data files proceed without issues. The generated .d3data file is displayed correctly, as shown in the screenshot below:

[Screenshot: generated .d3data file displayed in the GUI]

Steps to Reproduce issue

  1. Prepare a 2D dataset with images in signed int16 format. Dendrite and spine masks are provided as 2D binary TIFF images (generated with ImageJ/Fiji).
  2. Create training data (.d3data files) using the deepd3-training GUI: place the bounding box, enter the pixel size/resolution in microns, and set z step = 0.
  3. Arrange the training data into training.d3set and validation.d3set files.
  4. Follow the Training DeepD3 model.ipynb Jupyter notebook (run in Anaconda Spyder) up to the m.fit call, as sketched below.
  5. The training process hangs at this step: epochs never advance and there is no significant GPU utilization.
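
For reference, a condensed sketch of what the notebook does up to the hanging step. The module paths, class names, and keyword arguments are my assumptions based on the training notebook and may differ slightly in your installed version:

```python
# Sketch of the training setup; names and arguments are assumptions, not verified
from deepd3.training.stream import DataGeneratorStream  # assumed module path
from deepd3.model import DeepD3_Model                    # assumed module path

# Streams that sample training crops from the .d3set files
dg_training   = DataGeneratorStream("training.d3set", batch_size=32)
dg_validation = DataGeneratorStream("validation.d3set", batch_size=32, augment=False)

# Dual-decoder U-Net predicting dendrites and spines
m = DeepD3_Model(filters=32)
m.compile(optimizer="adam",
          loss=["mse", "binary_crossentropy"])  # one loss per output head (assumed)

# This is the call that hangs with the 2D dataset described above
m.fit(dg_training, validation_data=dg_validation, epochs=100)
```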


anki-xyz commented 5 months ago

Question: Is your code working with the d3set files that we provide? For reproducibility: can you provide me with some example d3set files?

dcupolillo commented 5 months ago

Yes, the code works with DeepD3_Training.d3set and DeepD3_Validation.d3set; within a few seconds the epochs start running.

Here you can find a training.d3set and a validation.d3set.

Thanks for the help!

anki-xyz commented 5 months ago

Dear Dario,

These were the issues:

Image size

Your data has shape (1, 125, 65), which is very small. As a result, the stream runs into an infinite loop when it tries to generate (1, 128, 128) crops.
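
To illustrate the failure mode (this is not the actual DeepD3 code): a random crop sampler needs the image to be at least as large as the requested crop. With a naive retry loop, a (1, 125, 65) stack can never yield a valid 128 x 128 crop, so sampling never returns, which is exactly what the hang looks like. An explicit size check surfaces the error instead:

```python
import numpy as np

def sample_crop(image, size=128):
    """Illustrative crop sampler: pick a random top-left corner for a size x size crop.
    If the image is smaller than `size` in any dimension, no valid corner exists,
    so we raise instead of retrying forever."""
    h, w = image.shape[-2:]
    if h < size or w < size:
        raise ValueError(f"Image {h}x{w} is smaller than the requested {size}x{size} crop")
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    return image[..., y:y + size, x:x + size]

# A (1, 125, 65) stack cannot yield a 128x128 crop:
sample_crop(np.zeros((1, 125, 65), dtype=np.uint16))  # raises ValueError
```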

Training size

I reduced the training input to (1, 32, 32). That means you also need to adjust the U-Net and the DataGenerators accordingly. The bottleneck is now very small (1, 2, 2), which may cause problems; you may need to adjust this further (see the sketch below).
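
A minimal sketch of those adjustments; the keyword arguments for the crop size and model input shape are assumptions and may be named differently in your DeepD3 version:

```python
from deepd3.training.stream import DataGeneratorStream  # assumed module path
from deepd3.model import DeepD3_Model                    # assumed module path

crop = 32  # reduced from 128 because the raw data is only 125 x 65 pixels

dg_training   = DataGeneratorStream("training.d3set", batch_size=32,
                                    size=(crop, crop))                  # assumed keyword
dg_validation = DataGeneratorStream("validation.d3set", batch_size=32,
                                    size=(crop, crop), augment=False)   # assumed keyword

# The model input must match the generator output. With four 2x downsampling
# steps, a 32 x 32 input reaches a 2 x 2 bottleneck, which may be too coarse;
# consider fewer downsampling levels if training does not converge.
m = DeepD3_Model(filters=32, input_shape=(crop, crop, 1))               # assumed keyword
```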

Dtype

int16 as dtype is also problematic; please use uint16. I fixed this by shifting the data by its minimum.
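
The shift-by-minimum fix can also be applied to the TIFFs directly before they go into the training GUI; a minimal sketch with a hypothetical filename:

```python
import numpy as np
import tifffile

img = tifffile.imread("4_3_1.tif")                    # hypothetical filename; dtype int16

# Widen to int32 before shifting to avoid overflow, then map the minimum to 0
img_shifted = img.astype(np.int32) - int(img.min())
img_u16 = img_shifted.astype(np.uint16)

tifffile.imwrite("4_3_1_uint16.tif", img_u16)
```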

With these changes, I could get the DeepD3 neural net to run, and it seems to converge. The results are not tested. If you use a fixed size for training, please remember that inference needs a flexible input size (see the hints on the DeepD3 website). Training_DeepD3_model.zip