FirasGit / medicaldiffusion

Medical Diffusion: This repository contains the code to our paper Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Synthesis

Warning: data is not aligned! This can lead to a speed loss #13

Open WhenMelancholy opened 1 year ago

WhenMelancholy commented 1 year ago

During the training process, I encountered the following warning outputs:

Sanity Checking DataLoader 0:   0%|                                                                                                     | 0/2 [00:00<?, ?it/s][swscaler @ 0x641c700] Warning: data is not aligned! This can lead to a speed loss
[swscaler @ 0x743a880] Warning: data is not aligned! This can lead to a speed loss
Epoch 0:   0%|                                                                                                                        | 0/565 [00:00<?, ?it/s][swscaler @ 0x59d9700] Warning: data is not aligned! This can lead to a speed loss
[swscaler @ 0x6c7f880] Warning: data is not aligned! This can lead to a speed loss

Although it did not affect training, I am unclear about the reason behind it. My training command is as follows:

CUDA_VISIBLE_DEVICES=2 PL_TORCH_DISTRIBUTED_BACKEND=gloo PYTHONPATH=.:$PYTHONPATH python train/train_vqgan.py dataset=mrnet dataset.root_dir="~/github/medicaldiffusion/data/MRNet-v1.0/" model=vq_gan_3d model.gpus=1 model.default_root_dir="~/github/medicaldiffusion/when/checkpoints/vq_gan" model.default_root_dir_postfix="mrnet" model.precision=16 model.embedding_dim=8 model.n_hiddens=16 model.downsample=[4,4,4] model.num_workers=32 model.gradient_clip_val=1.0 model.lr=3e-4 model.discriminator_iter_start=10000 model.perceptual_weight=4 model.image_gan_weight=1 model.video_gan_weight=1 model.gan_feat_weight=4 model.batch_size=2 model.n_codes=16384 model.accumulate_grad_batches=1 

This command is based on train_vqgan.sh.

Thank you in advance!

benearnthof commented 1 year ago

@WhenMelancholy This happened for me as well. As far as I know, it indicates that the number of images in your training data is not evenly divisible by the number of CUDA devices you are training on. It should have only a negligible impact on training as long as you are training on a single server. I believe this is a warning from PyTorch Lightning.

xiexing0916 commented 1 year ago

@benearnthof This happened for me as well. Could you please tell me how to debug it? Is it because the dataset is not divisible by 16?

benearnthof commented 1 year ago

There is no need to debug anything, as this warning only indicates a minor inefficiency when scaling images. My earlier statement may be incorrect: the warning most likely stems from one of the image dimensions not being divisible by 16. This should not affect the model, however.
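
For anyone who wants to test the divisible-by-16 explanation above, here is a minimal sketch. The `[swscaler @ ...]` prefix in the log is the tag used by FFmpeg's libswscale; which step of this training pipeline invokes it is not established in this thread. The helper names `misaligned_axes` and `padded_shape` and the example shape are hypothetical and not part of the repository. The sketch only reports which dimensions are not multiples of 16 and what shape they would need to be padded to. Whether padding actually silences the warning is an assumption, not something confirmed here.

```python
def misaligned_axes(shape, multiple=16):
    """Return the indices of axes whose length is not a multiple of `multiple`."""
    return [i for i, n in enumerate(shape) if n % multiple != 0]


def padded_shape(shape, multiple=16):
    """Return the smallest shape >= `shape` in which every axis is a multiple of `multiple`."""
    return tuple(n + (-n) % multiple for n in shape)


# Hypothetical volume shape (depth, height, width); substitute the shape of your own data.
shape = (30, 256, 250)
print("axes not divisible by 16:", misaligned_axes(shape))
print("shape after padding:", padded_shape(shape))
```

If the first list is empty for every frame or volume that gets written out as an image or video, the dimension hypothesis probably does not explain the warning in your setup.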