Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0

Spacingd transform possibly resampling incorrectly #7529

Open aarpon opened 4 months ago

aarpon commented 4 months ago

Describe the bug

I want to resample 3D microscopy data so that I can pass (almost) isotropic data through a 3D U-Net. The data comes back resampled along the wrong axes (to my understanding) and by the wrong factors.

To Reproduce

This highly simplified example shows the (to me) unexpected behavior:

import torch
from monai.data import MetaTensor
from monai.transforms import Spacing, Spacingd

# Parameters
in_tensor_size = (20, 100, 100)
in_pixdim = [1.0, 0.25, 0.25]
out_pixdim = [0.25, 0.25, 0.25]

# Create data
image = torch.zeros(in_tensor_size, dtype=torch.float32)
image = MetaTensor(image, meta={'pixdim': in_pixdim})
label = torch.zeros(in_tensor_size, dtype=torch.int32)
label = MetaTensor(label, meta={'pixdim': in_pixdim})
data = {"image": image, "label": label}

# Check shapes and metadata (is space = 'RAS' correct for microscopy?)
image.shape, image.meta, label.shape, label.meta

(torch.Size([20, 100, 100]),
 {'pixdim': [1.0, 0.25, 0.25],
  affine: tensor([[1., 0., 0., 0.],
          [0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 1.]], dtype=torch.float64),
  space: RAS},
 torch.Size([20, 100, 100]),
 {'pixdim': [1.0, 0.25, 0.25],
  affine: tensor([[1., 0., 0., 0.],
          [0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 1.]], dtype=torch.float64),
  space: RAS})

# Set up transforms
sp_d = Spacingd(keys=["image", "label"], pixdim=out_pixdim, mode=("bilinear", "nearest"),)
sp = Spacing(pixdim=out_pixdim, mode="bilinear",)

# Apply transforms
sp_data = sp_d(data)
sp_image = sp(image)

# Check outputs
sp_data["image"].shape, sp_data["image"].meta, sp_data["label"].shape, sp_data["label"].meta

(torch.Size([20, **397, 397**]),
 {'pixdim': **[1.0, 0.25, 0.25]**,
  affine: tensor([[0.2500, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.2500, 0.0000, 0.0000],
          [0.0000, 0.0000, 1.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64),
  space: RAS},
 torch.Size([20, **397, 397**]),
 {'pixdim': **[1.0, 0.25, 0.25]**,
  affine: tensor([[0.2500, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.2500, 0.0000, 0.0000],
          [0.0000, 0.0000, 1.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64),
  space: RAS})

sp_image.shape, sp_image.meta

(torch.Size([20, **397, 397**]),
 {'pixdim': **[1.0, 0.25, 0.25]**,
  affine: tensor([[0.2500, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.2500, 0.0000, 0.0000],
          [0.0000, 0.0000, 1.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64),
  space: RAS})

# Try with another pair of in_pixdim and out_pixdim
in_pixdim = [1.0, 0.25, 0.25]
out_pixdim = [1.0, 0.25, 0.25]  # No change

# ... same code to transform the data ...

# Check outputs
sp_data["image"].shape, sp_data["image"].meta, sp_data["label"].shape, sp_data["label"].meta

(torch.Size([20, 100, **397**]),
 {'pixdim': **[1.0, 0.25, 0.25]**,
  affine: tensor([[1.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.2500, 0.0000, 0.0000],
          [0.0000, 0.0000, 1.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64),
  space: RAS},
 torch.Size([20, 100, **397**]),
 {'pixdim': **[1.0, 0.25, 0.25]**,
  affine: tensor([[1.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.2500, 0.0000, 0.0000],
          [0.0000, 0.0000, 1.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64),
  space: RAS})

sp_image.shape, sp_image.meta

(torch.Size([20, 100, **397**]),
 {'pixdim': **[1.0, 0.25, 0.25]**,
  affine: tensor([[1.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.2500, 0.0000, 0.0000],
          [0.0000, 0.0000, 1.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 1.0000]], dtype=torch.float64),
  space: RAS})

Expected behavior

In the first case, the 'pixdim' metadata value of the upscaled data should be [0.25, 0.25, 0.25] instead of [1.0, 0.25, 0.25], and the shape should be [80, 100, 100] instead of [20, 397, 397].

In the second case, the 'pixdim' metadata is correct (I guess it never gets changed), but the shape should be [20, 100, 100] (the same as the input) instead of [20, 100, 397]. Why did the x dimension get upscaled?
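To make the expectation concrete, here is how I computed the expected shapes (plain Python, just scaling each axis by the ratio of input to output spacing):

# Expected size per axis: in_size * (in_pixdim / out_pixdim)
in_size = [20, 100, 100]
in_pixdim = [1.0, 0.25, 0.25]
out_pixdim = [0.25, 0.25, 0.25]
print([round(s * pi / po) for s, pi, po in zip(in_size, in_pixdim, out_pixdim)])  # [80, 100, 100]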

Environment

Please see below.

Ensuring you use the relevant python executable, please paste the output of:

python -c "import monai; monai.config.print_debug_info()"

================================ Printing MONAI config...

MONAI version: 1.3.0
Numpy version: 1.26.3
Pytorch version: 2.2.0+cu121
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 865972f7a791bf7b42efbcd87c8402bd865b329e
MONAI file: /home//miniconda3/envs/qute-env/lib/python3.11/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
ITK version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 5.2.0
scikit-image version: 0.22.0
scipy version: 1.12.0
Pillow version: 10.2.0
Tensorboard version: 2.15.1
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.17.0+cu121
tqdm version: 4.66.1
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: 5.9.8
pandas version: 2.2.0
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: 1.0.0
clearml version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit: https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================ Printing system config...

System: Linux
Linux version: Ubuntu 23.10
Platform: Linux-6.5.0-25-generic-x86_64-with-glibc2.38
Processor: x86_64
Machine: x86_64
Python version: 3.11.5
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 4
Num logical CPUs: 8
Num usable CPUs: 8
CPU usage (%): [16.6, 11.4, 24.2, 45.3, 14.4, 14.0, 12.3, 64.2]
CPU freq. (MHz): 3075
Load avg. in last 1, 5, 15 mins (%): [11.7, 7.3, 7.3]
Disk usage (%): 81.8
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 125.5
Available memory (GB): 105.9
Used memory (GB): 17.2

================================ Printing GPU config...

Num GPUs: 1
Has CUDA: True
CUDA version: 12.1
cuDNN enabled: True
NVIDIA_TF32_OVERRIDE: None
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE: None
cuDNN version: 8902
Current device: 0
Library compiled for CUDA architectures: ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
GPU 0 Name: Quadro RTX 5000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 48
GPU 0 Total memory (GB): 15.7
GPU 0 CUDA capability (maj.min): 7.5

atbenmurray commented 4 months ago

Hi @aarpon, thanks for posting this. I'll take a quick look and get back to you

atbenmurray commented 4 months ago

Hi @aarpon, I've gone over your example and I think I understand the problem.

There are two fixes that you need to make to the example:

  1. Add a channel dimension to image and label
  2. Assign a homogeneous matrix containing the pixel dimensions to affine, instead of putting 'pixdim' in the MetaTensor metadata.

Add a channel dimension

MONAI transforms expect spatial data to have a channel dimension followed by the spatial dimensions. Your tensors in this example should have a shape of (1, 20, 100, 100).
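For example, a minimal way to add that leading channel dimension to the tensors from your snippet:

# (20, 100, 100) -> (1, 20, 100, 100): channel-first layout expected by MONAI
image = image.unsqueeze(0)
label = label.unsqueeze(0)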

Assign to affine instead of 'pixdim'

The pixdim property of a MetaTensor gets its values by picking the appropriate entries out of MetaTensor.affine, rather than from a 'pixdim' field in the metadata. You can generate the affine for a given set of pixel dimensions as follows:

p = (1.0, 0.25, 0.25)
# 4x4 homogeneous affine with the pixel dimensions on the diagonal
affine = torch.tensor(
    [
        [p[0], 0.0, 0.0, 0.0],
        [0.0, p[1], 0.0, 0.0],
        [0.0, 0.0, p[2], 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)

You can then assign it to your MetaTensor as follows:

image = MetaTensor(image, affine=affine)
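Putting both fixes together, here is a minimal end-to-end sketch (using the shapes and spacings from your report):

import torch
from monai.data import MetaTensor
from monai.transforms import Spacingd

in_pixdim = (1.0, 0.25, 0.25)
out_pixdim = (0.25, 0.25, 0.25)

# Channel-first tensors: (C, spatial...)
image = torch.zeros((1, 20, 100, 100), dtype=torch.float32)
label = torch.zeros((1, 20, 100, 100), dtype=torch.int32)

# Encode the input spacing on the diagonal of a homogeneous affine
affine = torch.diag(torch.tensor([*in_pixdim, 1.0], dtype=torch.float64))
data = {
    "image": MetaTensor(image, affine=affine),
    "label": MetaTensor(label, affine=affine),
}

sp_d = Spacingd(keys=["image", "label"], pixdim=out_pixdim, mode=("bilinear", "nearest"))
out = sp_d(data)
print(out["image"].shape)  # spatial axes scaled by in_pixdim / out_pixdim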

This should produce the expected output. Please let me know if this solves your problem or whether there is still an issue.

Thanks, Ben

aarpon commented 4 months ago

Thanks, @atbenmurray! Your suggestion indeed seems to work fine.

Just one curiosity: if I start again with the same tensor with shape (1, 20, 100, 100) and affine.diag() = [1.0, 0.25, 0.25, 1.0], and then resample as above, the output shape is not quite what I would expect.

Is this just some rounding error somewhere?

Thanks again!

atbenmurray commented 4 months ago

@aarpon my apologies for the slow response; I didn't have much keyboard time yesterday.

There is indeed an issue with Spacing/Spacingd in this regard. If it is practical for you, I recommend using Resize/Resized instead.
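For what it's worth, the sizes reported above are consistent with the output extent being computed from voxel-corner coordinates (my assumption about the cause, not verified here): an axis of n voxels resamples to (n - 1) * s + 1 samples at scale s, rather than n * s.

# Corner-aligned size vs. naive size at scale s = 4 (illustrative arithmetic only)
n, s = 100, 4
print((n - 1) * s + 1, n * s)  # 397 400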

If all of your samples are the same size, you can simply replace Spacing with Resize, passing in the desired size. If each of your samples is a different size, you might have to create a wrapper function/class that creates a Resize object for each call. Please let me know if you need more advice about how to do this.
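As a rough sketch, such a wrapper could look like this (hypothetical code, not part of MONAI; it derives a per-sample target size from the current and desired spacing and delegates to Resize):

import numpy as np
from monai.transforms import Resize

class SpacingByResize:
    def __init__(self, out_pixdim, mode="trilinear"):
        self.out_pixdim = np.asarray(out_pixdim, dtype=float)
        self.mode = mode

    def __call__(self, img):
        # img is a channel-first MetaTensor; its spacing is read from the affine
        in_pixdim = np.asarray(img.pixdim, dtype=float)
        in_size = np.asarray(img.shape[1:], dtype=float)
        out_size = np.round(in_size * in_pixdim / self.out_pixdim).astype(int)
        return Resize(spatial_size=tuple(out_size), mode=self.mode)(img)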

aarpon commented 4 months ago

Hi, @atbenmurray

no worries, and thanks for following up. I ultimately ended up writing my own CustomResampler and CustomResamplerd that use torch.nn.functional.interpolate directly, since, in addition, I was struggling with the fact that during training the affine matrix of each of the MetaTensors in a batch would go from shape (4, 4) to shape (batch, 4, 4). This caused Spacingd to fail, because it expected a 2D affine matrix but was getting a 3D one. I don't know if this has to do with my DataLoaders using list_data_collate as collate_fn; I couldn't figure out what was causing it, so I ended up writing my own transforms. Any idea what the 3D affine matrix could be due to?
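For illustration, a minimal sketch along the lines of what I wrote (hypothetical names; my actual CustomResampler is a bit more involved):

import torch
import torch.nn.functional as F

def resample(img: torch.Tensor, in_pixdim, out_pixdim, mode="trilinear"):
    # img is channel-first (C, D, H, W); scale each spatial axis by the spacing ratio
    out_size = [int(round(n * pi / po)) for n, pi, po in zip(img.shape[1:], in_pixdim, out_pixdim)]
    # interpolate expects a batch dimension: (N, C, D, H, W)
    out = F.interpolate(img.unsqueeze(0).float(), size=out_size, mode=mode, align_corners=False)
    return out.squeeze(0)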

atbenmurray commented 3 months ago

@aarpon, sorry for leaving this dangling. Can you give me a minimal code sample that replicates the behaviour?

aarpon commented 2 months ago

Sorry for the late reply. I switched to my own resampler a long time ago. In any case, I think the issue was that after Spacingd() I was using RandCropByPosNegLabeld(), and this -- again, I'm not sure -- was adding a batch dimension to the affine matrix.