Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0

CropForegroundd does not adjust the affine accordingly #3167

Closed. Spenhouet closed this issue 2 years ago.

Spenhouet commented 3 years ago

Describe the bug

We are using the CropForegroundd transform. We apply it to multiple images with different FoVs. After saving them, the images no longer match (they cannot be overlaid).

I'm unsure whether there are multiple bugs, but the first one I found is this: apparently, the CropForegroundd transform does not update the affine.

To Reproduce

  1. Download the following NIfTI file: thalamic.nii.gz

  2. Run the following code:

from pathlib import Path

import numpy as np
from monai.transforms.compose import Compose
from monai.transforms.croppad.dictionary import CropForegroundd
from monai.transforms.io.dictionary import LoadImaged, SaveImaged
from monai.transforms.utility.dictionary import EnsureChannelFirstd, ToTensord

path = Path('.')

THALA_KEY = 'thalamic'
FILE_KEYS = [THALA_KEY]

data = {
    THALA_KEY: path / 'thalamic.nii.gz',
}

process = Compose([
    LoadImaged(FILE_KEYS),
    ToTensord(FILE_KEYS),
    EnsureChannelFirstd(FILE_KEYS),
    CropForegroundd(FILE_KEYS, source_key=THALA_KEY),
    SaveImaged(
        FILE_KEYS,
        output_dir='output',
        output_postfix="crop",
        resample=False,
        output_dtype=np.int16,
        separate_folder=False,
    ),
])

process([data])

This should result in the following file being stored: thalamic_crop.nii.gz

  3. Load both images with MRIcroGL (or your favorite NIfTI viewer).

Expected behavior

We would expect the affine to be adjusted relative to the crop so that the cropped image still aligns in 3D space. This is not the case: the affine is still the same as before the crop.
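Concretely, for a crop that removes a given number of voxels from the start of each spatial dimension, the adjustment we would expect is roughly the following (illustrative sketch only, not MONAI code; expected_cropped_affine is a hypothetical helper):

import numpy as np

def expected_cropped_affine(affine: np.ndarray, crop_start: np.ndarray) -> np.ndarray:
    # A voxel v of the cropped image corresponds to voxel v + crop_start of the original,
    # so the new affine is the old one composed with a translation in voxel space.
    shift = np.eye(4)
    shift[:3, 3] = crop_start
    return affine @ shift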

Screenshots

Here you can see the mismatch in 3D space alignment due to the missing affine change. White is the original image and red is the cropped image.

3D alignment mismatch

Environment

Printing MONAI config...
MONAI version: 0.7.0
Numpy version: 1.19.2
Pytorch version: 1.8.1+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: bfa054b9c3064628a21f4c35bbe3132964e91f43

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.1.1
scikit-image version: 0.18.0
Pillow version: 8.3.2
Tensorboard version: 2.7.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.62.3
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: 1.2.4
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

Printing system config...
System: Windows
Win32 version: ('10', '10.0.19041', 'SP0', '')
Platform: Windows-10-10.0.19041-SP0
Processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
Machine: AMD64
Python version: 3.7.7
Process name: python.exe
Command: ['C:\\Users\\SebastianPenhouet\\AppData\\Local\\Programs\\Python\\Python37\\python.exe', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='C:\\Windows\\System32\\de-DE\\KernelBase.dll.mui', fd=-1), popenfile(path='C:\\Windows\\System32\\de-DE\\kernel32.dll.mui', fd=-1)]
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [16.3, 11.8, 26.1, 20.3, 12.4, 7.2, 17.6, 16.3, 15.0, 9.2, 12.4, 58.2]
CPU freq. (MHz): 3501
Load avg. in last 1, 5, 15 mins (%): [0.0, 0.0, 0.0]
Disk usage (%): 97.8
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.9
Available memory (GB): 18.6
Used memory (GB): 13.3

Printing GPU config...
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: Quadro K2200
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 5
GPU 0 Total memory (GB): 4.0
GPU 0 CUDA capability (maj.min): 5.0

Nic-Ma commented 3 years ago

Hi @Spenhouet ,

Thanks for your investigation and suggestion. Yes, you are right: none of the MONAI spatial or crop/pad transforms update the affine matrix so far. Affine computation is not easy for most developers, especially those without a medical-imaging background, so to keep transform development simpler we haven't updated the affine yet. To achieve a similar goal for your use case, I would usually recommend applying the Invertd transform to invert the spatial/crop/pad transforms on your model prediction first, and then matching it with the original image. Reference example: https://github.com/Project-MONAI/tutorials/blob/master/3d_segmentation/torch/unet_inference_dict.py#L70 CC @ericspod @wyli @rijobro .
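For illustration, a minimal sketch of that approach (the argument usage follows the linked tutorial; the key names "image"/"pred" and the pre_transforms pipeline are placeholders for your own setup):

from monai.transforms import Compose, Invertd

post_transforms = Compose([
    Invertd(
        keys="pred",                       # invert the model output ...
        transform=pre_transforms,          # ... with respect to the preprocessing chain (incl. the crop)
        orig_keys="image",                 # using the transform info recorded for the input image
        meta_keys="pred_meta_dict",
        orig_meta_keys="image_meta_dict",
        nearest_interp=False,
        to_tensor=True,
    ),
])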

Thanks.

Spenhouet commented 3 years ago

Hi @Nic-Ma

I understand that affine computations are non-trivial. Nonetheless, this is crucial for us. Is this something that could be worked on / fixed for all transforms? Is there someone on the core team who feels competent in this area and could take it on?

I can't follow your suggestion with the inversion. We actually want to work with the crops, and we need them to align, since we are merging different crops with different FoVs. This is something we will need within the next 1-2 weeks.

Spenhouet commented 3 years ago

I tried to implement a correct affine transformation based on the bounding box, but I have some issues coming up with a generic implementation.

The foreground_start_coord is [33, 31, 29] and the foreground_end_coord is [143, 88, 102], so the bounding-box size (end - start) is [110, 57, 73].

I assumed that a correct change of the affine could be calculated with:

results = process([data])

import nibabel as nib

# Load the pre crop image
nifti = nib.load(path / 'thalamic.nii.gz')
affine, matrix = nifti.affine, nifti.get_fdata()

# Resolution in mm
pixdim = nifti.header['pixdim'][1:4]
# Calculate the start of the bounding box in 3D space
start = results[0]['foreground_start_coord'] * pixdim

# Create an affine translation matrix
translation = np.eye(4)
translation[0:3, 3] = -start

# Apply the translation matrix to the pre crop affine
new_affine = translation @ affine

# Crop the image data to the same bounding box
matrix_crop = matrix[tuple(slice(s, e) for s, e in zip(results[0]['foreground_start_coord'],
                                                       results[0]['foreground_end_coord']))]

# Store the cropped image with the adjusted affine
nib.save(nib.Nifti1Image(matrix_crop, new_affine), path / 'thalamic_crop2.nii.gz')

But this does not give the expected result.

Somehow I have to swap the y and z dimensions in the start coordinate and also flip the sign of the z dimension.

x, y, z = results[0]['foreground_start_coord']
start = np.array([x, -z, y]) * pixdim

I cannot explain these two changes. Any idea why they are necessary?

Nic-Ma commented 3 years ago

Hi @ericspod @wyli ,

I can't remember clearly; maybe there is some dimension-order difference specific to the medical imaging domain here? Could you please help confirm or comment?

Thanks in advance.

Spenhouet commented 3 years ago

Okay, I found out how to do this in a generic way.

The reason is that we need to adjust the coordinates with respect to the image orientation.

I did it now using nibabel.orientations.io_orientation:

def orient_coords(coords: np.ndarray, affine: np.ndarray):
    # Reorder and sign-flip the voxel-space offsets so that they follow the
    # world-axis order and direction encoded in the affine.
    orientation = nib.io_orientation(affine).astype(np.int8)
    return (coords * orientation[:, 1])[orientation[:, 0]]

# Offset of the crop in mm, expressed along the world axes
start = results[0]['foreground_start_coord'] * pixdim
start = orient_coords(start, affine)

# Shift the origin of the original affine by the crop offset
translation = np.eye(4)
translation[0:3, 3] = start

new_affine = translation @ affine

wyli commented 3 years ago

looks good, I'm linking this to https://github.com/Project-MONAI/MONAI/issues/2424 to track the discussions.

Spenhouet commented 3 years ago

From what I can see, the CropForegroundd transform uses the CropForeground transform which in turn uses the SpatialCrop and BorderPad transforms.

I guess the fix I wrote above only corrects the affine for the SpatialCrop transform, not for the BorderPad transform. That one will be harder, since it does not actually return / store the exact padding that was performed.

We would either need to refactor CropForegroundd completely so that it uses SpatialCropd and BorderPadd versions which adjust the affines themselves, or, as a first step, refactor the crop_pad method so that there is also a compute_padding method. The second solution is less work, so I would probably go with that for now.

rijobro commented 3 years ago

We have the base classes Pad and SpatialCrop. If these were updated to optionally update a given dictionary of meta data, then all crop/pad transforms, whether dictionary- or array-based, would benefit from the change. This seems to me like the best way to do the update; what do you think?

class Pad(Transform):
    def __init__(
        self,
        to_pad: List[Tuple[int, int]],
        mode: Union[NumpyPadMode, PytorchPadMode, str] = NumpyPadMode.CONSTANT,
        **kwargs,
    ) -> None:
        ...

    def __call__(
        self, img, mode=None, meta_data: Optional[Dict] = None
    ) -> Union[NdarrayOrTensor, Tuple[NdarrayOrTensor, Dict]]:
        # do the padding
        ...
        if meta_data is not None:
            # update the affine
            ...
            return img, updated_meta
        return img
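The dictionary versions would then just forward each key's meta dict to such a meta-aware base transform, roughly like this (rough sketch; the Padd class and the padder argument are only illustrative):

from typing import Dict, Hashable, Mapping

from monai.config import KeysCollection
from monai.transforms import MapTransform

class Padd(MapTransform):
    """Hypothetical dictionary wrapper around a meta-aware Pad as sketched above."""

    def __init__(self, keys: KeysCollection, padder, allow_missing_keys: bool = False) -> None:
        super().__init__(keys, allow_missing_keys)
        self.padder = padder  # a Pad instance whose __call__ accepts meta_data

    def __call__(self, data: Mapping[Hashable, object]) -> Dict[Hashable, object]:
        d = dict(data)
        for key in self.key_iterator(d):
            meta_key = f"{key}_meta_dict"
            # hand the image's meta dict to the padder so it can update the affine
            d[key], d[meta_key] = self.padder(d[key], meta_data=d[meta_key])
        return d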

The problem is that once we have that, we should really have it for all transforms. Otherwise, the affine will be wrong if we apply e.g., RandRotate90d beforehand and the affine isn't updated accordingly.

Spenhouet commented 3 years ago

The problem is that once we have that, we should really have it for all transforms

Yes, this is also what I argued for in https://github.com/Project-MONAI/MONAI/issues/2424#issuecomment-866681342

But this is a huge refactoring (a worthy one, but huge).

Adding this for Pad and SpatialCrop is definitely needed, but making use of it in CropForegroundd will also require a chain of changes which I really don't have the time for.

rijobro commented 3 years ago

You're right, a lot of work. And as we said previously, the updated meta data would only be correct if there were no other untracked modifications to the meta data (e.g., rotation). We could gradually include this functionality, but I wonder if that's almost worse (people might think that we always handle the affine matrix when that isn't currently the case).

Spenhouet commented 3 years ago

people might think that we always handle the affine matrix

I would argue that people already assume that. It is more of a surprise that it is not the case. I believe I mentioned this in another issue about multiple subsequent affine transformations (https://github.com/Project-MONAI/MONAI/discussions/2427), where we expected these to be combined automatically into a single affine transformation (since this is quite common in the medical domain). We were also surprised that this is not the case.

wyli commented 3 years ago

Still, I feel the affine is just a special case (linear transforms). We should consider the general use cases, as with the effort on invertible transforms in https://github.com/Project-MONAI/MONAI/issues/1515. We already track the full transform info for many spatial transforms. Perhaps the feature request in #2424 now becomes how to combine the pieces of transform info stored in https://github.com/Project-MONAI/MONAI/blob/536e056224c63af4a5abbeed8c232519f20bac21/monai/transforms/inverse.py#L22 so that they are friendly to other packages such as visualisation software.

Spenhouet commented 3 years ago

I feel affine is just a special case (linear transforms)

I'm not sure how you are going to do ML on medical images without it. So this "special case" is actually the primary / main case which is always used. Correct me if I'm wrong.

Spenhouet commented 3 years ago

I did apply the fix as suggested within the SpatialCrop transform. It also works for the margin parameter.

I did not yet fix BorderPad, since I did not have data where it seems to have an effect, but it most likely also needs a fix.

from itertools import chain
from typing import (Callable, Dict, Hashable, List, Mapping, Optional, Sequence, Tuple, Union)

import numpy as np
import torch
from monai.config.type_definitions import (IndexSelection, KeysCollection, NdarrayOrTensor)
from monai.transforms.croppad.array import BorderPad
from monai.transforms.croppad.dictionary import NumpyPadModeSequence
from monai.transforms.inverse import InvertibleTransform
from monai.transforms.transform import MapTransform, Transform
from monai.transforms.utils import (compute_divisible_spatial_size, generate_spatial_bounding_box, is_positive)
from monai.utils.enums import NumpyPadMode
from monai.utils.misc import ensure_tuple, ensure_tuple_rep
from monai.utils.module import look_up_option
from monai.utils.type_conversion import convert_data_type
from monai.transforms.utils_pytorch_numpy_unification import floor_divide, maximum
from monai.utils.enums import TransformBackends
import nibabel as nib

def get_translation_affine(coords: np.ndarray) -> np.ndarray:
    dim = len(coords)
    affine = np.eye(dim + 1)
    affine[0:dim, -1] = coords
    return affine

def orient_coords(coords: np.ndarray, affine: np.ndarray) -> np.ndarray:
    orientation = nib.io_orientation(affine).astype(np.int8)
    return (coords * orientation[:, 1])[orientation[:, 0]]
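# Illustrative examples (assumed values, added for clarity):
#   orient_coords(np.array([1., 2., 3.]), np.eye(4))                  -> [1., 2., 3.]   (already RAS, unchanged)
#   orient_coords(np.array([1., 2., 3.]), np.diag([1., 1., -1., 1.])) -> [1., 2., -3.]  (z axis flipped)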

def affine_to_pixdim(affine: np.ndarray) -> np.ndarray:
    """
    Calculate the voxel resolution in mm space (pixdim) based on the affine.

    Args:
        affine (np.ndarray): The affine describing the image.

    Returns:
        np.ndarray: The voxel resolution in mm space (pixdim).
    """
    dims = len(affine) - 1
    # Starting point projected in mm space.
    p1 = (np.ones(dims + 1) @ affine.T)[:dims]
    # Comparison points with single coordinate change projected in mm space.
    p2 = ((np.eye(dims + 1) + 1) @ affine.T)[:dims, :dims]
    # Calculate euclidean distance between starting point and comparison points.
    return np.linalg.norm(p2 - p1, axis=-1)  # type: ignore
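# Illustrative example (assumed values, added for clarity): for a diagonal RAS affine
# np.diag([0.5, 0.5, 2.0, 1.0]), affine_to_pixdim returns array([0.5, 0.5, 2.0]).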

class SpatialCrop(Transform):
    """
    General purpose cropper to produce sub-volume region of interest (ROI).
    If a dimension of the expected ROI size is bigger than the input image size, will not crop that dimension.
    So the cropped result may be smaller than the expected ROI, and the cropped results of several images may
    not have exactly the same shape.
    It can support to crop ND spatial (channel-first) data.

    The cropped region can be parameterised in various ways:
        - a list of slices for each spatial dimension (allows for use of -ve indexing and `None`)
        - a spatial center and size
        - the start and end coordinates of the ROI
    """

    backend = [TransformBackends.TORCH, TransformBackends.NUMPY]

    def __init__(
        self,
        roi_center: Union[Sequence[int], NdarrayOrTensor, None] = None,
        roi_size: Union[Sequence[int], NdarrayOrTensor, None] = None,
        roi_start: Union[Sequence[int], NdarrayOrTensor, None] = None,
        roi_end: Union[Sequence[int], NdarrayOrTensor, None] = None,
        roi_slices: Optional[Sequence[slice]] = None,
    ) -> None:
        """
        Args:
            roi_center: voxel coordinates for center of the crop ROI.
            roi_size: size of the crop ROI, if a dimension of ROI size is bigger than image size,
                will not crop that dimension of the image.
            roi_start: voxel coordinates for start of the crop ROI.
            roi_end: voxel coordinates for end of the crop ROI, if a coordinate is out of image,
                use the end coordinate of image.
            roi_slices: list of slices for each of the spatial dimensions.
        """
        roi_start_torch: torch.Tensor

        if roi_slices:
            if not all(s.step is None or s.step == 1 for s in roi_slices):
                raise ValueError("Only slice steps of 1/None are currently supported")
            self.slices = list(roi_slices)
            # keep the crop start so that the affine can be adjusted in __call__
            self.roi_start = np.asarray([s.start if s.start is not None else 0 for s in self.slices])
        else:
            if roi_center is not None and roi_size is not None:
                roi_center = torch.as_tensor(roi_center, dtype=torch.int16)
                roi_size = torch.as_tensor(roi_size, dtype=torch.int16, device=roi_center.device)
                roi_start_torch = maximum(  # type: ignore
                    roi_center - floor_divide(roi_size, 2),
                    torch.zeros_like(roi_center),
                )
                roi_end_torch = maximum(roi_start_torch + roi_size, roi_start_torch)
            else:
                if roi_start is None or roi_end is None:
                    raise ValueError(
                        "Please specify either roi_center, roi_size or roi_start, roi_end.")
                roi_start_torch = torch.as_tensor(roi_start, dtype=torch.int16)
                roi_start_torch = maximum(roi_start_torch,
                                          torch.zeros_like(roi_start_torch))  # type: ignore
                roi_end_torch = maximum(torch.as_tensor(roi_end, dtype=torch.int16),
                                        roi_start_torch)
            # convert to slices (accounting for 1d)
            if roi_start_torch.numel() == 1:
                self.slices = [slice(int(roi_start_torch.item()), int(roi_end_torch.item()))]
            else:
                self.slices = [
                    slice(int(s.item()), int(e.item()))
                    for s, e in zip(roi_start_torch, roi_end_torch)
                ]
            self.roi_start = roi_start_torch.numpy()

    def __call__(
        self,
        img: NdarrayOrTensor,
        meta_data: Optional[Dict] = None,
    ) -> Union[NdarrayOrTensor, Tuple[NdarrayOrTensor, Dict]]:
        """
        Apply the transform to `img`, assuming `img` is channel-first and
        slicing doesn't apply to the channel dim.
        """
        sd = min(len(self.slices), len(img.shape[1:]))  # spatial dims
        slices = [slice(None)] + self.slices[:sd]

        if meta_data is not None:
            # adjust the affine for the crop offset (roi_start) so that the
            # cropped image keeps its position in world space
            affine = meta_data['affine']
            translation = get_translation_affine(self.roi_start)
            meta_data['affine'] = np.linalg.inv(translation @ np.linalg.inv(affine))

            return img[tuple(slices)], meta_data

        return img[tuple(slices)]

class CropForeground(Transform):
    """
    Crop an image using a bounding box. The bounding box is generated by selecting foreground using select_fn
    at channels channel_indices. margin is added in each spatial dimension of the bounding box.
    The typical usage is to help training and evaluation if the valid part is small in the whole medical image.
    Users can define arbitrary function to select expected foreground from the whole image or specified channels.
    And it can also add margin to every dim of the bounding box of foreground object.
    For example:

    .. code-block:: python

        image = np.array(
            [[[0, 0, 0, 0, 0],
              [0, 1, 2, 1, 0],
              [0, 1, 3, 2, 0],
              [0, 1, 2, 1, 0],
              [0, 0, 0, 0, 0]]])  # 1x5x5, single channel 5x5 image

        def threshold_at_one(x):
            # threshold at 1
            return x > 1

        cropper = CropForeground(select_fn=threshold_at_one, margin=0)
        print(cropper(image))
        [[[2, 1],
          [3, 2],
          [2, 1]]]

    """

    def __init__(
        self,
        select_fn: Callable = is_positive,
        channel_indices: Optional[IndexSelection] = None,
        margin: Union[Sequence[int], int] = 0,
        return_coords: bool = False,
        k_divisible: Union[Sequence[int], int] = 1,
        mode: Union[NumpyPadMode, str] = NumpyPadMode.CONSTANT,
        **np_kwargs,
    ) -> None:
        """
        Args:
            select_fn: function to select expected foreground, default is to select values > 0.
            channel_indices: if defined, select foreground only on the specified channels
                of image. if None, select foreground on the whole image.
            margin: add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
            return_coords: whether return the coordinates of spatial bounding box for foreground.
            k_divisible: make each spatial dimension to be divisible by k, default to 1.
                if `k_divisible` is an int, the same `k` will be applied to all the input spatial dimensions.
            mode: padding mode {``"constant"``, ``"edge"``, ``"linear_ramp"``, ``"maximum"``, ``"mean"``,
                ``"median"``, ``"minimum"``, ``"reflect"``, ``"symmetric"``, ``"wrap"``, ``"empty"``}
                one of the listed string values or a user supplied function. Defaults to ``"constant"``.
                see also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html
            np_kwargs: other args for `np.pad` API, note that `np.pad` treats channel dimension as the first dimension.
                more details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html

        """
        self.select_fn = select_fn
        self.channel_indices = ensure_tuple(
            channel_indices) if channel_indices is not None else None
        self.margin = margin
        self.return_coords = return_coords
        self.k_divisible = k_divisible
        self.mode: NumpyPadMode = look_up_option(mode, NumpyPadMode)
        self.np_kwargs = np_kwargs

    def compute_bounding_box(self, img: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Compute the start points and end points of bounding box to crop.
        And adjust bounding box coords to be divisible by `k`.

        """
        box_start, box_end = generate_spatial_bounding_box(img, self.select_fn,
                                                           self.channel_indices, self.margin)
        box_start_ = np.asarray(box_start, dtype=np.int16)
        box_end_ = np.asarray(box_end, dtype=np.int16)
        orig_spatial_size = box_end_ - box_start_
        # make the spatial size divisible by `k`
        spatial_size = np.asarray(
            compute_divisible_spatial_size(spatial_shape=orig_spatial_size, k=self.k_divisible))
        # update box_start and box_end
        box_start_ = box_start_ - np.floor_divide(np.asarray(spatial_size) - orig_spatial_size, 2)
        box_end_ = box_start_ + spatial_size
        return box_start_, box_end_

    def compute_padding(
        self,
        img: np.ndarray,
        box_start: np.ndarray,
        box_end: np.ndarray,
    ):
        # padding needed where the (margin/divisibility-expanded) box extends beyond the image bounds
        pad_to_start = np.maximum(-box_start, 0)
        pad_to_end = np.maximum(box_end - np.asarray(img.shape[1:]), 0)
        # interleave as [start_0, end_0, start_1, end_1, ...], the layout expected by BorderPad's spatial_border
        pad = list(chain(*zip(pad_to_start.tolist(), pad_to_end.tolist())))
        return pad
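    # Illustrative example (assumed values, added for clarity): for img.shape == (1, 10, 10, 10),
    # box_start == [-2, 0, 3] and box_end == [12, 8, 9], this returns [2, 2, 0, 0, 0, 0],
    # i.e. 2 voxels of padding before and after the first spatial dim only.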

    def __call__(self,
                 img: np.ndarray,
                 box_start: Optional[np.ndarray] = None,
                 box_end: Optional[np.ndarray] = None,
                 pad: Optional[List] = None,
                 mode: Optional[Union[NumpyPadMode, str]] = None,
                 meta_data: Optional[Dict] = None):
        """
        Apply the transform to `img`, assuming `img` is channel-first and
        slicing doesn't change the channel dim.
        """
        img, *_ = convert_data_type(img, np.ndarray)  # type: ignore

        if box_start is None or box_end is None:
            print('compute_bounding_box')
            box_start, box_end = self.compute_bounding_box(img)

        if pad is None:
            pad = self.compute_padding(img, box_start, box_end)

        if meta_data is None:
            cropped = SpatialCrop(roi_start=box_start, roi_end=box_end)(img)
        else:
            cropped, meta_data = SpatialCrop(roi_start=box_start,
                                             roi_end=box_end)(img, meta_data=meta_data)

        cropped = BorderPad(spatial_border=pad, mode=mode or self.mode, **self.np_kwargs)(cropped)

        if self.return_coords and meta_data is not None:
            return cropped, box_start, box_end, meta_data

        if self.return_coords:
            return cropped, box_start, box_end

        if meta_data is not None:
            return cropped, meta_data

        return cropped

class CropForegroundd(MapTransform, InvertibleTransform):
    """
    Dictionary-based version :py:class:`monai.transforms.CropForeground`.
    Crop only the foreground object of the expected images.
    The typical usage is to help training and evaluation if the valid part is small in the whole medical image.
    The valid part can be determined by any field in the data with `source_key`, for example:
    - Select values > 0 in image field as the foreground and crop on all fields specified by `keys`.
    - Select label = 3 in label field as the foreground to crop on all fields specified by `keys`.
    - Select label > 0 in the third channel of a One-Hot label field as the foreground to crop all `keys` fields.
    Users can define arbitrary function to select expected foreground from the whole source image or specified
    channels. And it can also add margin to every dim of the bounding box of foreground object.
    """

    def __init__(
        self,
        keys: KeysCollection,
        source_key: str,
        select_fn: Callable = is_positive,
        channel_indices: Optional[IndexSelection] = None,
        margin: Union[Sequence[int], int] = 0,
        k_divisible: Union[Sequence[int], int] = 1,
        mode: NumpyPadModeSequence = NumpyPadMode.CONSTANT,
        start_coord_key: str = "foreground_start_coord",
        end_coord_key: str = "foreground_end_coord",
        allow_missing_keys: bool = False,
        **np_kwargs,
    ) -> None:
        """
        Args:
            keys: keys of the corresponding items to be transformed.
                See also: :py:class:`monai.transforms.compose.MapTransform`
            source_key: data source to generate the bounding box of foreground, can be image or label, etc.
            select_fn: function to select expected foreground, default is to select values > 0.
            channel_indices: if defined, select foreground only on the specified channels
                of image. if None, select foreground on the whole image.
            margin: add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
            k_divisible: make each spatial dimension to be divisible by k, default to 1.
                if `k_divisible` is an int, the same `k` will be applied to all the input spatial dimensions.
            mode: padding mode {``"constant"``, ``"edge"``, ``"linear_ramp"``, ``"maximum"``, ``"mean"``,
                ``"median"``, ``"minimum"``, ``"reflect"``, ``"symmetric"``, ``"wrap"``, ``"empty"``}
                one of the listed string values or a user supplied function. Defaults to ``"constant"``.
                see also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html
                it also can be a sequence of string, each element corresponds to a key in ``keys``.
            start_coord_key: key to record the start coordinate of spatial bounding box for foreground.
            end_coord_key: key to record the end coordinate of spatial bounding box for foreground.
            allow_missing_keys: don't raise exception if key is missing.
            np_kwargs: other args for `np.pad` API, note that `np.pad` treats channel dimension as the first dimension.
                more details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html

        """
        super().__init__(keys, allow_missing_keys)
        self.source_key = source_key
        self.start_coord_key = start_coord_key
        self.end_coord_key = end_coord_key
        self.cropper = CropForeground(
            select_fn=select_fn,
            channel_indices=channel_indices,
            margin=margin,
            k_divisible=k_divisible,
            **np_kwargs,
        )
        self.mode = ensure_tuple_rep(mode, len(self.keys))

    def __call__(self, data: Mapping[Hashable, np.ndarray]) -> Dict[Hashable, np.ndarray]:
        d = dict(data)
        img: np.ndarray
        img, *_ = convert_data_type(d[self.source_key], np.ndarray)  # type: ignore
        box_start, box_end = self.cropper.compute_bounding_box(img)
        d[self.start_coord_key] = box_start
        d[self.end_coord_key] = box_end
        for key, m in self.key_iterator(d, self.mode):
            meta_key = f"{key}_meta_dict"
            self.push_transform(d, key, extra_info={"box_start": box_start, "box_end": box_end})
            d[key], d[meta_key] = self.cropper(img=d[key],
                                               box_start=box_start,
                                               box_end=box_end,
                                               mode=m,
                                               meta_data=d[meta_key])

        return d
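To check whether the stored affine was actually adjusted by this patch, the two headers can be compared directly (quick sketch; it assumes the output path produced by the repro at the top):

import nibabel as nib
import numpy as np

orig = nib.load('thalamic.nii.gz')
crop = nib.load('output/thalamic_crop.nii.gz')

# the rotation/scaling block should be unchanged; only the translation column should differ
assert np.allclose(orig.affine[:3, :3], crop.affine[:3, :3])
print('origin shift (mm):', crop.affine[:3, 3] - orig.affine[:3, 3])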

wyli commented 3 years ago

This and #3182 might be done lazily in terms of spatial resampling if there are multiple spatial transforms in a compose, also considering https://github.com/Project-MONAI/MONAI/issues/2425. I feel this feature request still requires more discussion and a thorough design.

Constantin-Jehn commented 2 years ago

Thank you @Spenhouet for bringing up this issue. I am working on slice-to-volume registration from multiple stacks and want to use cropping during preprocessing, but since the meta data is not updated during cropping, the spatial relationship between the stacks gets lost.

If you need it only for preprocessing, torchio (https://torchio.readthedocs.io/transforms/preprocessing.html#croporpad) does the job very well.
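For example (minimal sketch; the target shape here is just illustrative):

import torchio as tio

subject = tio.Subject(thalamic=tio.ScalarImage('thalamic.nii.gz'))
# CropOrPad crops and/or pads to the target spatial shape and updates the affine accordingly
transform = tio.CropOrPad((110, 57, 73))
cropped = transform(subject)
cropped['thalamic'].save('thalamic_crop_tio.nii.gz')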

wyli commented 2 years ago

thanks, this is addressed by MetaTensor; see for example the demo here: https://github.com/Project-MONAI/tutorials/pull/758