Project-MONAI / tutorials

MONAI Tutorials
https://monai.io/started.html
Apache License 2.0

Spleen bundle `spatial_shape` does not match in `metadata.json` #733

Closed yiheng-wang-nv closed 2 years ago

yiheng-wang-nv commented 2 years ago

In https://github.com/Project-MONAI/tutorials/blob/32dfc1cf45f0fceeab4ab9e2828d571e317d8a1a/modules/bundle/spleen_segmentation/configs/metadata.json#L37 the spatial_shape is (160, 160, 160), but the spatial size for training and the ROI size for inference are both (96, 96, 96) (see here). These sizes should be unified. In addition, I have the following questions about metadata.json:

  1. The data_type is "dicom" (see here), but the data is in nii.gz format. Is this an error? How should data_type be defined here?
  2. How should we define the format of the image (see here)? For the spleen dataset it is "magnitude"; what about the BraTS dataset?
  3. As for the spatial_shape mentioned above: if the network's input size in training differs from the ROI size used for inference, which one should be used for spatial_shape in the metadata? A segmentation model usually supports different sizes, so I'm not sure how to define it.

Hi @ericspod, could you help answer these questions? Thanks!

ericspod commented 2 years ago

The spatial_shape value is provided if the network requires inputs of a set size. It can be specified as literals, or as expressions based on variables if the size varies but has constraints. In this case the shape should match what's used in training and what's in the specification. If a network is trained patchwise but applied to whole images at inference, or vice versa, then these would be different.

Data type should be the "type of source data used for training/validation", as in the spec, so here it should probably be nifti. This is meant for human understanding, so saying more than just that would be helpful as well.

The idea with format is to specify something about what the contents of the data are: the data are images, but specifically for MR they are magnitude values, as opposed to categories for segmentations or Hounsfield units for CT. The value here should be "hounsfield" as the inputs are CT. The "modality" value should be "CT" as well.
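To make the suggested values concrete, here is a minimal sketch of how the corrected fields might look, written as a Python dict and dumped to JSON. The key names `data_type`, `format`, `modality` and `spatial_shape` come from this thread; the nesting under `network_data_format` / `inputs` / `image` is an assumption about the file layout and should be checked against the actual metadata.json.

```python
import json

# Hypothetical fragment, not the full metadata.json. Key names are the ones
# discussed in this thread; the nesting below is an assumption.
metadata_fragment = {
    # Source data is NIfTI (.nii.gz), not DICOM.
    "data_type": "nifti",
    "network_data_format": {
        "inputs": {
            "image": {
                # Inputs are CT, so values are Hounsfield units.
                "format": "hounsfield",
                "modality": "CT",
                # Matches the training patch size and inference ROI size.
                "spatial_shape": [96, 96, 96],
            }
        }
    },
}

print(json.dumps(metadata_fragment, indent=2))
```

Writing the more descriptive human-readable text suggested above (for example, mentioning that the NIfTI files are CT volumes) would go in the `data_type` string.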

yiheng-wang-nv commented 2 years ago

Thanks @ericspod for the detailed clarification.

> expressions based on variables if the size varies but has constraints

As for the expression, do you have any advice on how to write it? Should it be a sentence? For instance, what if the input size should be divisible by 16?

ericspod commented 2 years ago

The specification states at the bottom what this would look like:

Spatial shape definition can be complex for models accepting inputs of varying shapes, especially if there are specific conditions on what those shapes can be. Shapes are specified as lists of either positive integers for fixed sizes or strings containing expressions defining the condition a size depends on. This can be “*” to mean any size, or use an expression with Python mathematical operators and one character variables to represent dependence on an unknown quantity. For example, “2**n” represents a size which must be a power of 2, “2**n*m” must be a multiple of a power of 2. Variables are shared between dimension expressions, so a spatial shape of [“2**n”, “2**n”] states that the dimensions must be the same powers of 2 given by n.
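As an illustration of how such expressions could be read, here is a rough sketch of a checker that tests a concrete shape against a spatial shape definition. The function `satisfies` and its `var_range` parameter are hypothetical helpers written for this example, not part of MONAI. It assumes variables are single lowercase letters taking small non-negative integer values, and it simply brute-forces every assignment in that range.

```python
import itertools
import re


def satisfies(spec, shape, var_range=range(0, 11)):
    """Return True if `shape` fits `spec` for some assignment of variables.

    `spec` is a list of ints (fixed sizes), "*" (any size), or expression
    strings such as "2**n". Variables are shared across dimensions.
    The search range for variables is an assumption of this sketch.
    """
    if len(spec) != len(shape):
        return False

    exprs = []  # (expression, required size) pairs
    names = set()  # one-character variable names found in the expressions
    for dim_spec, size in zip(spec, shape):
        if isinstance(dim_spec, int):
            if dim_spec != size:
                return False
        elif dim_spec != "*":
            exprs.append((dim_spec, size))
            names.update(re.findall(r"[a-z]", dim_spec))

    names = sorted(names)
    # Try every combination of variable values in the assumed range.
    for values in itertools.product(var_range, repeat=len(names)):
        env = dict(zip(names, values))
        # eval is used for brevity; only use it on trusted metadata.
        if all(eval(e, {"__builtins__": {}}, env) == s for e, s in exprs):
            return True
    return False


print(satisfies(["2**n", "2**n"], [64, 64]))
print(satisfies(["2**n", "2**n"], [64, 32]))
print(satisfies(["16*a", "16*b", "16*c"], [96, 160, 128]))
```

Under these assumptions, a "divisible by 16" constraint could be written as "16*n". Because variables are shared between dimensions, ["16*n", "16*n", "16*n"] would also force all three sizes to be equal, whereas distinct variables such as ["16*a", "16*b", "16*c"] leave each dimension independent.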

As the example describes, this is used to define the constraints between dimension sizes that the network imposes. A more complex example like ["m*(2**(3+n))", "m*(2**(3+n))"] would require inputs to have shapes that are multiples of powers of 2 with a minimum size of 8. A related way to express a minimum size of 8 would be ["8+m*(2**n)", "8+m*(2**n)"].
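A quick way to see which sizes these two expressions admit is to enumerate them. The sketch below assumes m is a positive integer and n a non-negative integer; the thread does not fix these ranges, so treat them as an assumption. Under that reading the two forms do not produce the same set of sizes: the first yields multiples of 8, while the second yields every integer above 8.

```python
LIMIT = 32  # only list sizes up to this value

# Assumed variable ranges: m >= 1, n >= 0 (not stated in the thread).
M_VALUES = range(1, LIMIT + 1)
N_VALUES = range(0, 6)

first = sorted(
    {m * (2 ** (3 + n)) for m in M_VALUES for n in N_VALUES
     if m * (2 ** (3 + n)) <= LIMIT}
)
second = sorted(
    {8 + m * (2 ** n) for m in M_VALUES for n in N_VALUES
     if 8 + m * (2 ** n) <= LIMIT}
)

print(first)   # [8, 16, 24, 32]
print(second)  # 9, 10, 11, ... up to 32
```

If the intent is "a multiple of a power of 2, at least 8", the first form is the one that matches under these assumed ranges.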