Input image transforms for some of the models may be improved

Is your feature request related to a problem? Please describe.

The input image transforms in some of the models may not be configured to give the model optimal input during training, validation, and inference.

For instance, the endoscopic_inbody_classification example:

uses a Resized transform that shrinks images to 256×256 pixels without enabling anti_aliasing. When large video frames with sharp details, fine patterns, or thin lines are downscaled this way, the result can contain aliasing artifacts (such as moiré patterns or jagged edges) that may cause the model to learn spurious features or recognize structures that are not present in the original image.
uses NormalizeIntensityd with nonzero set to true. Zero-valued pixels are then left out: they are neither shifted nor scaled along with the rest of the image, which can introduce discontinuities and, again, make it appear as if there are structures that are not really there.
uses NormalizeIntensityd without specifying a fixed subtrahend and divisor. The intensities of each image are therefore normalized with the mean and standard deviation of that specific image. If an image contains only a narrow range of intensities, for instance a dark image with sensor noise, that range is stretched into a large noisy mess in which the model might recognize random things. The usual practice for ImageNet and similar datasets is to compute the mean and standard deviation once across the entire training set and use those values everywhere.
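The aliasing problem in the first item can be reproduced with a minimal NumPy sketch on a synthetic 720p frame of fine vertical stripes (made-up data). The helpers are hypothetical and are not MONAI code: `naive_downscale` keeps every fourth pixel with no prefilter, while `block_average_downscale` averages 4×4 blocks first, a simple box low-pass filter standing in for the Gaussian smoothing that anti_aliasing applies before downsampling.

```python
import numpy as np

def naive_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th pixel; no low-pass filter before subsampling."""
    return img[::factor, ::factor]

def block_average_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Average each factor x factor block: a box low-pass prefilter followed
    by subsampling (a stand-in for Gaussian anti-aliasing, not MONAI's code)."""
    h = img.shape[0] - img.shape[0] % factor
    w = img.shape[1] - img.shape[1] % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic 720p frame: vertical stripes at 0.45 cycles/pixel,
# just below the Nyquist limit of 0.5 cycles/pixel.
x = np.arange(1280)
frame = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * 0.45 * x), (720, 1))

factor = 4  # the stripes are far too fine to survive a 4x reduction
naive = naive_downscale(frame, factor)
smooth = block_average_downscale(frame, factor)

# Dominant period of the naive result, measured in output pixels.
row = naive[0] - naive[0].mean()
period = naive.shape[1] / np.argmax(np.abs(np.fft.rfft(row)))

print(f"naive std:    {naive.std():.3f}")   # 0.354: strong stripes
print(f"smoothed std: {smooth.std():.3f}")  # 0.053: close to flat gray
print(f"naive stripe period: {period:.1f} output pixels")  # 5.0
```

Since the stripes cannot be represented at the lower resolution, the correct result is close to uniform gray. Naive subsampling instead produces full-contrast stripes with a period of 5 output pixels that do not exist in the source frame, while the prefiltered version keeps only a faint residue.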

Describe the solution you'd like

Set "anti_aliasing": true in the Resized transforms.
Consider leaving nonzero at false in the NormalizeIntensityd transform unless there really is a good reason for it.
Consider setting a fixed subtrahend and divisor in the NormalizeIntensityd transform.
Retrain the affected models with the changed parameters.
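The NormalizeIntensityd items above can be illustrated with a small NumPy sketch. `normalize_per_image`, `dataset_stats` and `normalize_fixed` are hypothetical helpers written for this illustration, not MONAI functions; `normalize_per_image` only mimics per-image z-scoring with an optional nonzero mask. The training set and the dark, noise-only frame are synthetic. The sketch computes fixed statistics once over the training set (these are the values one would pass as subtrahend and divisor), compares per-image and fixed normalization on the noisy frame, and shows how nonzero=True turns a smooth ramp into a step.

```python
import numpy as np

def normalize_per_image(img, nonzero=False):
    """Hypothetical per-image z-scoring (no fixed subtrahend/divisor).
    With nonzero=True only non-zero pixels are used for the statistics
    and rescaled; zero pixels are left at 0."""
    out = img.astype(np.float64)  # astype copies, so the input is untouched
    mask = out != 0 if nonzero else np.ones(out.shape, dtype=bool)
    vals = out[mask]
    std = vals.std()
    out[mask] = (vals - vals.mean()) / (std if std > 0 else 1.0)
    return out

def dataset_stats(images):
    """Mean and standard deviation over every pixel of every training image,
    accumulated as running sums so the set never has to fit in memory."""
    total, total_sq, count = 0.0, 0.0, 0
    for img in images:
        x = img.astype(np.float64)
        total += x.sum()
        total_sq += np.square(x).sum()
        count += x.size
    mean = total / count
    return mean, np.sqrt(total_sq / count - mean**2)

def normalize_fixed(img, subtrahend, divisor):
    """Same affine map for every image (fixed subtrahend and divisor)."""
    return (img.astype(np.float64) - subtrahend) / divisor

rng = np.random.default_rng(0)
# Made-up training set: 20 images with intensities spread over 0..255.
train = [rng.uniform(0, 255, size=(64, 64)) for _ in range(20)]
mean, std = dataset_stats(train)  # close to 127.5 and 73.6

# A dark frame that contains nothing but sensor noise (mean 10, std 2).
dark = 10 + rng.normal(0, 2, size=(64, 64))
per_image = normalize_per_image(dark)
fixed = normalize_fixed(dark, mean, std)
print(f"per-image std: {per_image.std():.2f}")  # 1.00: noise stretched to full contrast
print(f"fixed std:     {fixed.std():.3f}")      # a few hundredths: noise stays small

# A smooth ramp that starts at zero.
ramp = np.array([[0.0, 1.0, 2.0, 3.0, 4.0]])
print(normalize_per_image(ramp).round(2))                # still monotonic
print(normalize_per_image(ramp, nonzero=True).round(2))  # 0 next to -1.34: a step not in the input
```

For RGB video frames the same accumulation would typically be done per channel, as with ImageNet's per-channel mean and standard deviation. For very large training sets, a numerically steadier running method such as Welford's algorithm may be preferable to summing squares.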

Additional context

A paper that discusses the impact of aliasing in convolutional networks

I have tested how enabling anti_aliasing affects processing time. On a GPU (RTX A2000) the impact is small: transforming a 720p video frame and adding it to a batch takes 11 ms instead of 9 ms. On the CPU the impact is much larger: 55 ms instead of 9 ms.

Project-MONAI / model-zoo #441