NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Observing difference in preprocessed output between dali and torch transform #4257

Closed shrinath-suresh closed 2 years ago

shrinath-suresh commented 2 years ago

While comparing the tensor output from the torch transforms and the DALI transforms, we observe a difference in the output.

Attaching a notebook with a full reproducible example: Dali preprocessing repro.zip

Alternatively, the steps below can be followed.
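For completeness, the snippets below assume roughly the following imports (a sketch; the PyTorchIterator name is assumed to alias DALIGenericIterator, and exact module paths may vary slightly between DALI versions):

# Imports assumed by the snippets below (sketch, DALI 1.16-era layout)
import io
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

import nvidia.dali as dali
import nvidia.dali.types as types
from nvidia.dali import pipeline_def
from nvidia.dali.plugin.pytorch import DALIGenericIterator as PyTorchIterator
from nvidia.dali.plugin.pytorch import LastBatchPolicy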

Download the sample image (kitten.jpg) from here

Load the image as bytes (TorchServe uses a bytearray as input, so we want to keep it this way):

with open("kitten.jpg", "rb") as fp:
    img_data = fp.read()

Preprocess the image with the torch transforms:

torch_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

PILImage = Image.open(io.BytesIO(img_data))
torch_transformed_tensor = torch_transform(PILImage)
print(torch_transformed_tensor.shape)

Define the DALI pipeline:

@pipeline_def(batch_size=1, num_threads=1, device_id=0)
def dali_pipeline(batch_tensor):
    jpegs = dali.fn.external_source(source=[batch_tensor], dtype=types.UINT8)
    jpegs = dali.fn.decoders.image(jpegs, device="mixed")
    jpegs = dali.fn.resize(jpegs, size=[256], subpixel_scale=False, interp_type=types.DALIInterpType.INTERP_LINEAR, mode="not_smaller")
    normalized = dali.fn.crop_mirror_normalize(
        jpegs,
        crop_pos_x=0.5,
        crop_pos_y=0.5,
        crop=(224, 224),
        mean=[0.485 * 255,0.456 * 255,0.406 * 255],
        std=[0.229 * 255,0.224 * 255,0.225 * 255],
    )

    return normalized

Preprocess using the DALI pipeline:

# convert the image to numpy array
batch_tensor = []
np_image = np.frombuffer(img_data, dtype=np.uint8)
batch_tensor.append(np_image)
result = []
datam = PyTorchIterator([dali_pipeline(batch_tensor)], ['data'], last_batch_policy=LastBatchPolicy.PARTIAL, last_batch_padded=True)
for i, data in enumerate(datam):
    result.append(data[0]['data']) 

dali_transformed_tensor = result[0].squeeze(0).detach().cpu()
print(dali_transformed_tensor.shape)

Tensor output from torch transform

tensor([[[-0.1657, -0.3712, -0.5938,  ..., -0.9192, -0.9192, -0.9192],
         [-0.0287, -0.2171, -0.4226,  ..., -0.8335, -0.8507, -0.8678],
         [ 0.0569, -0.1143, -0.2856,  ..., -0.7822, -0.7993, -0.7993],
         ...,
         [ 1.7009,  1.4954,  1.6324,  ..., -0.6109, -0.5938, -0.4911],
         [ 1.7865,  1.7523,  1.7865,  ..., -0.6109, -0.5767, -0.4739],
         [ 1.7865,  1.7180,  1.7352,  ..., -0.5938, -0.5424, -0.4568]],

        [[ 0.0301, -0.1800, -0.4076,  ..., -0.5476, -0.5476, -0.5301],
         [ 0.2052, -0.0049, -0.2150,  ..., -0.4601, -0.4601, -0.4776],
         [ 0.3102,  0.1176, -0.0749,  ..., -0.4076, -0.3901, -0.4076],
         ...,
         [ 1.7108,  1.5182,  1.6232,  ..., -0.2675, -0.2500, -0.1625],
         [ 1.8158,  1.7983,  1.7983,  ..., -0.2675, -0.2325, -0.1450],
         [ 1.8508,  1.7283,  1.7458,  ..., -0.2675, -0.2325, -0.1450]],

        [[ 0.1999, -0.1138, -0.3927,  ..., -1.0724, -1.0898, -1.0898],
         [ 0.4265,  0.1476, -0.1487,  ..., -1.0027, -1.0201, -1.0376],
         [ 0.5659,  0.3393,  0.0605,  ..., -0.9330, -0.9504, -0.9678],
         ...,
         [ 1.8034,  1.5768,  1.6640,  ..., -0.7936, -0.7587, -0.6018],
         [ 1.9080,  1.8383,  1.8383,  ..., -0.7936, -0.7413, -0.5844],
         [ 1.9603,  1.7860,  1.8208,  ..., -0.8110, -0.7587, -0.6018]]])

Tensor output from the DALI transform

tensor([[[-0.1657, -0.3883, -0.5938,  ..., -0.9192, -0.9192, -0.9192],
         [-0.0287, -0.2342, -0.4226,  ..., -0.8335, -0.8507, -0.8678],
         [ 0.0569, -0.1143, -0.2856,  ..., -0.7822, -0.7993, -0.7993],
         ...,
         [ 1.6667,  1.4954,  1.6324,  ..., -0.6281, -0.6109, -0.5082],
         [ 1.7523,  1.7352,  1.7694,  ..., -0.6109, -0.5767, -0.4739],
         [ 1.7865,  1.7180,  1.7352,  ..., -0.5938, -0.5596, -0.4568]],

        [[ 0.0301, -0.1975, -0.4076,  ..., -0.5476, -0.5476, -0.5476],
         [ 0.1877, -0.0224, -0.2150,  ..., -0.4776, -0.4776, -0.4951],
         [ 0.3102,  0.1176, -0.0749,  ..., -0.4076, -0.4076, -0.4251],
         ...,
         [ 1.7108,  1.5182,  1.6232,  ..., -0.2850, -0.2675, -0.1800],
         [ 1.7983,  1.7808,  1.7808,  ..., -0.2675, -0.2500, -0.1625],
         [ 1.8333,  1.7283,  1.7283,  ..., -0.2675, -0.2500, -0.1450]],

        [[ 0.1999, -0.1312, -0.4101,  ..., -1.0898, -1.0898, -1.1073],
         [ 0.4091,  0.1302, -0.1487,  ..., -1.0201, -1.0376, -1.0550],
         [ 0.5485,  0.3219,  0.0605,  ..., -0.9330, -0.9504, -0.9678],
         ...,
         [ 1.8034,  1.5768,  1.6465,  ..., -0.7936, -0.7587, -0.6193],
         [ 1.9080,  1.8383,  1.8383,  ..., -0.7936, -0.7413, -0.6018],
         [ 1.9428,  1.7860,  1.8034,  ..., -0.8284, -0.7761, -0.6193]]])

The two tensors are not equal, as the code below returns False:

torch.allclose(torch_transformed_tensor, dali_transformed_tensor)

We observe a difference of up to about 0.07 between the tensors, as the code below returns True:

torch.allclose(torch_transformed_tensor, dali_transformed_tensor, atol=0.07)
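The size of the discrepancy can also be inspected directly; a minimal check along these lines reports the largest element-wise deviation:

# Largest and mean element-wise deviation between the two preprocessed tensors
diff = (torch_transformed_tensor - dali_transformed_tensor).abs()
print(diff.max().item())   # on the order of 0.07 for this image
print(diff.mean().item())  # mean deviation, for reference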

Passing the preprocessed output to a pretrained ResNet model predicts the same class for both the torch and the DALI tensors. However, the probabilities differ.
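For context, the probabilities were obtained roughly as follows (a sketch; the original notebook may use a different ResNet variant, and mapping class indices to names such as "tabby" requires an ImageNet class-index file not shown here):

from torchvision import models

# Run a pretrained ResNet on the preprocessed tensor (resnet18 assumed here)
model = models.resnet18(pretrained=True).eval()
with torch.no_grad():
    logits = model(torch_transformed_tensor.unsqueeze(0))  # add a batch dimension
    probs = torch.softmax(logits, dim=1).squeeze(0)
top_probs, top_idx = probs.topk(5)
for p, idx in zip(top_probs, top_idx):
    print(idx.item(), p.item())  # class index and probability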

Output using torch tensor

tabby 0.4096631407737732
tiger_cat 0.3467048108577728
Egyptian_cat 0.13002879917621613
lynx 0.023919595405459404
bucket 0.011532180942595005

Output using DALI tensor

tabby 0.4087514281272888
tiger_cat 0.3540496230125427
Egyptian_cat 0.12418904155492783
lynx 0.025347236543893814
bucket 0.011393276043236256

We would like to know whether this behaviour is expected, or whether there is a way to make the preprocessed tensor output the same between the torch transforms and DALI.

Note: I have already gone through the existing closed issue https://github.com/NVIDIA/DALI/issues/3610 and updated the pipeline accordingly.

Tested with PyTorch 1.12.1, CUDA 11.3, and DALI 1.16.1.

JanuszL commented 2 years ago

Hi @shrinath-suresh,

To match torchvision, you need the former INTERP_TRIANGULAR interpolation type, which is now expressed as INTERP_LINEAR with antialias=True, because torchvision enables antialiasing by default when using linear interpolation.

On top of that, another source of discrepancy is JPEG decoding. There is no bit-exact JPEG decoding standard; in general, the better the PSNR of an encode-decode round trip, the better, so different decoders employ different tricks to improve the result, sometimes optimized for images from a specific field. In the case of nvJPEG (which DALI uses under the hood to GPU-accelerate decoding), the conversion from YUV to RGB uses a different interpolation strategy than libjpeg-turbo, which handles CPU decoding (both in torchvision and in DALI). So the following pipeline should yield closer results:

@pipeline_def(batch_size=1, num_threads=1, device_id=0)
def dali_pipeline(batch_tensor):
    jpegs = dali.fn.external_source(source=[batch_tensor], dtype=types.UINT8)
    # decode on the CPU with libjpeg-turbo to match torchvision's decoding
    jpegs = dali.fn.decoders.image(jpegs, device="cpu")
    # antialias=True with linear interpolation reproduces the former INTERP_TRIANGULAR filtering
    jpegs = dali.fn.resize(jpegs, size=[256], subpixel_scale=False, interp_type=types.DALIInterpType.INTERP_LINEAR, antialias=True, mode="not_smaller")
    normalized = dali.fn.crop_mirror_normalize(
        jpegs,
        crop_pos_x=0.5,
        crop_pos_y=0.5,
        crop=(224, 224),
        mean=[0.485 * 255,0.456 * 255,0.406 * 255],
        std=[0.229 * 255,0.224 * 255,0.225 * 255],
    )

    return normalized

You can check this standalone example for reference Resize_example.zip.

Still, it is more or less expected that if inference is run with a data processing pipeline that is not bit-exact with the one the network was trained with, the results will be slightly different.
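For example, with the adjusted pipeline the earlier comparison can be re-run with a much tighter tolerance (the threshold below is illustrative; the remaining difference depends on the decoder and the image):

# Re-run the DALI pipeline with the CPU decoder and antialiasing, then compare again
datam = PyTorchIterator([dali_pipeline(batch_tensor)], ['data'],
                        last_batch_policy=LastBatchPolicy.PARTIAL, last_batch_padded=True)
dali_fixed_tensor = next(iter(datam))[0]['data'].squeeze(0).detach().cpu()
print(torch.allclose(torch_transformed_tensor, dali_fixed_tensor, atol=1e-2))  # illustrative tolerance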