deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

DeepLab EDDL and PyTorch comparison #302

Open · lauracanalini opened this issue 3 years ago

lauracanalini commented 3 years ago

Hi, I'm trying to reproduce DeepLab for image segmentation with EDDL, starting from this PyTorch model, but it achieves an IoU about 0.1 lower. I'm using EDDL v1.0.3b with cuDNN and ECVL v0.4.2. This is the configuration of the EDDL and PyTorch trainings:

I also tried the Dice loss, and the results are similar.

This is the model used:

#include <stdexcept>
#include <vector>

#include <eddl/apis/eddl.h>
#include <eddl/serialization/onnx/eddl_onnx.h>

using namespace eddl;
using namespace std;

class DeepLab
{
    int num_classes_;
    float bn_momentum = 0.1f, bn_eps = 1e-5f;

    // One ASPP branch: dilated conv -> batch norm -> ReLU, with "same" padding
    layer ASPPModule(layer x, int planes, int kernel_size, int dilation)
    {
        x = ReLu(BatchNormalization(Conv2D(x, planes, { kernel_size,kernel_size }, { 1,1 }, "same", false, 1, { dilation,dilation }), true, bn_momentum, bn_eps));
        return x;
    }
    layer ASPP(layer x, int output_stride)
    {
        vector<int> dilations;
        if (output_stride == 16) {
            dilations = { 1, 6, 12, 18 };
        }
        else {
            throw runtime_error("Not implemented output_stride");
        }

        layer x1 = ASPPModule(x, 256, 1, dilations[0]);
        layer x2 = ASPPModule(x, 256, 3, dilations[1]);
        layer x3 = ASPPModule(x, 256, 3, dilations[2]);
        layer x4 = ASPPModule(x, 256, 3, dilations[3]);
        layer x5 = GlobalAveragePool2D(x);
        x5 = ReLu(BatchNormalization(Conv2D(x5, 256, { 1,1 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps));
        x5 = UpSampling2D(x5, { x4->getShape()[2], x4->getShape()[3] }, "bilinear");
        x = Concat({ x1,x2,x3,x4,x5 });
        x = ReLu(BatchNormalization(Conv2D(x, 256, { 1,1 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps));
        x = Dropout(x, 0.5f);
        return x;
    }

    layer Decoder(layer x, layer low_level_feat)
    {
        low_level_feat = ReLu(BatchNormalization(Conv2D(low_level_feat, 48, { 1,1 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps));
        x = UpSampling2D(x, { 4,4 }, "bilinear");
        x = Concat({ x, low_level_feat });

        x = Dropout(ReLu(BatchNormalization(Conv2D(x, 256, { 3,3 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps)), 0.5f);
        x = Dropout(ReLu(BatchNormalization(Conv2D(x, 256, { 3,3 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps)), 0.1f);
        x = Conv2D(x, num_classes_, { 1,1 }, { 1,1 }, "same", false);
        return x;
    }

public:

    DeepLab(int num_classes = 1) : num_classes_{ num_classes } {}

    layer init(layer& input, int output_stride = 16)
    {
        // Import the ONNX file exported from the pretrained PyTorch ResNet101
        auto resnet101 = import_net_from_onnx_file("resnet101_simpl.onnx", { input->getShape()[1], input->getShape()[2], input->getShape()[3] });
        input = getLayer(resnet101, "input"); // set input layer
        auto low_level_feat = getLayer(resnet101, "Relu_35");
        auto x = getLayer(resnet101, "Relu_341");

        x = ASPP(x, output_stride);
        x = Decoder(x, low_level_feat);
        x = UpSampling2D(x, { 4,4 }, "bilinear");
        x = Sigmoid(x);
        return x;
    }
};
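For context, this is roughly how I wire the class into a network (a minimal sketch: the input shape, optimizer, and loss/metric names below are placeholders, not the exact training configuration):

layer in1 = Input({ 3, 512, 512 }); // placeholder shape; init() reassigns it to the ONNX input
DeepLab deeplab(1);
layer out = deeplab.init(in1);
model net = Model({ in1 }, { out });
build(net,
      adam(0.0001f),              // placeholder optimizer and learning rate
      { "cross_entropy" },        // placeholder loss
      { "mean_squared_error" },   // placeholder metric
      CS_GPU({ 1 }),
      false);                     // false: keep the weights imported from the ONNX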

To reproduce the experiments, you can take an ECVL example and replace it with this skin_lesion_segmentation code, changing dataset_path to point to the ISIC segmentation dataset (link to the pretrained ResNet101 ONNX).

Do you spot any mistakes in the model definition, or in the way I combine the layers of the pretrained network with the DeepLab layers that are trained from scratch?

chavicoski commented 3 years ago

Hi, I am currently reproducing the experiment. I have some questions.

1) Are the plots that you provided of the BCE and Dice losses from the training split? Looking at the code to reproduce the experiment, I see that the only metric tracked during validation is IoU.

2) The code seems correct to me, but one thing that might be causing the problem is that the "bilinear" interpolation of the UpSampling2D layer is not available; we have only implemented the "nearest" mode. When the model runs, it prints warnings saying that it is going to use "nearest" instead of "bilinear" (a small sketch after the code below illustrates how the two modes differ).

3) I think there is a bug in the experiment code that you provided: in the part that computes the IoU, the variable passed to the IoU function as the target is not valid and raises a segfault. With this change it works:

    // Compute metric and optionally save the output images
    for (int k = 0; k < current_bs; ++k, ++n) {
        unique_ptr<Tensor> pred(output->select({ to_string(k) }));
        TensorToView(pred.get(), pred_t);
        unique_ptr<Tensor> target(y->select({ to_string(k) }));
        TensorToView(target.get(), target_t);
        // ** BEFORE **
        //cout << " - IoU: " << BinaryIoU(pred_t, orig_gt, 0.5, metric_list_iou);
        // ** AFTER **
        cout << " - IoU: " << BinaryIoU(pred_t, target_t, 0.5, metric_list_iou);
    }
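To make point 2 concrete, here is a minimal standalone sketch (plain C++, not EDDL code) of 2x upsampling of a 1-D row; nearest duplicates samples while linear interpolates between them, so the decoder receives different feature maps when the fallback to "nearest" kicks in:

#include <cstdio>
#include <vector>

int main()
{
    std::vector<float> row = { 0.0f, 1.0f, 4.0f, 9.0f };
    const int scale = 2;
    const int n_out = static_cast<int>(row.size()) * scale;

    // Nearest: each output sample copies the closest input sample
    printf("nearest:");
    for (int i = 0; i < n_out; ++i)
        printf(" %.2f", row[i / scale]);
    printf("\n");

    // Linear: each output sample interpolates between its two neighbours
    // (align_corners-style mapping, for simplicity)
    printf("linear :");
    for (int i = 0; i < n_out; ++i) {
        float pos = i * (row.size() - 1.0f) / (n_out - 1);
        int lo = static_cast<int>(pos);
        int hi = (lo + 1 < static_cast<int>(row.size())) ? lo + 1 : lo;
        float w = pos - lo;
        printf(" %.2f", row[lo] * (1.0f - w) + row[hi] * w);
    }
    printf("\n");
    return 0;
}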
lauracanalini commented 3 years ago
  1. No, they show the IoU computed at each epoch on the validation split. The first plot compares an EDDL and a PyTorch training with BCE loss, the second with Dice loss. I changed the plot titles to make this clearer.
  2. Yes, we didn't change the interpolation in the UpSampling layer, but in EDDL (correct me if I'm wrong) the "nearest" interpolation is applied automatically after the warning, so I don't think this alone could have caused such a large difference. Anyway, I can launch a PyTorch training with all the interpolations set to "nearest" to see if there is any variation.
  3. Sorry, you're right. I changed some variable names right before sending you the code and missed that one.
RParedesPalacios commented 3 years ago

We still don't support dilations.

chavicoski commented 3 years ago

We support them only with cuDNN; with CPU or GPU (without cuDNN) we don't.

lauracanalini commented 3 years ago

I tried running a PyTorch training with all the interpolations in "nearest" mode, but it basically doesn't change anything.

chavicoski commented 3 years ago

Hi, I am doing some experiments with the code that you provided. I would like to know the difference without data augmentation: can you provide the PyTorch results without data augmentation? Also, it would be great if you could share the full PyTorch code to reproduce the experiment.

lauracanalini commented 3 years ago

Hi, this is all the Python code, usually launched with main.py /path/to/isic_segmentation.yml --workers 6 --gpu 1. Removing the augmentations (only Resize is applied), there is still a difference between EDDL and PyTorch, although it is a bit smaller, as PyTorch gets slightly worse while EDDL stays at about the same level.

chavicoski commented 3 years ago

Hi,

I am doing some tests with the PyTorch and EDDL versions, and I think there is a problem with the data being fed to the model. With EDDL I removed the data augmentation and am using only resizing and normalization:

auto training_augs = make_shared<SequentialAugmentationContainer>(
    AugResizeDim(size, InterpolationType::cubic),
    AugToFloat32(255, 255),
    AugNormalize({0.67501814, 0.5663187, 0.52339128}, {0.11092593, 0.10669603, 0.119005}) // isic stats
);

auto validation_augs = make_shared<SequentialAugmentationContainer>(
    AugResizeDim(size, InterpolationType::cubic),
    AugToFloat32(255, 255),
    AugNormalize({0.67501814, 0.5663187, 0.52339128}, {0.11092593, 0.10669603, 0.119005}) // isic stats
);

And the same with PyTorch:

train_transform = A.Compose([
    A.Resize(args.size, args.size, cv2.INTER_CUBIC),
    A.Normalize(norm_mean, norm_std),
    ToTensorV2(),
])
valid_test_transform = A.Compose([
    A.Resize(args.size, args.size, cv2.INTER_CUBIC),
    A.Normalize(norm_mean, norm_std),
    ToTensorV2(),
])

Now I am printing the max, min and mean values of each batch loaded during training, and I get very different results between EDDL and PyTorch. With EDDL (x is the input and y is the ground truth), for example:

x_max = 4.06464 - x_min = -6.08531 - x_mean = 0.0575973
y_max = 1 - y_min = 0 - y_mean = 0.238226

With PyTorch:

x_max = 0.015939775854349136 - x_min = -0.023863941431045532 - x_mean = 0.0009028838248923421
y_max = 1.0 - y_min = 0.0 - y_mean = 0.25823211669921875

Can you check whether the same thing happens to you? Maybe the transformations applied to the images are not equivalent.
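In case you want to replicate the check, this is roughly the helper I use to print these statistics (a sketch; it assumes EDDL's Tensor exposes max(), min() and sum() reductions and a size member, which matches the versions I have tried):

#include <cstdio>
#include <eddl/tensor/tensor.h>

// Sketch: per-batch statistics for the input x and ground truth y
void print_batch_stats(Tensor* x, Tensor* y)
{
    printf("x_max = %g - x_min = %g - x_mean = %g\n",
           x->max(), x->min(), x->sum() / x->size);
    printf("y_max = %g - y_min = %g - y_mean = %g\n",
           y->max(), y->min(), y->sum() / y->size);
}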

lauracanalini commented 3 years ago

I think you have to comment out line 58 in dataset.py, where images are divided by 255. You asked for the code without augmentations, so I also removed the Normalize, but I had to add line 58 because in PyTorch the division is done inside Normalize (while in ECVL they are two separate augmentations). If you comment out that line, the results are quite similar.
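To put numbers on it (a toy calculation with the red-channel ISIC stats from your snippet): Albumentations' Normalize divides by 255 internally, so with line 58 active the tensor ends up scaled by 255 twice. This matches your statistics, where the PyTorch values are exactly the EDDL ones divided by 255 (-6.08531 / 255 ≈ -0.02386 and 4.06464 / 255 ≈ 0.01594):

#include <cstdio>

int main()
{
    // Red-channel ISIC stats from the snippets above
    float mean = 0.67501814f, std_dev = 0.11092593f;

    float px = 255.0f; // a fully saturated pixel
    // Intended pipeline (what Albumentations' Normalize does internally):
    float normalized = (px / 255.0f - mean) / std_dev; // ~2.93

    // The extra division by 255 (line 58 of dataset.py) collapses the range:
    float double_divided = normalized / 255.0f; // ~0.0115

    printf("normalized = %f, double_divided = %f\n", normalized, double_divided);
    return 0;
}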

RParedesPalacios commented 3 years ago

The EDDL (and Keras) BatchNorm momentum is (1 - PyTorch momentum).

Then

float bn_momentum = 0.1f, bn_eps = 1e-5f;

Should be:

float bn_momentum = 0.9f, bn_eps = 1e-5f;

CostantinoGrana commented 3 years ago

So, if I understand correctly, the momentum in EDDL is how much of the running average is kept, and not how much of the current batch average is used for the update: https://github.com/deephealthproject/eddl/blob/e6de5aaf5cf6308174ce40c3147ecd1b77a05357/src/hardware/cpu/nn/cpu_bn.cpp#L147-L150 Let's see if this fixes it.
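A toy numeric check of the two conventions (with invented values) confirms that EDDL with momentum = 0.9 reproduces PyTorch's update with momentum = 0.1:

#include <cstdio>

int main()
{
    float running = 0.5f, batch = 1.0f; // invented values

    // PyTorch: running <- (1 - m) * running + m * batch, with m = 0.1
    float pytorch_update = (1.0f - 0.1f) * running + 0.1f * batch;

    // EDDL/Keras: running <- m * running + (1 - m) * batch, with m = 0.9
    float eddl_update = 0.9f * running + (1.0f - 0.9f) * batch;

    printf("pytorch = %f, eddl = %f\n", pytorch_update, eddl_update); // both 0.55
    return 0;
}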

RParedesPalacios commented 3 years ago

Yes, you are right.

Alvaro just tried it; however, it seems that it doesn't fix the problem... which is strange...