lauracanalini opened this issue 3 years ago
Hi, I am currently reproducing the experiment. I have some questions.
1) Are the plots of the BCE and Dice losses that you provide from the training split? Looking at the code to reproduce the experiment, I see that the only metric tracked during validation is IoU.
2) The code seems correct to me, but one thing that might be causing the problem is that the "bilinear" interpolation mode of the UpSampling2D layer is not available: we have only implemented the "nearest" mode. When the model is executed, it prints warnings saying that it is going to use "nearest" instead of "bilinear".
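To see how much the two modes differ, here is a minimal PyTorch sketch (made-up input values):

import torch
import torch.nn.functional as F

# Tiny 2x2 feature map upsampled by 2 with each mode.
x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])
nearest = F.interpolate(x, scale_factor=2, mode="nearest")
bilinear = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(nearest)   # blocky: each pixel is simply repeated
print(bilinear)  # smooth: values are interpolated between neighbors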
3) I think there is a bug in the experiment code that you provided. In the part that computes the IoU, the variable passed to the IoU function as the target is not valid and raises a segfault. With this change it works:
// Compute metric and optionally save the output images
for (int k = 0; k < current_bs; ++k, ++n) {
    unique_ptr<Tensor> pred(output->select({ to_string(k) }));
    TensorToView(pred.get(), pred_t);
    unique_ptr<Tensor> target(y->select({ to_string(k) }));
    TensorToView(target.get(), target_t);

    // ** BEFORE **
    //cout << " - IoU: " << BinaryIoU(pred_t, orig_gt, 0.5, metric_list_iou);
    // ** AFTER **
    cout << " - IoU: " << BinaryIoU(pred_t, target_t, 0.5, metric_list_iou);
}
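For context, this is roughly what a binary IoU at threshold 0.5 computes; a minimal NumPy sketch, not the actual EDDL BinaryIoU implementation:

import numpy as np

def binary_iou(pred, target, thresh=0.5):
    # IoU of the two binary masks obtained by thresholding at `thresh`.
    p = pred >= thresh
    t = target >= thresh
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union > 0 else 1.0

pred = np.array([[0.9, 0.2], [0.7, 0.1]])
target = np.array([[1.0, 0.0], [0.0, 0.0]])
print(binary_iou(pred, target))  # intersection 1, union 2 -> 0.5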
We still don't support dilations.
We support them only with cuDNN. With CPU or GPU (without cuDNN), they are not available.
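For reference, DeepLab's ASPP module relies on dilated (atrous) convolutions, so a backend that ignores dilations effectively changes the model. A minimal PyTorch sketch of such a layer:

import torch
import torch.nn as nn

# A 3x3 conv with dilation 6 has an effective receptive field of 13x13;
# padding=6 keeps the spatial size unchanged.
conv = nn.Conv2d(256, 256, kernel_size=3, padding=6, dilation=6)
x = torch.randn(1, 256, 32, 32)
print(conv(x).shape)  # torch.Size([1, 256, 32, 32])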
I tried to run a PyTorch training with all the interpolations in "nearest" mode, but it basically doesn't change anything.
Hi, I am doing some experiments with the code that you provided. I would like to know the difference without data augmentation. Can you provide the results with PyTorch without using data augmentation? Also, if you could share the full PyTorch code to reproduce the experiment, that would be great.
Hi,
this is all the Python code, usually launched with main.py /path/to/isic_segmentation.yml --workers 6 --gpu 1. Removing the augmentations (only Resize is applied), there is still a difference between EDDL and PyTorch (although it is a bit smaller, as PyTorch gets a little worse while EDDL stays at about the same level).
Hi,
I am doing some tests with the PyTorch and EDDL versions, and I think there is a problem with the data that is being fed to the model. With EDDL, I removed the data augmentation and am using only resizing and normalization:
auto training_augs = make_shared<SequentialAugmentationContainer>(
    AugResizeDim(size, InterpolationType::cubic),
    AugToFloat32(255, 255),
    AugNormalize({0.67501814, 0.5663187, 0.52339128}, {0.11092593, 0.10669603, 0.119005}) // ISIC stats
);
auto validation_augs = make_shared<SequentialAugmentationContainer>(
    AugResizeDim(size, InterpolationType::cubic),
    AugToFloat32(255, 255),
    AugNormalize({0.67501814, 0.5663187, 0.52339128}, {0.11092593, 0.10669603, 0.119005}) // ISIC stats
);
And the same with PyTorch:
train_transform = A.Compose([
    A.Resize(args.size, args.size, cv2.INTER_CUBIC),
    A.Normalize(norm_mean, norm_std),
    ToTensorV2(),
])
valid_test_transform = A.Compose([
    A.Resize(args.size, args.size, cv2.INTER_CUBIC),
    A.Normalize(norm_mean, norm_std),
    ToTensorV2(),
])
Now I am printing the max, min, and mean values of each batch loaded during training, and I get very different results between EDDL and PyTorch. With EDDL (x is the input and y is the ground truth), for example:
x_max = 4.06464 - x_min = -6.08531 - x_mean = 0.0575973
y_max = 1 - y_min = 0 - y_mean = 0.238226
With Pytorch:
x_max = 0.015939775854349136 - x_min = -0.023863941431045532 - x_mean = 0.0009028838248923421
y_max = 1.0 - y_min = 0.0 - y_mean = 0.25823211669921875
Can you check whether the same thing happens for you? Maybe the transformations that are being applied to the images are not equivalent.
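For reference, a minimal sketch of how these per-batch statistics can be printed in the PyTorch training loop (the loader and variable names are assumptions):

def print_batch_stats(loader):
    # loader: any iterable yielding (x, y) batches of torch tensors
    for x, y in loader:
        print(f"x_max = {x.max().item()} - x_min = {x.min().item()} - x_mean = {x.mean().item()}")
        # .float() so the mean also works for integer ground-truth masks
        print(f"y_max = {y.max().item()} - y_min = {y.min().item()} - y_mean = {y.float().mean().item()}")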
I think you have to comment out line 58 in dataset.py, where images are divided by 255. You asked for the code without augmentations, so I also removed the Normalize, but I had to add line 58 because in PyTorch the division is done inside Normalize (while in ECVL there are two separate augmentations). If you comment out that line, the results are quite similar.
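To make the double scaling concrete: A.Normalize with its default max_pixel_value=255 already divides by 255, so the extra division shrinks the whole range by another factor of 255. A minimal NumPy sketch (random data, the ISIC stats from above, and assuming the extra division happens after the transform):

import numpy as np

norm_mean = np.array([0.67501814, 0.5663187, 0.52339128], dtype=np.float32)
norm_std = np.array([0.11092593, 0.10669603, 0.119005], dtype=np.float32)

img = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)  # fake uint8-range image

# What A.Normalize(norm_mean, norm_std) computes on uint8-range input:
normalized = (img / 255.0 - norm_mean) / norm_std
# A second division by 255 (the line 58 discussed above) shrinks everything:
double_scaled = normalized / 255.0

print(normalized.min())     # (0 - 0.67501814) / 0.11092593 ~= -6.0853, the EDDL x_min above
print(double_scaled.min())  # -6.0853 / 255 ~= -0.0238639, the PyTorch x_min above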
The EDDL BatchNorm momentum (like Keras's) is (1 - PyTorch_momentum).
Then
    float bn_momentum = 0.1f, bn_eps = 1e-5f;
should be:
    float bn_momentum = 0.9f, bn_eps = 1e-5f;
So, if I understand correctly, the momentum in EDDL is how much of the running average is kept, not how much of the current batch average is used for the update: https://github.com/deephealthproject/eddl/blob/e6de5aaf5cf6308174ce40c3147ecd1b77a05357/src/hardware/cpu/nn/cpu_bn.cpp#L147-L150. Let's see if this fixes it.
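In other words, the two conventions describe the same update with complementary weights; a minimal sketch with a hypothetical scalar statistic:

# Shows that the two conventions agree once eddl_momentum = 1 - pt_momentum.
running, batch_stat = 0.0, 1.0

pt_momentum = 0.1    # PyTorch: weight given to the current batch statistic
eddl_momentum = 0.9  # EDDL/Keras: weight given to the kept running average

running_pt = (1 - pt_momentum) * running + pt_momentum * batch_stat
running_eddl = eddl_momentum * running + (1 - eddl_momentum) * batch_stat
assert running_pt == running_eddl  # same update, opposite naming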
Yes, you are right.
Alvaro just tried it; however, it seems that it doesn't fix the problem... which is strange...
Hi, I'm trying to reproduce DeepLab for image segmentation with EDDL, starting from this PyTorch model, but it achieves about 0.1 lower IoU. I'm using EDDL v1.0.3b with cuDNN and ECVL v0.4.2. This is the configuration of the EDDL and PyTorch trainings:
I also tried with the Dice loss and the results are similar.
This is the model used:
To reproduce the experiments, you can replace an ECVL example with this skin_lesion_segmentation code, modifying the dataset_path to point to the ISIC segmentation dataset (link of the resnet101 pretrained ONNX). Do you spot any mistakes in the model definition, or in the way I combine the layers of the pretrained network with those of DeepLab that have to be trained from scratch?