murhafh closed this issue 5 years ago
For binary segmentation, you need to set `num_classes` to 1 so the model outputs a single-channel image. You should also change the criterion to `torch.nn.BCEWithLogitsLoss`.
Because you are using a different dataset, you have to create your own `torch.utils.data.Dataset` class (similar to `data/camvid.py` and `data/cityscapes.py`) and use appropriate transformations to convert the sample images and targets (usually PIL objects) to tensors.
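As a rough illustration, a dataset class only needs `__len__` and `__getitem__`. The sketch below mirrors that structure; the class name and the in-memory storage are assumptions for illustration — in practice you would subclass `torch.utils.data.Dataset`, open the image/mask files with PIL inside `__getitem__`, and return tensors:

```python
import numpy as np

class BinarySegDataset:
    """Minimal sketch of a Dataset-style class for binary segmentation.
    A real implementation would subclass torch.utils.data.Dataset and
    load PIL images from disk instead of holding arrays in memory."""

    def __init__(self, images, masks, transform=None):
        assert len(images) == len(masks)
        self.images = images
        self.masks = masks
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image, mask = self.images[index], self.masks[index]
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        # BCEWithLogitsLoss expects a float target, so cast the mask here
        return image, mask.astype(np.float32)
```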
Thanks for your input. That's what I tried to do. I'm transforming the label images (JPG binary mask images) to tensors using a torch transform:

```python
transforms.ToTensor()(pic).long().squeeze_()
```
In the dataset class I created, since I only have 2D images, I use the following colour coding:

```python
color_encoding = OrderedDict([
    ('object', (255, 255, 0)),
    ('unlabeled', (0, 0, 0))
])
```

I also tried `(255, 255, 255)` to represent the object.
Now training seems to start, with the loss going down at every epoch, but the IoU always shows as 1.0. Any idea what could be happening?
Are you using `nn.BCEWithLogitsLoss`? If you are, it should raise an error when it gets a long tensor as the target.
Have you tried printing some of the variables in the `IoU` class? I would start there. Print the value of the confusion matrix (`print(self.conf_metric.value())`); that will show the confusion matrix after each batch. You could also print the predicted and target tensors in that same method.
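To see why the metric might be stuck at 1.0, it helps to know what the confusion matrix accumulates. Here is a simplified stand-in for what `self.conf_metric` tracks (the function name is hypothetical, not the repo's actual API):

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Entry [i, j] counts pixels whose true class is i and whose
    predicted class is j. Pixels on the diagonal are correct."""
    mat = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.flatten(), pred.flatten()):
        mat[t, p] += 1
    return mat
```

If the target degenerates (for example, every pixel collapses to class 0 after a bad cast), all counts can land in a single cell and the metric can report a perfect score.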
The issue is when I try to convert the PIL image to a tensor using the following:

```python
transforms.ToTensor()(pic)
```
That generates float tensors, and then I get the following exception at `data/utils.py`:

```
class_count += np.bincount(flat_label, minlength=num_classes)
TypeError: Cannot cast array data from dtype('float32') to dtype('int64') according to the rule 'safe'
```
That's why I was converting the tensor to long. I'm trying to trace how the dataloader object is created and why `enet_weighing` expects an integer input, but it isn't clear to me how the transformation happens. Your help is very much welcome here.
With multiclass labels the criterion is `nn.CrossEntropyLoss`, which expects the target to be a long tensor. Therefore, `enet_weighing` also expects a long tensor. To compute the weights we have to compute the class frequency, and for that we need to know how many pixels belong to each class in the whole dataset. That's what `class_count += np.bincount(flat_label, minlength=num_classes)` does.
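As a sketch of that accumulation (the two example batches and the `1 / ln(c + p)` weighting formula from the ENet paper are illustrative assumptions, not this repo's exact code):

```python
import numpy as np

num_classes = 2
class_count = np.zeros(num_classes, dtype=np.int64)

# Each flat_label stands in for one batch of labels flattened to a 1-D int array
for flat_label in (np.array([0, 0, 1, 1, 1]), np.array([0, 1, 0, 0, 1])):
    class_count += np.bincount(flat_label, minlength=num_classes)

# Class propensity over the whole dataset, then ENet-style weights
propensity = class_count / class_count.sum()
weights = 1.0 / np.log(1.02 + propensity)
```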
Your issue is that you have a binary segmentation problem, so the appropriate loss function is `nn.BCEWithLogitsLoss`, which expects the target to be a float tensor. This leads to several issues, because code that depends on the target expects a long tensor and now gets a float tensor.
In summary, `np.bincount` expects an array of ints, so you simply have to change `label = label.cpu().numpy()` to `label = label.cpu().numpy().astype(int)`.
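A quick illustration of the failure and the fix (the float mask here is a made-up stand-in for what `ToTensor` produces):

```python
import numpy as np

label = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=np.float32)  # float mask

# np.bincount raises the TypeError above on float input; cast first:
flat_label = label.flatten().astype(int)
class_count = np.bincount(flat_label, minlength=2)
```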
You will also have to rewrite the IoU metric, because it converts logits to predictions by taking the maximum along the channels for each pixel. That won't work with a single channel; instead, apply a sigmoid and threshold the resulting probabilities (typically at 0.5), then compute the IoU from there.
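A hedged sketch of such a binary IoU (the function name and the convention of returning 1.0 when both prediction and target are empty are my assumptions):

```python
import numpy as np

def binary_iou(logits, target, threshold=0.5):
    """Convert logits to probabilities with a sigmoid, threshold them,
    then compute intersection-over-union against the binary target."""
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    pred = probs > threshold
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    if union == 0:                          # both prediction and target empty
        return 1.0
    return float(intersection) / float(union)
```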
Let me know if you run into issues.
I'm trying to run training with my own dataset, which consists of images and 2D binary masks. With the current label transformation I keep getting memory exceptions. I tried adapting the label transformation to make it work, but when training starts the result shows `Mean IoU: 1.0000` from the first epoch. Do you have suggestions on how to make the network work with a new dataset, and specifically with a binary classification task?
Thanks