Closed robsoncsantiago closed 3 years ago
I don't see your pil_loader function in the code above, but I'm assuming it's the same as the one in this repository: it simply reads the image with Image.open(). Since your ground-truth images are RGB, each target tensor has size (3, H, W), and with the batch dimension added, (N, 3, H, W).
The loss function CrossEntropyLoss expects a target tensor of size (N, H, W), where each pixel value is an integer in the interval [0, C-1] and C is the number of classes. Here is the source of the error: your batch of target images doesn't have the correct dimensions.
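The shape contract can be demonstrated with random tensors (a minimal sketch; the N, C, H, W values are arbitrary):

```python
import torch
import torch.nn as nn

N, C, H, W = 2, 12, 8, 8  # batch, classes, height, width (arbitrary)
criterion = nn.CrossEntropyLoss()

logits = torch.randn(N, C, H, W)         # model output: per-class scores
target = torch.randint(0, C, (N, H, W))  # class indices, dtype long, size (N, H, W)
loss = criterion(logits, target)         # works

bad_target = torch.randint(0, 2, (N, 3, H, W))  # RGB-encoded target: (N, 3, H, W)
try:
    criterion(logits, bad_target)
except RuntimeError as e:
    # Fails; the exact message depends on the PyTorch version (older versions
    # print "only batches of spatial targets supported (3D tensors) ...")
    print(e)
```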
The reason is that the dataset you are using encodes the class in RGB values instead of a single index. This is not a problem for CamVid and Cityscapes because they happen to provide the target images in the desired format.
TLDR: in a new custom transform or in your CustomDataset.__getitem__(), you need to convert those RGB values into a single class index and make sure it returns tensors of size (H, W).
Got it, I'll include a custom transform with this process and I'll update here with the results. Thanks for the support!
I've changed `CustomDataset.__getitem__()` as follows:
```python
def __getitem__(self, index):
    """
    Args:
    - index (``int``): index of the item in the dataset

    Returns:
    A tuple of ``PIL.Image`` (image, label) where label is the ground-truth
    of the image.
    """
    if self.mode.lower() == 'train':
        data_path, label_path = self.train_data[index], self.train_labels[index]
    elif self.mode.lower() == 'val':
        data_path, label_path = self.val_data[index], self.val_labels[index]
    elif self.mode.lower() == 'test':
        data_path, label_path = self.test_data[index], self.test_labels[index]
    else:
        raise RuntimeError("Unexpected dataset mode. "
                           "Supported modes are: train, val and test")

    img, label = self.loader(data_path, label_path)

    if self.transform is not None:
        img = self.transform(img)
    if self.label_transform is not None:
        label = self.label_transform(label)

    # Convert the RGB-encoded label into an (H, W) tensor of class indices
    label = tools.rgb2mask(np.array(label), color_encoding)
    label = transforms.ToTensor()(label).long()
    label = label.squeeze(0)
    return img, label
```
With `rgb2mask` defined as:
```python
import numpy as np

def rgb2mask(img, color2index):
    """Convert an (H, W, 3) RGB label image to an (H, W) mask of class indices."""
    assert len(img.shape) == 3
    height, width, ch = img.shape
    assert ch == 3

    # Pack each RGB triple into a single integer so identical colors share an id
    W = np.power(256, [[0], [1], [2]])
    img_id = img.dot(W).squeeze(-1)

    values = np.unique(img_id)
    mask = np.zeros(img_id.shape)
    for i, c in enumerate(values):
        try:
            mask[img_id == c] = color2index[tuple(img[img_id == c][0])]
        except KeyError:
            # Colors missing from color2index are left as class 0
            pass
    return mask
```
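As an aside, the per-color loop over `np.unique` can be avoided by packing the palette entries the same way and writing each class index directly. A self-contained sketch (the two-class palette below is a hypothetical example for illustration, not the SUIM one):

```python
import numpy as np

def rgb2mask_vectorized(img, color2index):
    """Map an (H, W, 3) RGB image to an (H, W) class-index mask."""
    assert img.ndim == 3 and img.shape[2] == 3
    # Pack each RGB triple into one integer: R + 256*G + 256**2*B
    packed = img.astype(np.int64) @ np.power(256, np.arange(3))
    mask = np.zeros(packed.shape, dtype=np.int64)
    for rgb, idx in color2index.items():
        key = rgb[0] + 256 * rgb[1] + 256 ** 2 * rgb[2]
        mask[packed == key] = idx
    return mask

# Hypothetical two-class palette, for illustration only
palette = {(0, 0, 0): 0, (255, 255, 255): 1}
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 255, 255)
print(rgb2mask_vectorized(img, palette))  # [[1 0]
                                          #  [0 0]]
```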
So, apparently it is training and the iteration loss is decreasing over time, but when it comes to computing the IoU, it throws the following error:
Any thoughts on that? I've been trying to debug it but with no success so far... Again, any help would be greatly appreciated!
`IoU(num_classes, ignore_index=False)` is the problem. You are passing `False` to an argument that's expected to be an int or an iterable. If you don't want to ignore any class, simply use `IoU(num_classes)`; `ignore_index` defaults to `None`, in which case no class is ignored.
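For reference, per-class IoU is just intersection over union of the predicted and target pixel sets for each class. A simplified sketch (not the repository's implementation, which accumulates a confusion matrix across batches, but the per-class formula is the same):

```python
import numpy as np

def iou_per_class(pred, target, num_classes, ignore_index=None):
    """Compute per-class IoU from flat arrays of class indices."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        if c == ignore_index:
            ious.append(float('nan'))  # skipped class
            continue
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        ious.append(float(inter) / float(union) if union > 0 else float('nan'))
    return ious

pred   = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(iou_per_class(pred, target, num_classes=2))  # [0.5, 0.666...]
```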
Gosh, stupid mistake... Now everything seems to be running smoothly, as it should. I'll make sure to post here any interesting outcomes from this work.
Thanks for the support!
Hello, I tried to adapt the code into a .ipynb so I could run isolated cells and check how the pipeline works, as I'm trying to evaluate ENet's performance at learning and predicting underwater images (SUIM Dataset, link), but I'm facing some problems.
When I run the training cell, it throws the following error: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4. Have you faced a similar issue before? Any help would be greatly appreciated!
The code, with minor changes, follows below: