bfortuner / pytorch_tiramisu

FC-DenseNet in PyTorch for Semantic Segmentation
MIT License

Need help with adapting to a different dataset #10

Status: Open. sshkhr opened this issue 6 years ago.

sshkhr commented 6 years ago

Hi @bfortuner,

First of all thanks for the excellent implementation of the FCDenseNets.

I am trying to use your Tiramisu implementation for a different dataset and could really use your help. In particular, I need some insight into how this works:

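import numpy as np
import torch


class LabelToLongTensor(object):
    """Convert a label image (PIL image or NumPy array) to a LongTensor."""

    def __call__(self, pic):
        if isinstance(pic, np.ndarray):
            # handle numpy array: keep its shape, cast to int64
            label = torch.from_numpy(pic).long()
        else:
            # PIL image: wrap its raw bytes in a flat uint8 tensor
            label = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
            # pic.size is (W, H), so this is (H, W, 1); assumes one byte per pixel
            label = label.view(pic.size[1], pic.size[0], 1)
            # (H, W, 1) -> (W, H, 1) -> (1, H, W), then squeeze to (H, W) and cast to int64
            label = label.transpose(0, 1).transpose(0, 2).squeeze().contiguous().long()
        return label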
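To see what the PIL branch does, here is a small sketch that repeats the same reshapes with NumPy on a hypothetical 2 x 3 single-channel (mode "L") label image. It assumes one byte per pixel: an RGB label image gives H*W*3 bytes, so the `view(H, W, 1)` step would fail for it rather than produce an (H, W) map.

```python
import numpy as np
from PIL import Image

# Hypothetical 2 x 3 (H x W) label image holding class ids 0..2
ids = np.array([[0, 1, 2],
                [2, 1, 0]], dtype=np.uint8)
pic = Image.fromarray(ids)  # mode "L"; note pic.size is (W, H) == (3, 2)

# Same steps as LabelToLongTensor's PIL branch, written with NumPy
flat = np.frombuffer(pic.tobytes(), dtype=np.uint8)  # H*W bytes, row-major
label = flat.reshape(pic.size[1], pic.size[0], 1)    # (H, W, 1)
label = label.swapaxes(0, 1)                         # (W, H, 1)
label = label.swapaxes(0, 2)                         # (1, H, W)
label = label.squeeze().astype(np.int64)             # (H, W), int64 like .long()

print(label.shape)  # (2, 3)
print(label)
```

The transposes only move the singleton channel axis to the front (HWC to CHW) before `squeeze()` drops it, so for a single-channel image the result is just the pixel values as an (H, W) int64 map; `torch.from_numpy(np.array(pic)).long()` should give the same tensor.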

This produces a 1x224x224 label tensor for a label image of size 224x224x3, and I have not been able to adapt it to my dataset. I have 7 classes, and each label image is 224x224x3. Should my label tensor be 1x224x224 with values in 0-6, or in 1-7? If I am correct, nll_loss2d expects the output to be 7x224x224.

bfortuner commented 6 years ago

Regarding the model:

In the tiramisu.py model there is a parameter for the number of classes. Set it to your N classes and the final convolution will produce N channels. The model has a separate channel for each class and predicts channel-wise (depthwise) softmax probabilities, which you can train directly with cross entropy.
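A toy sketch of what "one channel per class" means for the output shape. It is not the FCDenseNet: it assumes the network ends in a 1x1 convolution to `n_classes` channels followed by a log-softmax over the channel dimension, and the 48 input feature channels are an arbitrary choice.

```python
import torch
import torch.nn as nn

n_classes = 7  # the seven land-cover classes in this issue

# Hypothetical stand-in for the last stage only: a 1x1 convolution mapping
# feature channels to one channel per class, then a log-softmax over dim=1.
head = nn.Sequential(
    nn.Conv2d(in_channels=48, out_channels=n_classes, kernel_size=1),
    nn.LogSoftmax(dim=1),
)

features = torch.randn(2, 48, 224, 224)  # (N, C_feat, H, W), made-up features
log_probs = head(features)

print(log_probs.shape)  # torch.Size([2, 7, 224, 224])

# exp() of the log-probabilities sums to 1 over the class channel at every pixel
probs_sum = log_probs.exp().sum(dim=1)
print(torch.allclose(probs_sum, torch.ones_like(probs_sum), atol=1e-5))  # True
```

Because these are log-probabilities, they pair with `nn.NLLLoss`; a head that returns raw scores would use `nn.CrossEntropyLoss` instead.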

Regarding the labels:

You need to provide an H x W label image with long/int values between 0 and N-1. LabelToLongTensor converts an image or NumPy array into an (H, W) PyTorch LongTensor.
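For the 0-6 versus 1-7 question this means 0-based ids. Below is a hypothetical helper (`to_zero_based` is not part of the repo) that shifts a 1-based map down by one and checks the range before training, since an out-of-range target otherwise typically fails only inside the loss with a less readable error.

```python
import numpy as np


def to_zero_based(label, n_classes, one_based=False):
    """Return an int64 (H, W) label map with values in 0..n_classes-1.

    If the ids are stored as 1..n_classes (one_based=True), shift them down by one.
    Raises ValueError for a wrong shape or any value outside the valid range.
    """
    label = np.asarray(label, dtype=np.int64)
    if one_based:
        label = label - 1
    if label.ndim != 2:
        raise ValueError(f"expected an (H, W) map, got shape {label.shape}")
    if label.min() < 0 or label.max() >= n_classes:
        raise ValueError(f"label values must lie in 0..{n_classes - 1}, "
                         f"got {label.min()}..{label.max()}")
    return label


# Usage: a map stored with 1-based ids 1..7
ids_1_to_7 = np.array([[1, 7],
                       [3, 4]])
print(to_zero_based(ids_1_to_7, n_classes=7, one_based=True))
# [[0 6]
#  [2 3]]
```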

NLLLoss handles 2D targets out of the box, so there is no need to flatten them: http://pytorch.org/docs/master/nn.html#nllloss
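A sketch with random data, using the shapes from the question (7 classes, 224 x 224; the batch size of 2 is arbitrary): the output is (N, 7, 224, 224) log-probabilities and the target is (N, 224, 224) with no channel dimension. Older releases also had a separate `NLLLoss2d`; recent versions handle this case in `NLLLoss` itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C, H, W = 2, 7, 224, 224

logits = torch.randn(N, C, H, W)          # raw per-pixel class scores
log_probs = F.log_softmax(logits, dim=1)  # what a LogSoftmax output layer gives
target = torch.randint(0, C, (N, H, W))   # int64 class ids in 0..C-1, no channel dim

loss = nn.NLLLoss()(log_probs, target)    # averaged over all N*H*W pixels
print(loss.item())

# Same value from CrossEntropyLoss on raw logits (it applies log_softmax itself)
ce = nn.CrossEntropyLoss()(logits, target)
print(torch.allclose(loss, ce))  # True
```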

sshkhr commented 6 years ago

Thanks for the prompt response. In my dataset the labels are colour-coded RGB images. What would be the fastest way to encode them as H x W maps with values from 0 to N-1 (or as long tensors of the required form)? I was doing it with NumPy array functions, but it was extremely slow.

Here's what I was doing:

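import numpy as np
from PIL import Image

# RGB colour of each class; class id = row index in label_colours
Urban = [0, 255, 255]
Agricultural = [255, 255, 0]
Range = [255, 0, 255]
Forest = [0, 255, 0]
Water = [0, 0, 255]
Barren = [255, 255, 255]
Unknown = [0, 0, 0]

label_colours = np.array([Urban, Agricultural, Range, Forest, Water,
                          Barren, Unknown])

image = Image.open(img).convert("RGB")  # img: path to one colour label image
data = np.asarray(image, dtype="int32")  # (H, W, 3)


def index(image_rgb):
    # image_rgb is a single pixel's [R, G, B]; return the id of its colour
    for idx, color in enumerate(label_colours):
        if (image_rgb == color).all():
            return idx
    raise ValueError("unknown label colour: %s" % (image_rgb,))


# Calls index() once per pixel in Python, which is why this is so slow
mp = np.apply_along_axis(index, 2, data)  # (H, W) array of class ids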
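One way to speed this up, offered as a sketch rather than the repo's method: compare every pixel against all seven colours at once with NumPy broadcasting instead of calling a Python function per pixel. `rgb_to_index` and `index_to_rgb` are hypothetical names.

```python
import numpy as np

# Colour table from the question; row i is the RGB colour of class i
label_colours = np.array([
    [0, 255, 255],    # 0 Urban
    [255, 255, 0],    # 1 Agricultural
    [255, 0, 255],    # 2 Range
    [0, 255, 0],      # 3 Forest
    [0, 0, 255],      # 4 Water
    [255, 255, 255],  # 5 Barren
    [0, 0, 0],        # 6 Unknown
], dtype=np.uint8)


def rgb_to_index(rgb, colours=label_colours):
    """Map an (H, W, 3) colour label image to an (H, W) int64 class map."""
    rgb = np.asarray(rgb, dtype=np.uint8)[..., :3]  # drop an alpha channel if present
    # Compare every pixel with every colour at once: (H, W, 1, 3) vs (K, 3)
    matches = (rgb[:, :, None, :] == colours).all(axis=-1)  # (H, W, K) booleans
    if not matches.any(axis=-1).all():
        raise ValueError("label image contains a colour not in the table")
    return matches.argmax(axis=-1).astype(np.int64)  # index of the matching colour


def index_to_rgb(index_map, colours=label_colours):
    """Inverse mapping, e.g. for visualising predicted class maps."""
    return colours[index_map]


# Usage on a tiny 2 x 2 colour label image
tiny = np.array([[[0, 255, 255], [0, 0, 0]],
                 [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(rgb_to_index(tiny))
# [[0 6]
#  [4 5]]
```

With the image loaded as in the question, `rgb_to_index(np.asarray(Image.open(img).convert("RGB")))` would replace the `apply_along_axis` call. Since the mapping never changes, it can also be run once offline and the index maps saved as single-channel PNGs, which the PIL branch of `LabelToLongTensor` can read directly.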