I am trying to train on my own dataset with two classes (background and foreground), and I am confused by the output of the segmentation net:
```python
class MobileNetV3Seg(BaseModel):
    def __init__(self, nclass, aux=False, backbone='mobilenetv3_small', pretrained_base=False, **kwargs):
        super(MobileNetV3Seg, self).__init__(nclass, aux, backbone, pretrained_base, **kwargs)
        mode = backbone.split('_')[-1]
        self.head = _Head(nclass, mode, **kwargs)
        if aux:
            inter_channels = 40 if mode == 'large' else 24
            self.auxlayer = nn.Conv2d(inter_channels, nclass, 1)

    def forward(self, x):
        size = x.size()[2:]
        _, c2, _, c4 = self.base_forward(x)
        outputs = list()
        x = self.head(c4)
        x = F.interpolate(x, size, mode='bilinear', align_corners=True)
        outputs.append(x)  # Why is the output a list that only ever gets one element appended?
        if self.aux:  # What is the aux branch for?
            auxout = self.auxlayer(c2)
            auxout = F.interpolate(auxout, size, mode='bilinear', align_corners=True)
            outputs.append(auxout)
        return tuple(outputs)
```
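From reading other segmentation codebases, my guess is that the auxiliary output is for deep supervision: a second cross-entropy loss on an intermediate feature map (`c2`) to help gradients reach the backbone. A minimal sketch of what I assume the training loss looks like, assuming standard `F.cross_entropy` and an auxiliary weight of 0.4 (the weight is my assumption, not from this repo):

```python
import torch
import torch.nn.functional as F

def seg_loss(outputs, target, aux_weight=0.4):
    """Sketch of a main + auxiliary loss for the tuple this model returns.

    outputs: (main_logits,) or (main_logits, aux_logits),
             each of shape (N, nclass, H, W).
    target:  (N, H, W) tensor of integer class indices.
    aux_weight: assumed deep-supervision weight, not from this repo.
    """
    loss = F.cross_entropy(outputs[0], target)
    if len(outputs) > 1:  # auxiliary head is present during training
        loss = loss + aux_weight * F.cross_entropy(outputs[1], target)
    return loss

# Tiny smoke test with random logits for a 2-class problem.
main = torch.randn(2, 2, 8, 8)
aux = torch.randn(2, 2, 8, 8)
target = torch.randint(0, 2, (2, 8, 8))
print(seg_loss((main, aux), target).item())
```

Is that roughly how the two elements of the returned tuple are meant to be consumed?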
Furthermore, I wonder about the expected form of the segmentation ground truth. My dataset's ground truth has dimension (Height, Width, 1), where the last axis is 0 for background and 1 for foreground. But from my reading of your code, the Cityscapes ground truth seems to be split into NUMCLASS channels (19 for Cityscapes), i.e. something like (Height, Width, NUMCLASS). How should I adapt my dataset's ground truth to that form?
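For concreteness, here is a sketch of what I mean: my (H, W, 1) mask with {0, 1} values, squeezed to a class-index map, versus a per-class-channel (H, W, NUMCLASS) encoding. The NumPy conversion below is just my own illustration, not code from this repo:

```python
import numpy as np

# My ground truth: shape (H, W, 1), 0 = background, 1 = foreground.
mask = np.zeros((4, 4, 1), dtype=np.int64)
mask[1:3, 1:3, 0] = 1  # a 2x2 foreground square

# Class-index map: shape (H, W), values in [0, NUMCLASS).
index_map = mask[..., 0]

# One-hot / per-class-channel form: shape (H, W, NUMCLASS).
NUMCLASS = 2
one_hot = np.eye(NUMCLASS, dtype=np.int64)[index_map]

print(one_hot.shape)          # (4, 4, 2)
print(one_hot[..., 1].sum())  # 4 foreground pixels
```

Which of these two forms (index map or one-hot channels) does the training pipeline actually expect?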