LIVIAETS / boundary-loss

Official code for "Boundary loss for highly unbalanced segmentation", runner-up for best paper award at MIDL 2019. Extended version in MedIA, volume 67, January 2021.
https://doi.org/10.1016/j.media.2020.101851
MIT License

Assert error when constructing dist maps of ground truth #6

GWwangshuo closed this issue 5 years ago

GWwangshuo commented 5 years ago

Thanks for your great work. I really enjoy it.

I am trying to adapt your code to my project, which is a binary segmentation problem (C = 2). When I try to transform a loaded mask of shape wh into C distance maps of shape cwh, I get an assertion error.

I followed your implementation below:

    import numpy as np
    import torch
    from functools import partial
    from operator import itemgetter
    from torchvision import transforms
    from utils import class2one_hot, one_hot2dist  # from this repo's utils.py

    n_class = 2  # C = 2 for a binary problem

    dist_map_transform = transforms.Compose([
        lambda img: np.array(img)[np.newaxis, ...],
        lambda nd: torch.tensor(nd, dtype=torch.int64),
        partial(class2one_hot, C=n_class),
        itemgetter(0),
        lambda t: t.cpu().numpy(),
        one_hot2dist,
        lambda nd: torch.tensor(nd, dtype=torch.float32)
    ])
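For reference, a minimal (hypothetical) usage of that composed transform; gt.png is just a placeholder for any label image with integer values in {0, ..., C-1}:

    from PIL import Image

    gt = Image.open("gt.png")          # label image, shape wh
    dist_map = dist_map_transform(gt)  # Tensor of shape cwh, dtype float32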

My code for transforming the loaded mask into distance maps is as follows (mask is an (H, W) numpy array):

mask_tensor = torch.tensor(mask, dtype=torch.int64)
mask_onehot = class2one_hot(mask_tensor, 2)
mask_distmap = one_hot2dist(mask_onehot.cpu().numpy())

The corresponding error is:

    assert one_hot(torch.Tensor(seg), axis=0)
    AssertionError

I found that this error happens in one_hot2dist. It seems that something is wrong with mask_onehot, but I am using your code without any modification. Why would this happen? Could you please give me some suggestions? Thanks a lot. I really appreciate it.

Best

HKervadec commented 5 years ago

Ah, yes, my bad. Most of the utils work with shape bcwh, except one_hot2dist. And class2one_hot automatically adds an axis if needed:

def class2one_hot(seg: Tensor, C: int) -> Tensor:
    if len(seg.shape) == 2:  # Only w, h, used by the dataloader
        seg = seg.unsqueeze(dim=0)
    assert sset(seg, list(range(C)))
    assert len(seg.shape) == 3, seg.shape

    b, w, h = seg.shape  # type: Tuple[int, int, int]
    ...
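For example, a quick sketch of that behavior (shapes follow the comments above):

    seg = torch.tensor([[0, 1],
                        [1, 0]])      # shape wh
    oh = class2one_hot(seg, C=2)      # shape bcwh == (1, 2, 2, 2): a batch axis was added
    oh0 = oh[0]                       # shape cwh, which is what one_hot2dist expects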

This is why I added an itemgetter(0) in the transform. The code evolved in a weird way, and I also had to play around with utils expecting either Tensor or np.ndarray.

So here is what happens, and how to fix it (these type hints won't actually run, but they are useful for explaining):

mask: np.ndarray[hw]
mask_tensor: Tensor[hw] = torch.tensor(mask, dtype=torch.int64)
mask_onehot: Tensor[chw] = class2one_hot(mask_tensor, 2)[0]  # because the res is bchw
mask_distmap: Tensor[chw] = one_hot2dist(mask_onehot.cpu().numpy())
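Putting the fix together, here is a self-contained sketch of the whole pipeline. The one_hot2dist_sketch below follows the paper's signed-distance definition using scipy's distance_transform_edt; the repo's actual one_hot2dist may differ in details:

    import numpy as np
    import torch
    from scipy.ndimage import distance_transform_edt as eucl_distance
    from utils import class2one_hot  # from this repo's utils.py

    def one_hot2dist_sketch(seg: np.ndarray) -> np.ndarray:
        # seg: one-hot encoded ground truth, shape chw
        res = np.zeros_like(seg, dtype=np.float32)
        for c in range(len(seg)):
            posmask = seg[c].astype(bool)
            if posmask.any():
                negmask = ~posmask
                # positive Euclidean distance outside the object, negative inside
                # (shifted by 1 so the innermost boundary pixels sit at 0)
                res[c] = eucl_distance(negmask) * negmask \
                    - (eucl_distance(posmask) - 1) * posmask
        return res

    mask = np.random.randint(0, 2, (32, 32))                       # hw ground truth
    mask_tensor = torch.tensor(mask, dtype=torch.int64)            # hw
    mask_onehot = class2one_hot(mask_tensor, 2)[0]                 # chw (batch axis dropped)
    mask_distmap = one_hot2dist_sketch(mask_onehot.cpu().numpy())  # chw, float32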

I will probably change this behavior in the future; it is too error-prone right now. But at least the assertions make it easy to catch.

> Thanks for your great work. I really enjoy it.

Glad you find it useful!

GWwangshuo commented 5 years ago

Hi @HKervadec, thanks for your quick reply. I can run the code now; however, I encountered another problem. I have done something like this:

# transform ground truth (mask) to dist maps when loading images
mask_tensor = torch.tensor(mask, dtype=torch.int64)
mask_onehot = class2one_hot(mask_tensor, 2)[0]
mask_distmap = one_hot2dist(mask_onehot.cpu().numpy())
mask_distmap = torch.from_numpy(mask_distmap).float()

and

import torch.nn.functional as F
from losses import GeneralizedDice, SurfaceLoss  # from this repo's losses.py

# For training purposes
region_loss = GeneralizedDice(idc=[0, 1])
surface_loss = SurfaceLoss(idc=[1])

for input_image, dist_maps in dataloader:
    # input_image: bwh
    # dist_maps: bcwh
    optimizer.zero_grad()

    outputs_logits = net(input_image)  # bcwh
    outputs_softmaxes = F.softmax(outputs_logits, dim=1)  # bcwh

    loss = region_loss(outputs_softmaxes, dist_maps, None) + surface_loss(outputs_softmaxes, dist_maps, None)

    loss.backward()
    optimizer.step()

During training, I encountered the following problem: [screenshot of the training log]

The GDL (region loss) is negative, which is incorrect. I went back to check its inputs, outputs_softmaxes and dist_maps:

I checked the shapes and values of outputs_softmaxes and dist_maps along each dimension: [screenshots of outputs_softmaxes and dist_maps]

Which one of outputs_softmaxes or dist_maps is wrong? Can you give me some hints? I really appreciate it. Thanks.

Best!

HKervadec commented 5 years ago

The region loss does not take the distance map as input, but the one-hot labels (mask_onehot):

for input_image, dist_maps in dataloader:
    ...
    loss = region_loss(outputs_softmaxes, dist_maps, None) + surface_loss(outputs_softmaxes, dist_maps, None)
    ...

should be

for input_image, onehot_labels, dist_maps in dataloader:
    ...
    loss = region_loss(outputs_softmaxes, onehot_labels, None) + surface_loss(outputs_softmaxes, dist_maps, None)
    ...
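To see why the two losses take different ground-truth inputs: the boundary (surface) loss is essentially the softmax probabilities weighted by the signed distance maps, so it needs dist_maps, while GDL compares probabilities against the one-hot labels. A stripped-down sketch (the repo's SurfaceLoss should be equivalent up to its idc class selection):

    import torch
    from torch import einsum, Tensor

    def surface_loss_sketch(probs: Tensor, dist_maps: Tensor, idc: list) -> Tensor:
        # probs: softmax predictions, bcwh; dist_maps: signed distance maps, bcwh
        pc = probs[:, idc, ...].type(torch.float32)
        dc = dist_maps[:, idc, ...].type(torch.float32)
        # element-wise product, then averaged: negative inside hits, positive misses
        return einsum("bcwh,bcwh->bcwh", pc, dc).mean()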

But it is funny that the distance maps pass the simplex assertion in the GDL loss; I did not realize they could. I guess I should replace it with one_hot (which is literally simplex plus sset([0, 1])).
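For reference, the assertion helpers mentioned here look roughly like this (a sketch based on the repo's utils; exact signatures may differ):

    import torch
    from torch import Tensor

    def uniq(a: Tensor) -> set:
        return set(torch.unique(a.cpu()).numpy())

    def sset(a: Tensor, sub) -> bool:
        return uniq(a).issubset(sub)

    def simplex(t: Tensor, axis=1) -> bool:
        # sums to one along the class axis
        _sum = t.sum(axis).type(torch.float32)
        return torch.allclose(_sum, torch.ones_like(_sum))

    def one_hot(t: Tensor, axis=1) -> bool:
        # one-hot == simplex AND only {0, 1} values
        return simplex(t, axis) and sset(t, [0, 1])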

GWwangshuo commented 5 years ago

Well, thanks for your generous help. I can train and evaluate now. But there are still a few things I am not sure are correct. For example:

After a few epochs of training, I get these results:

    contour_loss=0.01294, region_loss=0.55355, total_loss=0.56649 (training stage) --> jaccard: 0.00000
    contour_loss=-0.00360, region_loss=0.42092, total_loss=0.41732 (training stage) --> jaccard: 0.60041
    contour_loss=-0.01631, region_loss=0.23017, total_loss=0.21387 (training stage) --> jaccard: 0.62058
    contour_loss=-0.01778, region_loss=0.23821, total_loss=0.22043 (training stage) --> jaccard: 0.63152
    ...

Why does the contour loss become negative? Is this negative contour loss correct or not? Thanks.

Best!

HKervadec commented 5 years ago

> Is this negative contour loss correct or not?

Yes. The distance map is signed: negative inside the object, positive outside of it. So a perfect segmentation will be multiplied only with negative distances; the optimal value is negative.
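A toy illustration of that sign convention (a sketch using scipy's distance_transform_edt, with the same formula as the distance-map sketch above):

    import numpy as np
    from scipy.ndimage import distance_transform_edt as eucl_distance

    posmask = np.zeros((5, 5), dtype=bool)
    posmask[1:4, 1:4] = True  # a 3x3 object in a 5x5 image
    negmask = ~posmask
    dist = eucl_distance(negmask) * negmask - (eucl_distance(posmask) - 1) * posmask
    # dist is 0 on the object's innermost boundary ring, -1 at its center, and
    # positive outside: the product probs * dist is minimized by predicting
    # exactly the object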

bluesky314 commented 5 years ago

Should we stop training if it is negative? I trained with dice on a small dataset for 20 epochs, and then I added your surface loss, which started at around -0.7.

Edit: I did continue, and after decreasing the learning rate I gained 0.2+ on my dice score :). I saw the training procedure in your paper about decreasing alpha. I would appreciate any insights into how dice and surface loss interact, and any training advice for better optimization.

Thanks, good work

HKervadec commented 5 years ago

I actually forgot to reply before closing this.

Basically, you stop once the loss function does not improve anymore (reaching convergence), and/or the validation dice stops improving. This does not differ from how you usually decide to stop a training.
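Regarding the alpha rebalancing mentioned above, one plausible schedule (a hedged sketch only; see the paper for the exact values and strategy) shifts weight from the region loss to the boundary loss as training progresses:

    # alpha weights the region loss; (1 - alpha) weights the boundary loss.
    # The decrement of 0.01 per epoch and the 0.01 floor are assumptions here.
    alpha = 1.0
    for epoch in range(num_epochs):
        for input_image, onehot_labels, dist_maps in dataloader:
            optimizer.zero_grad()
            probs = F.softmax(net(input_image), dim=1)
            loss = alpha * region_loss(probs, onehot_labels, None) \
                + (1 - alpha) * surface_loss(probs, dist_maps, None)
            loss.backward()
            optimizer.step()
        alpha = max(alpha - 0.01, 0.01)  # keep a small region-loss weight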

The best achievable value for the boundary loss would be (- (distance(posmask) - 1) * posmask).mean(): there is a perfect overlap between the predicted object and the ground truth, so we are summing only negative distances inside the object. Notice that this optimal value will be different for each image.
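In code, that per-image optimum could be computed like this (a sketch; posmask is the boolean foreground mask of the ground truth and eucl_distance is scipy's distance_transform_edt):

    import numpy as np
    from scipy.ndimage import distance_transform_edt as eucl_distance

    def best_boundary_loss(posmask: np.ndarray) -> float:
        # perfect overlap: only the negative inside distances contribute,
        # averaged over the whole image
        return float((- (eucl_distance(posmask) - 1) * posmask).mean())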