feevos / resuneta

mxnet source code for the resuneta semantic segmentation models

Number of filters on the last layers #5

Closed thimabru1010 closed 4 years ago

thimabru1010 commented 4 years ago

Hey,

First, thanks for your code and your paper!

Why did you use the number of classes for the boundary and distance outputs in the last layers of the ResUnet-a D6 simple multitasking implementation (resunet_d6_causal_mtskcolor_ddist.py)? The first image and the paper suggest the label is an image with a single channel. If you use self.NClasses you will get, for your dataset, 6 filters on each of those layers, creating images with 6 channels like the segmentation output.

Am I misunderstanding something? I am not used to mxnet.

feevos commented 4 years ago

Hi, thank you for your message. I don't know if I understand your question correctly, but each of the classes (in 1-hot representation) has its respective boundary (although some are common) and its respective distance transform. This is why the output has self.NClasses channels for all outputs of the multitasking head.

In the first image in this repo (and Fig 13 in the manuscript), the boundaries are summed over all channels (and normalized to 1) to produce a binary density map - similarly for the distance transform, in order to provide a better visual understanding. However, each class has its own predictions.
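
For illustration, that visualization step could look roughly like this (a minimal numpy sketch, assuming bound is a channels-first array of per-class boundary maps with shape (NClasses, H, W)):

    import numpy as np

    # collapse the class dimension, then normalize to [0, 1] for plotting
    density = bound.sum(axis=0)
    density = density / (density.max() + 1e-8)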

Hope this helps.

thimabru1010 commented 4 years ago

But how did you do these calculations for each class? What I am doing in Keras to create bound_ref and dist_ref is just taking the RGB patches and passing each one of them through the functions inside src/bound_dist.py; like that, they return images with 1 channel only. I can't see how I could apply these functions to each class and create references with as many channels as the number of classes -> (patch_size, patch_size, num_classes).

feevos commented 4 years ago

Oh, I think I understand the confusion. These functions are for single-channel binary images. You have to first translate the RGB ground truth mask of classes into a one-hot representation, and then apply these functions; otherwise it will not work. If you read in detail the operations in the file https://github.com/feevos/resuneta/blob/master/src/chopchop_run.py, where I perform the splitting of large rasters into training chips, you will see this explicitly. In there I define the actual functions that I use in the production code. In line 63 you can see the function get_boundary, and there it is evident that I traverse the 1-hot ground truth mask along the channel (first) dimension. Please note that in mxnet format, channels come first. This function is used in line 231 on the 1-hot representation of the ground truth mask. Similar operations are performed for the distance transform.
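
As a rough illustration only (not the repository's exact code), applying a binary boundary extractor per class over a channels-first one-hot mask could look like this:

    import numpy as np
    import cv2

    def boundary_per_class(onehot, kernel_size=3):
        # onehot: one-hot ground truth mask, shape (NClasses, H, W), channels first
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        bounds = np.zeros(onehot.shape, dtype=np.float32)
        for c in range(onehot.shape[0]):
            mask = onehot[c].astype(np.uint8)
            # boundary of the binary mask: pixels removed by one erosion step
            bounds[c] = mask - cv2.erode(mask, kernel, iterations=1)
        return bounds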

You can find a tutorial on how to go from RGB representation of classes to 1hot format in a LinkedIn article I've written in the past.
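
A minimal sketch of that kind of RGB-to-one-hot conversion (colour2class below is a hypothetical, dataset-specific dictionary mapping each RGB colour tuple to a class index):

    import numpy as np

    def rgb_to_onehot(rgb_mask, colour2class):
        # rgb_mask: RGB ground truth mask, shape (H, W, 3)
        h, w, _ = rgb_mask.shape
        onehot = np.zeros((len(colour2class), h, w), dtype=np.uint8)  # channels first
        for colour, idx in colour2class.items():
            onehot[idx] = np.all(rgb_mask == np.array(colour), axis=-1).astype(np.uint8)
        return onehot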

thimabru1010 commented 4 years ago

Ah ok, really thank you! You're helping a lot! I wasn't using the reference because I thought the OpenCV functions would only work with images whose pixels vary from 0 to 255, and not with one-hot or integer class references (0,1,2,3,4,...). Actually, I changed the code to use these functions (Canny from OpenCV and cv2.distanceTransform) with my one-hot references; however, both functions only worked after converting the NumPy tensor to np.uint8. Is it okay to do this conversion? Will it affect the results?

feevos commented 4 years ago

uint8 is what I use as well prior to calling them - actually, in this repository and dataset the labels are already in uint8, so the conversion is not necessary. Just remember to translate to float32 when evaluating your loss function - I don't know how Keras handles that.
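
For illustration, the dtype round trip being discussed could look like this (a rough sketch using cv2.Canny as mentioned above, assuming onehot[c] is one binary class channel of the one-hot mask; not code from this repository):

    import numpy as np
    import cv2

    m = (onehot[c] * 255).astype(np.uint8)      # OpenCV expects uint8; scale 0/1 to 0/255
    edges = cv2.Canny(m, 100, 200)              # boundary of this class, values 0/255
    edges = (edges / 255.0).astype(np.float32)  # back to float32 in [0, 1] before the loss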

thimabru1010 commented 4 years ago

Hmmmm ok! Actually, there was already a conversion to float32 at the end of your boundary function, just before the normalization. But for the distance transform I just converted to uint8 to fit the cv2.distanceTransform function and returned the patch normalized. So, should I do the same in the distance function and convert back to float32?

P.S.: On the color transformation, did you do anything else besides converting to HSV and then normalizing?

feevos commented 4 years ago

There are several ways to create the distance transform, but from what I recall the input needs to be uint8. The output can be many things, depending on the normalization you choose, so this is an OpenCV question to be honest. On the color transformation, it's exactly as described in the paper, nothing more.
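
For example, one common way to do this per class (a hedged sketch, assuming onehot is the channels-first one-hot mask and rgb_img the uint8 RGB input patch; the normalization choices are illustrative, not the repository's exact code):

    import numpy as np
    import cv2

    dist = np.zeros(onehot.shape, dtype=np.float32)
    for c in range(onehot.shape[0]):
        d = cv2.distanceTransform(onehot[c].astype(np.uint8), cv2.DIST_L2, 3)
        if d.max() > 0:
            d = d / d.max()   # normalize each class to [0, 1]
        dist[c] = d

    # colour task: HSV conversion of the input patch, scaled to [0, 1]
    hsv = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv /= np.array([179.0, 255.0, 255.0])   # 8-bit OpenCV HSV ranges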

thimabru1010 commented 4 years ago

Thanks a lot! I think now the training is converging!

Since you don't provide the training code, do you remember what loss weights you used for multitasking (for each one of the tasks)? With the default weight of 1 for all the tasks, did you already obtain good results, or was there a need to set specific weights?

feevos commented 4 years ago

The relative contribution weight for all tasks that we used is equal to 1. That is the point of being able to use the Tanimoto with dual loss in the paper as a regression loss: since all tasks participate with the same loss function, their gradients are balanced by construction. We did not experiment with different weighting schemes (unfortunately, the experiments are very computationally demanding).

Having said that, we are not holders of the absolute truth, please explore! We may have missed many many many things that could further improve the model.

thimabru1010 commented 4 years ago

Hmmm, very interesting! I was thinking of using a different loss for my problem. I'll take a look at these points.

thimabru1010 commented 4 years ago

Well, I just saw that my boundary and distance branches are also segmenting the image in their predictions, like the main segmentation branch (pixels in the range 0,...,NClasses). Should I do something extra? When you said "go from RGB representation of classes to 1hot format", is that the usual one-hot representation of the segmentation labels? Because that is just what I did: I took the usual one-hot encoding that I use to train the segmentation head and put it into the distance and boundary functions. The HSV branch is the only one where I used the normal RGB image as input to the label function (RGB2HSV in OpenCV).

On the distance and boundary outputs, I am taking the argmax of my prediction matrix, and this creates a 2D array (h, w) which looks the same as the original segmentation. Should I do some postprocessing to get boundary- and distance-style outputs?

EDIT: When I take the output of each one of these branches, convert it back to one-hot, and then pass it through the respective function (boundary and distance), it seems to be predicting nicely. But this feels like cheating, and I was expecting the boundary and distance outputs directly.

feevos commented 4 years ago

Hi @thimabru1010, I don't understand exactly what the problem is; could you please provide some more information?

The distance transform prediction is expected to have NClasses channels, each channel containing the distance transform of the given class. The boundary prediction is produced with a sigmoid activation, because some boundaries are shared, but again it is an NClasses-channel prediction. That is, the network has four output layers. In pseudocode:

    out1, out2, out3, out4 = net(some_input)
    # out1: segmentation,         shape: (Batch, NClasses, 256, 256)
    # out2: distance,             shape: (Batch, NClasses, 256, 256)
    # out3: boundary,             shape: (Batch, NClasses, 256, 256)
    # out4: HSV color prediction, shape: (Batch, 3, 256, 256)

    # Here mask_XXX are the corresponding ground truth masks, in 1hot encoding.
    loss = (loss(out1, mask_segm) + loss(out2, mask_dist) + loss(out3, mask_bound) + loss(out4, HSV_img)) / 4.0
    loss.backward()

You can see the activations of these layers, and how they are used in line 175 of the source code.

Can you please post your classification head (the one that includes segmentation/distance/boundary) and some ground truth/prediction image pairs?

For example, in the inference demo of this repository you can see exactly what the ground truths and predictions look like for each type of output layer. There, each row shows, for a given class index: the input image, the segmentation ground truth and prediction, the boundary ground truth and prediction, and the distance ground truth and prediction. In that image, row 3 corresponds to class CAR, and the last row to class TREE.

You should be seeing the segmentation prediction, boundary prediction and distance transform prediction directly, as separate outputs (each of shape (Batch, NClasses, 256, 256)) from a trained network, as in the example notebook above.
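
For instance, instead of taking an argmax over the channel axis, one would inspect individual class channels directly (a small sketch using matplotlib and the output names from the pseudocode above; the class index is just an example):

    import matplotlib.pyplot as plt

    class_idx = 3                               # e.g. the row shown for class CAR
    plt.imshow(out3[0, class_idx].asnumpy())    # boundary prediction for that class
    plt.show()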

Hope this helps.

thimabru1010 commented 4 years ago

Hello @feevos, sorry for the late reply, I was finishing the term at university.

I think I misunderstood: I was taking the argmax, as for segmentation, on the other heads as well. To plot results like yours, I used your routines from the inference demo, in the view_slice_n_all_preds() function:

    # segmentation: argmax over the class (first) dimension
    _preds_label = _preds[0].asnumpy()
    _preds_label = np.argmax(_preds_label, axis=0)

    # boundary: sum over classes, then normalize to [0, 1]
    _preds_bound = _preds[1].asnumpy()
    _preds_bound = np.sum(_preds_bound, axis=0)
    _preds_bound /= _preds_bound.max()

    # distance: clip, sum over classes, then min-max normalize
    _preds_dist = nd.sum(nd.clip(_preds[2], a_min=0.3, a_max=1.0), axis=0).asnumpy()
    _preds_dist = (_preds_dist - _preds_dist.min()) / (_preds_dist.max() - _preds_dist.min())

    # colour: channels-first to channels-last for plotting
    _preds_color = _preds[3].asnumpy()
    _preds_color = _preds_color.transpose([1, 2, 0])

After that, I got these results (attached images: resuneta_multsk_seg_reduced, boundaries_pred_label, distance_pred_label, color_pred_label).

The results were very bad because I didn't use the Tanimoto loss; instead, I used a weighted version of cross-entropy. Do you think the multitasking works even with unbalanced loss weights?

feevos commented 4 years ago

Hi @thimabru1010, I cannot really know what is going wrong with your training, because there are a lot of things that could be responsible: a bug in the code, different loss functions, or it may even be that you just didn't train long enough (the latter is super important and very difficult). I don't know how changing loss functions affects performance (other than the different dice flavours described in the manuscript); I haven't done this test, mostly because training a single model to optimality consumes significant computational resources.

As a rule of thumb, it is relatively easy and fast to get an MCC score (on a validation set of ~10% of the area of your training set) higher than 0.88 for this particular dataset (with the setup described in the paper), and the results should look much, much better than the ones shown here. From memory, with a batch size of ~256 - 512 you should be seeing MCC ~ 0.88 with this network after ~200 epochs on a D7 model with PSPPooling in the middle (V2 in the paper). This can be a very long training procedure and requires HPC facilities. Anything above MCC = 0.87 should give you OK-ish results that look much better than the ones you've shown.

The boundary and distance predictions are super bad, indicating that something is going wrong in the multitasking approach you took. Now, although the performance of the algorithm you trained is poor, it looks like you are on the right track in finding bugs and sorting it out!

I have never experimented thoroughly with cross entropy on segmentation problems, to be honest; it is on my todo list. I don't know if it can be used for multitasking - I need to experiment on this.

The results of your color reconstruction suggest that you did not transform the HSV predictions to RGB before taking the difference, right? In my code, _GT_color is in HSV color space.

From the notebook you reference:

    ax42.imshow(hsv_to_rgb(_preds_color),rasterized=True)
    diff = np.mean(_preds_color - _GT_color.transpose([1,2,0]),axis=-1)
    diff =  2*(diff-diff.min())/(diff.max()-diff.min()) - np.ones_like(diff) 

thimabru1010 commented 4 years ago

Hello again @feevos, why did you add the Note in this comment on line 51:

""" Tanimoto coefficient with dual from: Diakogiannis et al 2019 (https://arxiv.org/abs/1904.00592) Note: to use it in deep learning training use: return 1. - 0.5*(loss1+loss2) """

In your implementation, wasn't it supposed to be just 0.5*(loss1+loss2)? Why subtract from one at the end of the dual loss function (which already uses the complement)? And why didn't you use that in your implementation?

Thanks again for all the help.

feevos commented 4 years ago

Hi @thimabru1010, the definitions of the Tanimoto and Tanimoto with dual give 1 when preds == labels, and 0 when preds == 1 - labels. That is, in order to make preds equal to labels, one needs to maximize these functions. Therefore, adding a minus sign turns this into a minimization problem, and 1 - loss gives a loss function that ranges between 0 and 1. It's exactly what I used in the paper - for each of the multitasking terms.
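
A minimal numpy sketch of that logic (illustrative, following the formulas in the paper; not the repository's mxnet implementation):

    import numpy as np

    def tanimoto(p, l, eps=1e-5):
        # equals 1 when p == l, and ~0 when p == 1 - l
        num = np.sum(p * l)
        den = np.sum(p * p + l * l) - num
        return (num + eps) / (den + eps)

    def tanimoto_dual_loss(p, l):
        t = 0.5 * (tanimoto(p, l) + tanimoto(1.0 - p, 1.0 - l))
        return 1.0 - t   # minimizing this maximizes the (dual) Tanimoto coefficient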

I apologise if the comment is a bit confusing; I am leaving it as a reference. Please, if you have more questions that are not related to this topic, open a new issue, so that interested people are able to track them down and search for them. I am closing this one; feel free to open another one.

edit: you can find more info on the loss on issue #3