dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

Protonet visualisation on custom data #82

Closed abhigoku10 closed 5 years ago

abhigoku10 commented 5 years ago

@dbolya The idea of protonet is really amazing, great work. Is there any process by which we can visualise the masks in the protonet, so as to understand how the network is performing on custom data?

dbolya commented 5 years ago

There is, it's just not polished.

To get it to work, change the tuple in this line: https://github.com/dbolya/yolact/blob/f46dc4385a41ed1f2df6716ecf6084081afcbec6/layers/output_utils.py#L180 to (4, 8) and then run with the arguments --display --display_lincomb=True (don't use --video or --image; actually, you can use --image).

You can check out that function for more ways to view the prototypes.
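
For concreteness, here's a toy sketch of what that tuple controls: tiling the 32 prototype maps into a 4x8 grid for display. The shapes and variable names below are illustrative, not the repo's exact code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative protonet output: H x W x num_prototypes (YOLACT uses k = 32).
proto = np.random.rand(138, 138, 32)

arr_h, arr_w = (4, 8)  # the tuple above: tile the prototypes in 4 rows x 8 cols
h, w, k = proto.shape

# Paste each prototype map into one big image for viewing.
grid = np.zeros((arr_h * h, arr_w * w))
for i in range(arr_h):
    for j in range(arr_w):
        grid[i * h:(i + 1) * h, j * w:(j + 1) * w] = proto[:, :, i * arr_w + j]

plt.imshow(grid, cmap='gray')
plt.axis('off')
plt.show()
```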

abhigoku10 commented 5 years ago

@dbolya Thanks for this, I am able to see the feature maps. Can I look at how a particular image is learnt? With the default settings I can see the maps, but I am not sure which image it has taken by default.

dbolya commented 5 years ago

I'm not sure what you mean by that. Could you clarify what you mean by "I am not sure which image it has taken by default"?

abhigoku10 commented 5 years ago

When I run the command you shared, I just get the feature maps. Generally, saliency maps / activation maps depend on the image given as input to the model, right? For example, I want to visualize the tennis-playing image mentioned in your paper.

dbolya commented 5 years ago

Yeah, and it's using the current dataset's validation set by default (which is what --display does).

And I checked the code again: --display_lincomb=True totally works with --image, I don't know why I said you couldn't use it. I'll edit the original comment. You can then choose which image to use.

The tennis player example is image 000000000552.jpg in COCO val btw if you want to use it: --image=./data/coco/images/000000000552.jpg --display_lincomb=True

abhigoku10 commented 5 years ago

@dbolya As usual, thanks for your quick response and the wonderful codebase... Is the layer that gets visualised configurable? I want to visualise the outputs on my custom data to see how the model is learning.

dbolya commented 5 years ago

This is actually just the output of protonet, i.e., the prototypes that are being combined into the masks. Sadly, you can't configure it to output other layers, because this visualization lives in a very different spot in the code.

This is the function that's doing it: https://github.com/dbolya/yolact/blob/f46dc4385a41ed1f2df6716ecf6084081afcbec6/layers/output_utils.py#L167

Maybe that can help you create a version for the backbone features?
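
If it helps, here's a rough sketch of one way to grab backbone features with a PyTorch forward hook. The module path net.backbone and the inputs are assumptions; this presumes net is a YOLACT model already loaded the way eval.py does it, so print(net) to find the real module names:

```python
import torch

feats = {}

def save_output(name):
    # Standard forward-hook signature: (module, inputs, output).
    def hook(module, inputs, output):
        feats[name] = output
    return hook

# Assumption: 'net' is a loaded YOLACT model with a 'backbone' submodule;
# print(net) to check the actual module names in your checkout.
handle = net.backbone.register_forward_hook(save_output('backbone'))

with torch.no_grad():
    net(img_batch)  # img_batch: your preprocessed input tensor (assumption)

handle.remove()
# feats['backbone'] now holds the backbone output(s), which you can tile
# and display the same way display_lincomb shows the prototypes.
```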

abhigoku10 commented 5 years ago

@dbolya Okay, I shall try to tune this segment to obtain the feature visualization for all the layers of the network.

abhigoku10 commented 5 years ago

@dbolya I was able to get the final 32 masks and observe the visualization on the custom data. How can I get the mask coefficients? A few object detections are missing.

dbolya commented 5 years ago

The mask coefficients are the parameter called masks passed into display_lincomb. (Yeah I should have named that better...)
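
For background, those coefficients are what get linearly combined with the prototypes to form each instance mask, roughly M = sigmoid(P C^T) as in the paper. A toy version with illustrative shapes:

```python
import torch

proto = torch.rand(138, 138, 32)  # P: H x W x k prototype maps
coeffs = torch.rand(10, 32)       # C: n x k, one coefficient row per detection

# Each detection's mask is a sigmoid of a linear combination of prototypes.
masks = torch.sigmoid(proto @ coeffs.t())
print(masks.shape)  # torch.Size([138, 138, 10]): one mask per detection
```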

abhigoku10 commented 5 years ago

@dbolya Is there any way I can modify my model based on the masks generated? I see that a particular object is not getting highlighted in the feature mask, due to which detections are not happening.

dbolya commented 5 years ago

Did you have anything in mind? Other than tuning the training parameters so that class is more important, I'm not sure how else you could fix that.

abhigoku10 commented 5 years ago

@dbolya Hi, I was looking into the visualisation and have the following queries:

  1. In the feature map I am able to see the object highlighted, but in the output there is no detection unless I decrease the threshold.
  2. There is misclassification of the detected object.
  3. If there are two objects, only one object gets detected.

dbolya commented 5 years ago

@abhigoku10 By "the feature map", do you mean the prototypes? It doesn't matter if the object appears in the prototypes; if the backbone detector doesn't detect it, then there's no detection. The masks are only generated after the detector has made all its detections.

Misclassification again happens in the detector without regard to the mask branch. Our classification branch is quite shallow for speed reasons and thus we can't use something like focal loss to improve it. For future versions, this is something we want to fix.

On that note, the detector is also pretty shallow which might cause your 3rd point. 30 fps is a hard target to hit (33 ms / frame), so we walk a fine line here.

abhigoku10 commented 5 years ago

@dbolya Yup, "feature map" to me means the prototypes. I want to solve the misclassification on my dataset, so I'm looking for any other method that will give me a fix. You have done great, thoughtful work; achieving 33 ms is really appreciated.

dbolya commented 5 years ago

You can try increasing the number of layers allocated to each branch. Change this setting in your config: https://github.com/dbolya/yolact/blob/f46dc4385a41ed1f2df6716ecf6084081afcbec6/data/config.py#L505

The order of extra layers is (bbox, conf, mask). So (0, 1, 0) would give just one extra layer to the class prediction branch (these layers aren't shared).

Btw, I forgot to mention: since the detection score comes from the class branch, all the errors you mention actually come from the same branch (conf). So maybe you want to give that a lot of extra layers (let's say, (0, 2, 0)).
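
For example, in data/config.py it might look like the sketch below; the copy-based pattern matches how the other configs are derived there, but double-check the exact field name at the linked line:

```python
# Derive a custom config from the base one (following config.py's pattern).
my_custom_config = yolact_base_config.copy({
    'name': 'my_custom',

    # Order is (bbox, conf, mask): give the confidence branch two extra
    # layers, since both the detection score and the class come from conf.
    'extra_layers': (0, 2, 0),
})
```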