dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

change resolution of segmentation output? #704

Closed donbonjenbi closed 4 years ago

donbonjenbi commented 4 years ago

Hi,

I'm following the segmentation tutorial, and have the "fcn-resnet18-cityscapes-2048x1024" running on the Jetson Xavier NX.

The output mask resolution of the pre-trained model is set at 32x64px - is there a way to increase the output mask resolution, say to 64x128px or 128x256px?

Thanks in advance!

dusty-nv commented 4 years ago

The size of the raw output mask scales with the input resolution - so you would need to train on larger images to get a larger output mask. This would lead to longer runtimes, and I don't really recommend it, because 2048x1024 is already a large input.

If you are familiar with the FCN-ResNet architecture, you could try editing the layers so that it produces a larger mask, but this may have other impacts, like reducing the accuracy. I haven't tried changing the network architecture.

To apply the raw mask to the original image, I upsample the mask with bilinear filtering or point filtering.
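The point-filtering option mentioned above can be sketched in pure Python - a simplified illustration of the idea, not the actual CUDA kernel in jetson-inference:

```python
# Minimal sketch of upsampling a small class grid to image size.
# Point (nearest-neighbor) filtering preserves the discrete class IDs;
# bilinear filtering blends values, which is fine for colors but not for IDs.

def upsample_point(grid, out_w, out_h):
    """Nearest-neighbor upsample of a 2D grid (a list of rows)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

mask = [[1, 2],
        [3, 4]]
up = upsample_point(mask, 4, 4)
# Each source cell now covers a 2x2 block of the output:
# [[1, 1, 2, 2],
#  [1, 1, 2, 2],
#  [3, 3, 4, 4],
#  [3, 3, 4, 4]]
```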

donbonjenbi commented 4 years ago

Got it, makes sense. Thanks!

wakingking77 commented 3 years ago

> The size of the raw output mask scales with the input resolution - so you would need to train on larger images to get a larger output mask. This would lead to longer runtimes, and I don't really recommend it, because 2048x1024 is already a large input.
>
> If you are familiar with the FCN-ResNet architecture, you could try editing the layers so that it produces a larger mask, but this may have other impacts, like reducing the accuracy. I haven't tried changing the network architecture.
>
> To apply the raw mask to the original image, I upsample the mask with bilinear filtering or point filtering.

Hello Dusty, I have the same question. I also tried training on my VOC 2012 data using your training.py code, with 512x320 resolution. As you said, the output layer should scale with the input resolution, but I got a 16x10 output layer. So I tried training VOC 2012 with 2048x960 args, and then the model output was 64x40. From this I found the rule that output resolution = input resolution / 32, and we cannot change it. I want to know why, and how can I get a high-resolution label layer (output mask layer) matching what I labeled? Original-resolution output would be best! Thank you, and looking forward to your reply 😁

wakingking77 commented 3 years ago

> Hi,
>
> I'm following the segmentation tutorial, and have the "fcn-resnet18-cityscapes-2048x1024" running on the Jetson Xavier NX.
>
> The output mask resolution of the pre-trained model is set at 32x64px - is there a way to increase the output mask resolution, say to 64x128px or 128x256px?
>
> Thanks in advance!

I have the same question. Did you solve it successfully? How did you train your PyTorch model? Thank you very much!

dusty-nv commented 3 years ago

> As you said, the output layer should scale with the input resolution, but I got a 16x10 output layer. So I tried training VOC 2012 with 2048x960 args, and then the model output was 64x40. From this I found the rule that output resolution = input resolution / 32, and we cannot change it. I want to know why, and how can I get a high-resolution label layer (output mask layer) matching what I labeled? Original-resolution output would be best!

The raw classification grid is always going to be a factor of 32 smaller, because that is the network architecture. If you are familiar with FCN-ResNet you may be able to doctor the network architecture, but I have not tried it.

Normally the output of an FCN segmentation network is the same size as the input, because it uses deconvolution on the smaller classification grid. However, that operation is quite slow and makes it non-realtime, and the deconvolution is linear (i.e. the learning rate of the deconv layer is 0, meaning it doesn't 'learn' and is just a linear operator). So I remove the deconv layer from the model and instead do it in post-processing in my CUDA kernel.

Even if you kept the deconv layers inside the model, it would still be working from that factor-of-32 classification grid, though. It doesn't really produce new information.
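The factor-of-32 relation described above can be checked with a one-liner (a sketch; the constant comes from the stride-2 stages in the ResNet-18 backbone):

```python
# The raw classification grid is the input size divided by the network's
# total stride. For FCN-ResNet18 the total stride is 32.

STRIDE = 32

def raw_grid_size(width, height, stride=STRIDE):
    """Size of the raw output mask for a given input resolution."""
    return width // stride, height // stride

print(raw_grid_size(2048, 1024))  # -> (64, 32): the 32x64px mask from the question
print(raw_grid_size(512, 320))    # -> (16, 10): matching the VOC experiment above
```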

wakingking77 commented 3 years ago

> As you said, the output layer should scale with the input resolution, but I got a 16x10 output layer. So I tried training VOC 2012 with 2048x960 args, and then the model output was 64x40. From this I found the rule that output resolution = input resolution / 32, and we cannot change it. I want to know why, and how can I get a high-resolution label layer (output mask layer) matching what I labeled? Original-resolution output would be best!
>
> The raw classification grid is always going to be a factor of 32 smaller, because that is the network architecture. If you are familiar with FCN-ResNet you may be able to doctor the network architecture, but I have not tried it.
>
> Normally the output of an FCN segmentation network is the same size as the input, because it uses deconvolution on the smaller classification grid. However, that operation is quite slow and makes it non-realtime, and the deconvolution is linear (i.e. the learning rate of the deconv layer is 0, meaning it doesn't 'learn' and is just a linear operator). So I remove the deconv layer from the model and instead do it in post-processing in my CUDA kernel.
>
> Even if you kept the deconv layers inside the model, it would still be working from that factor-of-32 classification grid, though. It doesn't really produce new information.

Thanks a lot for your reply. I don't know how to change the network architecture, and it would cost me a lot of time to study it, so I'm using your open-source code as-is to solve my real-world problem 😂. My problem is finding a small object in a 4K image, so I want to output a higher-resolution label layer to find it. Is there an easy change in your training or inference code that would give me a high-resolution output layer? For example, 2K in and a real 2K out… 🤣🤣🤣

dusty-nv commented 3 years ago

If you are using jetson-inference segNet class to run the inference, it will re-scale the results using bilinear interpolation (similar to what deconv was doing, just faster) to whatever image size you provide as the output. Typically this is the same size as the input image.

wakingking77 commented 3 years ago

> If you are using jetson-inference segNet class to run the inference, it will re-scale the results using bilinear interpolation (similar to what deconv was doing, just faster) to whatever image size you provide as the output. Typically this is the same size as the input image.

Thank you very much for your quick reply. I think I know what you mean; tomorrow I will try again.

wakingking77 commented 3 years ago

> If you are using jetson-inference segNet class to run the inference, it will re-scale the results using bilinear interpolation (similar to what deconv was doing, just faster) to whatever image size you provide as the output. Typically this is the same size as the input image.

Dear Dusty, today I tried the inference again. I found that what you said is right - it has almost the same accuracy as the official result. I had used the parameter --filter-mode=point to test before, so the visualization looked very low-resolution, but I hadn't realized that the detection accuracy was actually quite high.

(Comparison images attached: bielefeld_000000_028414_leftImg8bit, the point-filtered result, the linear-filtered result, and the bielefeld_000000_028414_gtFine_labelTrainIds ground truth.)

So I have only one question left: if you apply linear filtering to the point result, are the color values changed/blended? I want to use the detection result for further work, so it's important for me.

dusty-nv commented 3 years ago

@wakingking77 with --filter-mode=linear, bilinear filtering is used to blend the colors for visualization. That is not very useful for detecting the underlying class, because the values have been blended. This is why --filter-mode=point is more likely to be useful when you want to know the actual classes.
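Why linear filtering ruins class IDs can be seen with a tiny example: averaging two IDs can produce a value that corresponds to a completely different class (a simplified sketch with made-up class IDs, not the actual CUDA code):

```python
# Bilinear filtering averages neighboring values. For a color overlay that
# just blends colors; for class IDs it can invent a class neither pixel had.

ROAD, CAR, SKY = 1, 2, 3

# Midpoint between a ROAD pixel and a SKY pixel under linear interpolation:
blended = (ROAD + SKY) / 2
print(blended)  # -> 2.0, i.e. CAR: a class that isn't actually there

# Point (nearest-neighbor) filtering just picks one of the two real IDs:
nearest = ROAD  # or SKY, depending on which sample is closer
```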

You can use the segNet.Mask() function to get an image of just the class IDs back (not color). It will re-scale point-wise to whatever resolution the output image is. The output image should be uint8 to get class IDs back. This is how it is used in the code:
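A minimal sketch of what that usage might look like with the Python bindings (hardware-dependent; the network name comes from this thread, "input.jpg" is a placeholder path, and the exact loadImage/cudaAllocMapped/Mask signatures are assumptions from memory of the jetson-inference Python API - check the repo's segnet.py for the authoritative version):

```python
import jetson.inference
import jetson.utils

# Load the pre-trained segmentation network discussed above
net = jetson.inference.segNet("fcn-resnet18-cityscapes-2048x1024")

# "input.jpg" is a placeholder path for your test image
img = jetson.utils.loadImage("input.jpg")
net.Process(img)

# Allocate a single-channel uint8 ('gray8') image, so Mask() writes
# raw class IDs instead of blended overlay colors
class_mask = jetson.utils.cudaAllocMapped(width=img.width,
                                          height=img.height,
                                          format="gray8")

# Point-wise re-scale of the raw classification grid up to the
# requested output resolution
net.Mask(class_mask, img.width, img.height)
```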

wakingking77 commented 3 years ago

> @wakingking77 with --filter-mode=linear, bilinear filtering is used to blend the colors for visualization. That is not very useful for detecting the underlying class, because the values have been blended. This is why --filter-mode=point is more likely to be useful when you want to know the actual classes.
>
> You can use the segNet.Mask() function to get an image of just the class IDs back (not color). It will re-scale point-wise to whatever resolution the output image is. The output image should be uint8 to get class IDs back. This is how it is used in the code:

Thank you very, very much!! I finally got it!!