PRBonn / bonnet

Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics.
GNU General Public License v3.0

graph output of uff file for TensorRT #48

Open JadBatmobile opened 5 years ago

JadBatmobile commented 5 years ago

Hey Andres,

I believe I understand what's going on, but I'd like to be sure. The UFF file for deploying the model in TensorRT outputs the unnormalized logits layer of the neural network, as opposed to the mask?

tano297 commented 5 years ago

Correct!

JadBatmobile commented 5 years ago

Great. I noticed that you do not apply a softmax to the logits; you take the argmax directly, which I realize is equivalent. I also see in the C++ deploy package, in netTRT.cpp, line 160: _sizeof_out = num_classes * _size_in_pix * sizeof(int);

I am curious why you used sizeof(int) instead of sizeof(float), since the output logits are float32. I recognize that sizeof(int) is equal to sizeof(float) on typical platforms, but I'm curious why you explicitly wrote it as int.
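To make the size question concrete, here is a minimal sketch of sizing the output buffer from the element type that is actually stored. The helper name logits_buffer_bytes and its parameters are hypothetical and not taken from netTRT.cpp. The C++ standard does not guarantee that int and float have the same size, so the static_assert only documents the assumption that makes the sizeof(int) version work on common platforms.

```cpp
#include <cassert>
#include <cstddef>

// Assumption made explicit: on common platforms int and float are both
// 4 bytes, but the C++ standard does not require this. The build fails
// on any platform where the two sizes differ.
static_assert(sizeof(int) == sizeof(float),
              "sizeof(int) != sizeof(float) on this platform");

// Hypothetical helper (not part of bonnet): number of bytes needed for a
// float32 logits tensor with num_classes channels and size_in_pix pixels.
std::size_t logits_buffer_bytes(std::size_t num_classes,
                                std::size_t size_in_pix) {
  assert(num_classes > 0 && size_in_pix > 0);
  // Use the stored element type so the size stays correct even where
  // int and float differ in width.
  return num_classes * size_in_pix * sizeof(float);
}
```

Writing sizeof(float) ties the byte count to the data type of the logits, so the expression stays correct on a platform where the two sizes differ.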

Thank you!
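On the softmax remark above: softmax is strictly increasing in each logit, so the class with the largest logit also has the largest probability. The sketch below checks this on a small hand-made logits vector. The function names are illustrative and are not taken from the bonnet code base, and very close logits could in principle tie after floating-point rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest element (the first one if there are ties).
std::size_t argmax(const std::vector<float>& v) {
  assert(!v.empty());
  return static_cast<std::size_t>(
      std::distance(v.begin(), std::max_element(v.begin(), v.end())));
}

// Numerically stable softmax: subtract the maximum before exponentiating.
std::vector<float> softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp(logits[i] - max_logit);
    sum += probs[i];
  }
  for (float& p : probs) p /= sum;  // normalize so the values sum to 1
  return probs;
}
```

For per-pixel class prediction, calling argmax on the raw logits therefore gives the same label as calling it on softmax(logits), and skips the exponentials.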

JadBatmobile commented 5 years ago

Hey Andres, I have another question. It is unrelated and more theoretical, so perhaps this is the wrong place; I'd be happy to email you instead.

When preprocessing images for segmentation learning with CNNs, there are multiple options:

(1) For each pixel position, take the mean and the standard deviation of that pixel across the entire dataset, then subtract that mean from the pixel in each image and divide by the standard deviation. Do this for all pixels.

(2) For each image, find the mean and standard deviation of all its pixels, and normalize each pixel with those.

(3) For each image and each channel, find the mean and standard deviation of that channel, normalize each pixel in the channel, then merge the channels.
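The three options can be sketched as follows. This is only an illustration under stated assumptions: the Image struct, the interleaved HWC layout, the Accumulator helper, and the small epsilon kEps that guards against division by zero are all hypothetical and not part of bonnet. Option 1 is read here as statistics per value position (pixel and channel) across the dataset.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical image type for this sketch: interleaved HWC float values.
struct Image {
  std::size_t height, width, channels;
  std::vector<float> data;  // height * width * channels values
};

const float kEps = 1e-6f;  // assumed guard against division by zero

// Running sums from which mean and population std dev are derived.
struct Accumulator {
  double sum = 0.0, sum_sq = 0.0;
  std::size_t n = 0;
  void add(float v) { sum += v; sum_sq += double(v) * v; ++n; }
  float mean() const { assert(n > 0); return float(sum / n); }
  float stddev() const {
    assert(n > 0);
    const double m = sum / n;
    const double var = sum_sq / n - m * m;
    return float(std::sqrt(var > 0.0 ? var : 0.0));
  }
};

// Option 1: statistics per value position, taken across the whole dataset.
void normalize_per_pixel_over_dataset(std::vector<Image>& dataset) {
  assert(!dataset.empty());
  const std::size_t n_values = dataset[0].data.size();
  for (std::size_t k = 0; k < n_values; ++k) {
    Accumulator acc;
    for (const Image& img : dataset) {
      assert(img.data.size() == n_values);  // all images share one shape
      acc.add(img.data[k]);
    }
    const float mean = acc.mean(), stddev = acc.stddev();
    for (Image& img : dataset)
      img.data[k] = (img.data[k] - mean) / (stddev + kEps);
  }
}

// Option 2: one mean and std dev per image, over all pixels and channels.
void normalize_per_image(Image& img) {
  Accumulator acc;
  for (float v : img.data) acc.add(v);
  const float mean = acc.mean(), stddev = acc.stddev();
  for (float& v : img.data) v = (v - mean) / (stddev + kEps);
}

// Option 3: one mean and std dev per channel of each image.
void normalize_per_channel(Image& img) {
  assert(img.channels > 0);
  for (std::size_t c = 0; c < img.channels; ++c) {
    Accumulator acc;
    for (std::size_t i = c; i < img.data.size(); i += img.channels)
      acc.add(img.data[i]);
    const float mean = acc.mean(), stddev = acc.stddev();
    for (std::size_t i = c; i < img.data.size(); i += img.channels)
      img.data[i] = (img.data[i] - mean) / (stddev + kEps);
  }
}
```

Options 2 and 3 need only the image at hand, so they can run unchanged at inference time. Option 1 needs statistics computed once over the training set and stored for deployment.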

I am wondering what your thoughts on these options are.