dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License
7.79k stars, 2.98k forks

Segnet Datasets #67

Closed: S4WRXTTCS closed this issue 7 years ago

S4WRXTTCS commented 7 years ago

In the FCN-AlexNet-PASCAL-VOC directory there is an original.prototxt, along with the trained model and the other files created for it.

However, that original.prototxt doesn't match the fcn_alexnet.prototxt one would normally use on the Pascal-VOC dataset in the semantic-segmentation example tutorial.

The key difference seems to be that fcn_alexnet.prototxt has a "group: 21" setting under the convolution_param of the upscore layer. This setting is missing from original.prototxt and wasn't used to build the model.

I don't know what this setting does, but it seems to make a big difference. I'm also unable to run the segnet-console app with a model trained with this setting. Part of the reason is that I can't train a model successfully unless I keep this setting and also rename the layers to match those in original.prototxt, so every prototxt/model I test has both changes.

I know segnet inference is likely still being worked on, but are there any limitations we should be aware of? The included examples seem to work fine.
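For readers unsure what "group" does: in Caffe-style convolution semantics, group splits the input and output channels into that many independent groups. Assuming the upscore layer has 21 input and 21 output channels, "group: 21" means each class's score map is upsampled on its own instead of being mixed with the other classes. The pure-Python 1-D sketch below illustrates a grouped transposed convolution; the function name is hypothetical and the weight layout [C_in][C_out // groups][K] is an assumption, not taken from this thread.

```python
from typing import List


def conv_transpose1d(x: List[List[float]], w: List[List[List[float]]],
                     stride: int, groups: int) -> List[List[float]]:
    """Grouped 1-D transposed convolution (no padding).

    x: [C_in][L] input; w: [C_in][C_out // groups][K] weights (assumed layout).
    """
    c_in, length = len(x), len(x[0])
    out_per_group = len(w[0])
    c_out = out_per_group * groups
    k = len(w[0][0])
    in_per_group = c_in // groups
    out_len = (length - 1) * stride + k
    y = [[0.0] * out_len for _ in range(c_out)]
    for ci in range(c_in):
        g = ci // in_per_group  # group this input channel belongs to
        for j in range(out_per_group):
            co = g * out_per_group + j  # only output channels of the same group
            for i in range(length):
                for t in range(k):
                    # Each input sample "stamps" the kernel at stride spacing.
                    y[co][i * stride + t] += x[ci][i] * w[ci][j][t]
    return y
```

With groups equal to the channel count, output channel 0 depends only on input channel 0, and so on; with groups=1, every output channel mixes all inputs and the layer needs groups times as many weights.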

dusty-nv commented 7 years ago

Hi, the issue was that the 'group' setting was not available in TensorRT 1.0. The group allows for more than 21 classes to be represented (it derives from Cityscapes, which has 30+ classes by default). Hence I removed it from the segmentation models used in this TensorRT-based tutorial.

dusty-nv commented 7 years ago

Also, I now remember that I remove the Deconvolution 'upscore' layer from my final segmentation networks entirely, because it has a learning rate (LR) of zero (i.e. it represents a fixed linear transform) and runs slowly. Instead, I perform that step myself as post-processing. After training in DIGITS, remove the Deconvolution and Crop layers near the end of the prototxt (I may have forgotten to remove the Deconv layer from the Pascal-VOC model). Also change the 'pad: 100' parameter of the 'conv1' layer to 0 to get it to load in TensorRT.
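Because a zero-LR upscore layer is a fixed transform, its effect can be approximated after inference. A minimal pure-Python sketch, assuming the truncated network outputs one coarse score grid per class and that the removed layer was a bilinear upsampler (as in the FCN reference models): resize each class grid to the output size, then pick the highest-scoring class per pixel. The function names are hypothetical and this is not jetson-inference's actual post-processing code; the align-corners sampling used here will not match a strided Caffe Deconvolution pixel-for-pixel.

```python
from typing import List

Grid = List[List[float]]


def upsample_bilinear(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Bilinearly resize a 2-D score grid to out_h x out_w (align-corners sampling)."""
    in_h, in_w = len(grid), len(grid[0])
    out: Grid = []
    for y in range(out_h):
        # Map the output row to a fractional input row.
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate horizontally on the two rows, then vertically.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bot = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out


def segment(scores: List[Grid], out_h: int, out_w: int) -> List[List[int]]:
    """Upsample every class's score grid, then take the per-pixel argmax class."""
    up = [upsample_bilinear(s, out_h, out_w) for s in scores]
    num_classes = len(up)
    return [[max(range(num_classes), key=lambda c: up[c][y][x])
             for x in range(out_w)]
            for y in range(out_h)]
```

Upsampling the class scores before the argmax (rather than upsampling a label map) keeps class boundaries smooth, which is the usual reason FCN-style networks interpolate scores.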

S4WRXTTCS commented 7 years ago

The dataset I'm using has fewer than 21 classes, so the group parameter isn't really needed. The problem is that when I remove it, I can no longer train the model: it won't work with the pre-trained FCN_Alexnet model that net_surgery.py in DIGITS 5.0 generates. Either it complains about a layer name conflict or it won't learn (accuracy stays at zero).

What model did you use as a pretrained model?

As for the deconvolution upscore layer and the crop layer, I removed both after training, and I don't recall that affecting anything. I started from the Cityscapes model, which has both of those removed.

dusty-nv commented 7 years ago

Initially, when I was still using the deconv layer with TensorRT, I used this dev branch of DIGITS as a workaround: https://github.com/gheinrich/DIGITS/tree/dev/groupless-deconv

However, now that I remove the deconv layer entirely when using TensorRT, the special DIGITS branch is no longer required. You can train as normal; then, after the model has finished training, remove the deconv layer from the prototxt before loading it with TensorRT. The deconv group parameter is therefore no longer relevant.

I just confirmed that my Pascal-VOC model still has the deconv layer listed; it must have been left over from before. I am removing it, as in the other models, and re-uploading the Pascal model.

From: Jason Mecham Sent: Tuesday, April 11, 2017 12:45 PM Subject: Re: [dusty-nv/jetson-inference] Segnet Datasets (#67)

S4WRXTTCS commented 7 years ago

Thanks. That took care of that issue.

I can now train the model successfully in DIGITS 5.0 by leaving the group setting in. The only other thing I had to do to get it working with segnet-console was to retrain using the layer names segnet expects (score_fr_21classes).

Once the model was trained, I simply deleted the upscore layer and the crop layer from deploy.prototxt and changed the pad parameter of conv1 from 100 to 0 (I got a segmentation fault when I forgot this).
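The deploy.prototxt edits described above can be scripted. A minimal plain-Python sketch, assuming the newer `layer { ... }` prototxt syntax with quoted `type:` strings and no braces inside strings or comments: it drops every Deconvolution and Crop layer and rewrites `pad: 100` to `pad: 0` in the layer named conv1. The function names are hypothetical; a more robust tool would parse the file with Caffe's protobuf definitions instead of regular expressions.

```python
import re
from typing import List, Optional

# Layer types to drop before loading the deploy prototxt in TensorRT.
REMOVE_TYPES = {"Deconvolution", "Crop"}


def split_blocks(text: str) -> List[str]:
    """Split prototxt text into top-level chunks by tracking brace depth per line."""
    blocks: List[str] = []
    current: List[str] = []
    depth = 0
    for line in text.splitlines(keepends=True):
        current.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:  # a top-level statement or a whole layer { ... } just ended
            blocks.append("".join(current))
            current = []
    if current:  # unbalanced tail: keep it unchanged
        blocks.append("".join(current))
    return blocks


def first_field(block: str, field: str) -> Optional[str]:
    """Return the first quoted value of `field:` in a block (e.g. layer name or type)."""
    m = re.search(r'^\s*' + field + r':\s*"([^"]*)"', block, re.MULTILINE)
    return m.group(1) if m else None


def strip_for_tensorrt(text: str) -> str:
    """Drop Deconvolution/Crop layers and change conv1's pad from 100 to 0."""
    out: List[str] = []
    for block in split_blocks(text):
        is_layer = block.lstrip().startswith("layer")
        if is_layer and first_field(block, "type") in REMOVE_TYPES:
            continue  # e.g. the 'upscore' Deconvolution layer and the Crop layer
        if is_layer and first_field(block, "name") == "conv1":
            block = re.sub(r"\b(pad:\s*)100\b", r"\g<1>0", block)
        out.append(block)
    return "".join(out)
```

For example, read deploy.prototxt, pass its contents through strip_for_tensorrt, and write the result to a new file, keeping the original for retraining in DIGITS.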

It now seems to load successfully, but the results aren't what I expect. Then I tried FCN-AlexNet-Pascal-VOC, and its results aren't as expected either.

I noticed that its overlay colors file is set to null; there is no colors file, unlike the other models such as fcn-alexnet-aerial-fpv-720p-21ch, which works fine. I tried supplying one, but the output still didn't resemble the results I expected.
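One way to build a colors file for a Pascal-VOC model is the standard VOC label color map, which spreads the bits of each class index across R, G and B. A minimal sketch with hypothetical function names; the one "R G B" line per class output format is an assumption, since the format segnet-console expects isn't shown in this thread, so compare it with the colors file shipped with fcn-alexnet-aerial-fpv-720p-21ch before using it.

```python
from typing import List, Tuple


def voc_colormap(num_classes: int = 21) -> List[Tuple[int, int, int]]:
    """Pascal VOC label color map: bits of the class index fill R, G, B from the MSB down."""
    colors = []
    for label in range(num_classes):
        r = g = b = 0
        c = label
        for j in range(8):
            # Bit 0 -> red, bit 1 -> green, bit 2 -> blue, at bit position 7 - j.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        colors.append((r, g, b))
    return colors


def colors_file_text(colors: List[Tuple[int, int, int]]) -> str:
    """Hypothetical file format: one 'R G B' line per class, in class-index order."""
    return "".join(f"{r} {g} {b}\n" for r, g, b in colors)
```

Class 0 (background) maps to black and class 15 (person) to (192, 128, 128), matching the colors used in the VOC ground-truth images, so an overlay built this way can be compared visually with the dataset's label images.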

S4WRXTTCS commented 7 years ago

Here is what the output looks like using the AlexNet-Pascal-VOC model by default: [image: test1]

Here is the original file from the semantic-segmentation example: [image: 2007_000392]

S4WRXTTCS commented 7 years ago

Here is what the output looks like using the latest AlexNet-Pascal-VOC model, which was updated a couple of days ago. I noticed the update added the color table file: [image: test1a]