dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Segnet can't load custom network #46

Closed Gminded closed 1 year ago

Gminded commented 7 years ago

I followed the semantic segmentation example for DIGITS 5 to train my own model. I tried to load it with segnet but I get this:

[GIE]  attempting to open cache file seg-voc/snapshot.caffemodel.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform has FP16 support.
[GIE]  loading seg-voc/deploy.prototxt seg-voc/snapshot.caffemodel
[libprotobuf FATAL ../../../externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1378] CHECK failed: (index) < (current_size_): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_): 
Aborted

I think this has something to do with the deploy file, since I assume that is what the protobuf library is parsing. However, my deploy.prototxt is well formed and almost identical to the ones in the segnet examples, except for the input size and the initial padding.

S4WRXTTCS commented 7 years ago

Did you ever resolve this error?

I'm getting the same error.

My model is very similar to FCN-Alexnet-AerialFPV-21ch-720P except for the image size. So I copied that deploy.prototxt over, and simply changed the input shape.

S4WRXTTCS commented 7 years ago

I got past this issue, but I haven't really solved the problem.

Here is what I did:

I took original.prototxt from the FCN-AlexNet-PASCAL-VOC model in jetson-inference and used it to train the model on my DIGITS workstation. Training errored out with the fcn-alexnet.caffemodel pretrained model (layer name conflict), so I used snapshot_iter_146400.caffemodel from jetson-inference instead.

DIGITS was able to train the model with accuracy that seemed close to what I had before, but when I ran the single-image inference test on the DIGITS workstation I noticed the results were very different: the segmented regions are all squarish, versus the polygons I got before. So I didn't take it much further than running it on the Jetson TX1 (it runs without any error message, but the overlay is all messed up).

Looking into it further, the main differences between fcn_alexnet.prototxt and original.prototxt are:

- The score_fr layer is named differently
- The upscore layer is named differently
- The upscore convolution param has a group setting of 21 on fcn_alexnet.prototxt
- The upscore convolution param has a weight filler of "bilinear"
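
For anyone wanting to find differences like these without reading both files by eye, here is a minimal sketch in plain Python that lists layers present in only one prototxt and per-layer fields that differ. It is not part of jetson-inference, DIGITS, or Caffe; `parse_layers` and `diff_layers` are hypothetical helper names, and the parser is a simplification that only understands `layer { ... }` / `layers { ... }` blocks with `key: value` fields (repeated fields such as `bottom` keep only their last value, and a `#` inside a quoted string would be misread as a comment).

```python
import re

# One token is a brace, a `key: value` pair, or a bare block name such as `layer`.
TOKEN = re.compile(r'[{}]|[A-Za-z_]\w*\s*:\s*(?:"[^"]*"|[^\s{}]+)|[A-Za-z_]\w*')


def parse_layers(text):
    """Return one dict per layer block, with nested fields flattened to dotted keys."""
    # Drop `#` comments line by line (simplification, see the note above).
    text = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    layers, stack, current, pending = [], [], None, ""
    for tok in TOKEN.findall(text):
        if tok == "{":
            stack.append(pending)  # the block name seen just before the brace
            if len(stack) == 1 and pending in ("layer", "layers"):
                current = {}
            pending = ""
        elif tok == "}":
            if stack:
                stack.pop()
            if not stack and current is not None:
                layers.append(current)
                current = None
        elif ":" in tok:
            key, value = tok.split(":", 1)
            if current is not None:
                path = ".".join(stack[1:] + [key.strip()])
                current[path] = value.strip().strip('"')
        else:
            pending = tok
    return layers


def diff_layers(a_text, b_text):
    """Compare two prototxt texts by layer name and by flattened field values."""
    a = {layer.get("name", ""): layer for layer in parse_layers(a_text)}
    b = {layer.get("name", ""): layer for layer in parse_layers(b_text)}
    report = {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": {},
    }
    for name in sorted(a.keys() & b.keys()):
        keys = sorted(a[name].keys() | b[name].keys())
        delta = {k: (a[name].get(k), b[name].get(k))
                 for k in keys if a[name].get(k) != b[name].get(k)}
        if delta:
            report["changed"][name] = delta
    return report


if __name__ == "__main__":
    import sys
    # Usage: python diff_prototxt.py fcn_alexnet.prototxt original.prototxt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        result = diff_layers(fa.read(), fb.read())
    print("only in first: ", result["only_in_a"])
    print("only in second:", result["only_in_b"])
    for name, delta in result["changed"].items():
        print(name, delta)
```

Renamed layers show up in the two "only in" lists, while settings such as `convolution_param.group` or `convolution_param.weight_filler.type` appear under the layer they belong to, as long as that layer has the same name in both files.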

I have no idea which one of these was causing the problem. I would like to use the layer names that DIGITS 5.0 has in fcn_alexnet.prototxt, and the pretrained model built after this commit.

https://github.com/NVIDIA/DIGITS/commit/de81eab3d87345adbf1057d669821a8ff3246047