dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Re-training on the Cat/Dog Dataset #370

Closed FingerVonFrings closed 1 year ago

FingerVonFrings commented 5 years ago

Hello

I get an error at the "Processing Images with TensorRT" step. Please see the error below.

306lab:~/jetson-inference/python/training/imagenet$ imagenet-console.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt $DATASET/test/cat/011.jpg wgoutput011.jpg

jetson.inference.__init__.py
jetson.inference -- initializing Python 2.7 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 2.7 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 2.7 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 2.7 binding initialization
[image] loaded '/home/hfut/datasets/cat_dog/test/cat/011.jpg'  (700 x 525, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyImageNet_Init()
jetson.inference -- imageNet loading network using argv command line params
jetson.inference -- imageNet.init() argv[0] = '--model=cat_dog/resnet18.onnx'
jetson.inference -- imageNet.init() argv[1] = '--input_blob=input_0'
jetson.inference -- imageNet.init() argv[2] = '--output_blob=output_0'
jetson.inference -- imageNet.init() argv[3] = '--labels=/home/hfut/datasets/cat_dog/labels.txt'

imageNet -- loading classification network model from:
         -- prototxt     (null)
         -- model        cat_dog100/resnet18.onnx
         -- class_labels /home/hfut/datasets/cat_dog/labels.txt
         -- input_blob   'input_0'
         -- output_blob  'output_0'
         -- batch_size   1

[TRT]  TensorRT version 5.0.6
[TRT]  loading NVIDIA plugins...
[TRT]  completed loading NVIDIA plugins.
[TRT]  detected model format - ONNX  (extension '.onnx')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT]  cache file not found, profiling network model on device GPU
[TRT]  device GPU, loading /usr/bin/ cat_dog/resnet18.onnx

Input filename:   cat_dog/resnet18.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.1
Domain:
Model version:    0
Doc string:

WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
While parsing node number 69 [Gather -> "192"]:
ERROR: /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.0/parsers/onnxOpenSource/ModelImporter.cpp:142 In function importNode:
[8] No importer registered for op: Gather
[TRT]  failed to parse ONNX model 'cat_dog/resnet18.onnx'
[TRT]  device GPU, failed to load cat_dog/resnet18.onnx
[TRT]  failed to load cat_dog/resnet18.onnx
[TRT]  imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
PyTensorNet_Dealloc()
Traceback (most recent call last):
  File "/usr/local/bin/imagenet-console.py", line 53, in <module>
    net = jetson.inference.imageNet(opt.network, argv)
Exception: jetson.inference -- imageNet failed to load network
jetson.utils -- freeing CUDA mapped memory

But if I download the completed model that was trained for a full 100 epochs from here, then it works fine in "Processing Images with TensorRT". I notice it generates a (resnet18.onnx.1.1.GPU.FP16.engine) file, but when I use my own model, this file is not generated. Any help? Thanks.

dusty-nv commented 5 years ago

Hi @FingerVonFrings, this issue was fixed by patching the ResNet-18 model definition in my fork of torchvision with this commit: https://github.com/dusty-nv/vision/commit/5c461366585df964503df4d05df00aea65deb0a9

So you may want to uninstall the torchvision package and re-install it from my fork:

$ sudo pip uninstall torchvision
$ python -c "import torchvision"   # should give an error if successfully uninstalled
$ git clone -b v0.3.0 https://github.com/dusty-nv/vision
$ cd vision
$ sudo python setup.py install

Then you should be able to train again. At first you can try training for just a couple of epochs, then run the onnx_export.py script and try imagenet-console again to make sure it works before doing more training.
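For reference, a rough sketch of that quick sanity-check loop. The train.py/onnx_export.py arguments shown here are illustrative and may differ slightly depending on your version of the repo (check --help); the imagenet-console.py line mirrors the command from the original post:

$ cd ~/jetson-inference/python/training/imagenet
$ python train.py --model-dir=cat_dog --epochs=2 $DATASET    # short run, just to verify the pipeline works
$ python onnx_export.py --model-dir=cat_dog                  # should write cat_dog/resnet18.onnx
$ imagenet-console.py --model=cat_dog/resnet18.onnx \
      --input_blob=input_0 --output_blob=output_0 \
      --labels=$DATASET/labels.txt \
      $DATASET/test/cat/011.jpg output_011.jpg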

FingerVonFrings commented 5 years ago

It does work. Thank you so much for your reply and advice!

duttasantanuGH commented 5 years ago

Hi dusty-nv, first of all thanks for your comprehensive and well-curated resource guide. I am facing the following error while trying to install pytorch following your instructions above. Kindly help me in resolving this.

dlinano@jetson-nano:~/sd/jetson-inference/build/vision$ sudo python setup.py install
Traceback (most recent call last):
  File "setup.py", line 6, in <module>
    from setuptools import setup, find_packages
ImportError: No module named setuptools

Setuptools is already installed, but I am still getting this error.

Thanks Santanu

dusty-nv commented 5 years ago

Hi Santanu, if you run an interactive Python interpreter, are you able to import setuptools OK there?
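For example, a quick one-line check (worth trying both with and without sudo, since setup.py install is run under sudo):

$ python -c "import setuptools; print(setuptools.__version__)"
$ sudo python -c "import setuptools; print(setuptools.__version__)"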


duttasantanuGH commented 5 years ago

Yes, it is working properly in the interactive interpreter. I faced the same issue as mentioned in this thread and hence need to install it again.

duttasantanuGH commented 5 years ago

Do you want me to install it using the interactive tool? Previously I installed the python3 version using the interactive tool, but faced the same issue as FingerVonFrings.

dusty-nv commented 5 years ago

Hmm, is your python mapped to python3? That could be causing the error when it goes to install torchvision. You could try running those steps from the script manually if it helps.
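For example, a few commands that show what python and pip actually point to (just a quick sketch for checking the mapping):

$ which python && python --version
$ readlink -f "$(which python)"     # reveals whether 'python' is a symlink to python2 or python3
$ pip --version                     # pip reports which Python it installs packages for
$ sudo python --version             # sudo may resolve a different python than your shell session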


duttasantanuGH commented 5 years ago

Yes, my python is mapped to python3. As mentioned earlier, in the interactive interpreter I can import setuptools successfully:

dlinano@jetson-nano:~/sd/jetson-inference/build/vision$ python
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import setuptools
>>> print(setuptools.__version__)
41.0.1

But when I try to run the following steps, I get the error mentioned before:

$ sudo pip uninstall torchvision
$ python -c "import torchvision"   # should give an error if successfully uninstalled
$ git clone -b v0.3.0 https://github.com/dusty-nv/vision
$ cd vision
$ sudo python setup.py install

If I use your PyTorch installer, PyTorch gets installed properly, and I ran the post-install verification steps you suggest. I installed the Python 3 compatible version of torchvision using the interactive tool, and retraining also completes without any issues. I am able to export the ONNX file, but then at the "Processing Images with TensorRT" step I face the same error as mentioned in this thread.

To overcome this issue, I tried to uninstall and reinstall torchvision as you suggested, but at that stage I am facing this setuptools error. Hope this clarifies.

Please help in resolving the issue. Thanks and regards, Santanu

duttasantanuGH commented 5 years ago

Hi dusty, you are absolutely right that the python mapping was not correct. I had mapped it for the session, but that was not effective under sudo... I have corrected it. But now I am getting the following error. Can you please help resolve it?

Installed /usr/local/lib/python3.6/dist-packages/torchvision-0.3.0-py3.6-linux-aarch64.egg
Processing dependencies for torchvision==0.3.0
Searching for torch>=1.1.0
Reading https://pypi.org/simple/torch/
No local packages or working download links found for torch>=1.1.0
error: Could not find suitable distribution for Requirement.parse('torch>=1.1.0')

Thanks Santanu

dusty-nv commented 5 years ago

My script uses 'python' for Python 2 and 'python3' for Python 3. So when you mapped your python to python3, torchvision got installed under Python 3 while torch was installed under Python 2.

Either run the steps manually and make the pip/pip3 and python/python3 usage consistent, or just select the Python 3.6 version when running the script.
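For example, the earlier steps rewritten consistently for Python 3 (a sketch, assuming PyTorch was installed with pip3):

$ sudo pip3 uninstall torchvision
$ python3 -c "import torchvision"   # should now raise an ImportError if the uninstall succeeded
$ git clone -b v0.3.0 https://github.com/dusty-nv/vision
$ cd vision
$ sudo python3 setup.py install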


duttasantanuGH commented 5 years ago

Thank you Dusty for your kind advice. It worked like a charm yesterday. I ran it manually.

ghost commented 5 years ago

Hello Dusty. I faced the same problem. At first I used python3 to install torchvision and torch; after that failed, I tried using python2 to re-train and run the onnx_export.py script. Everything worked fine until here, but when I tried imagenet-console again I still got the same error. Could you help me?

Here is the error:

jetson.inference.__init__.py
jetson.inference -- initializing Python 2.7 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 2.7 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 2.7 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 2.7 binding initialization
[image] loaded '/home/krsbi/datasets/cat_dog/test/dog/01.jpg'  (500 x 375, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyImageNet_Init()
jetson.inference -- imageNet loading network using argv command line params
jetson.inference -- imageNet.init() argv[0] = '--model=cat_dog/resnet18.onnx'
jetson.inference -- imageNet.init() argv[1] = '--input_blob=input_0'
jetson.inference -- imageNet.init() argv[2] = '--output_blob=output_0'
jetson.inference -- imageNet.init() argv[3] = '--labels=~/datasets/cat_dog/labels.txt'

imageNet -- loading classification network model from:
         -- prototxt     (null)
         -- model        cat_dog/resnet18.onnx
         -- class_labels ~/datasets/cat_dog/labels.txt
         -- input_blob   'input_0'
         -- output_blob  'output_0'
         -- batch_size   1

[TRT]  TensorRT version 5.1.6
[TRT]  loading NVIDIA plugins...
[TRT]  Plugin Creator registration succeeded - GridAnchor_TRT
[TRT]  Plugin Creator registration succeeded - NMS_TRT
[TRT]  Plugin Creator registration succeeded - Reorg_TRT
[TRT]  Plugin Creator registration succeeded - Region_TRT
[TRT]  Plugin Creator registration succeeded - Clip_TRT
[TRT]  Plugin Creator registration succeeded - LReLU_TRT
[TRT]  Plugin Creator registration succeeded - PriorBox_TRT
[TRT]  Plugin Creator registration succeeded - Normalize_TRT
[TRT]  Plugin Creator registration succeeded - RPROI_TRT
[TRT]  Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT]  completed loading NVIDIA plugins.
[TRT]  detected model format - ONNX  (extension '.onnx')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT]  loading network profile from engine cache... cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT]  device GPU, cat_dog/resnet18.onnx loaded
[TRT]  device GPU, CUDA engine context initialized with 2 bindings
[TRT]  binding -- index 0 -- name 'input_0' -- type FP32 -- in/out INPUT -- # dims 3 -- dim #0 3 (CHANNEL) -- dim #1 224 (SPATIAL) -- dim #2 224 (SPATIAL)
[TRT]  binding -- index 1 -- name 'output_0' -- type FP32 -- in/out OUTPUT -- # dims 1
[TRT]  warning -- unknown nvinfer1::DimensionType (127) -- dim #0 2 (UNKNOWN)
[TRT]  binding to input 0 input_0  binding index: 0
[TRT]  binding to input 0 input_0  dims (b=1 c=3 h=224 w=224) size=602112
[TRT]  binding to output 0 output_0  binding index: 1
[TRT]  binding to output 0 output_0  dims (b=1 c=2 h=1 w=1) size=8
device GPU, cat_dog/resnet18.onnx initialized.
[TRT]  cat_dog/resnet18.onnx loaded
imageNet -- failed to find ~/datasets/cat_dog/labels.txt
imageNet -- failed to load synset class descriptions  (0 / 0 of 2)
[TRT]  imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
PyTensorNet_Dealloc()
Traceback (most recent call last):
  File "imagenet-console.py", line 53, in <module>
    net = jetson.inference.imageNet(opt.network, argv)
Exception: jetson.inference -- imageNet failed to load network
jetson.utils -- freeing CUDA mapped memory

kirkchu commented 4 years ago

imageNet -- failed to find ~/datasets/cat_dog/labels.txt

Replace the "~" with your full home directory path, e.g. "/home/your_name/datasets/cat_dog/labels.txt" -- in the log above, argv[3] shows the "~" was passed through literally instead of being expanded to your home directory.
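For example, using the full path from the log above (the output filename is just a placeholder):

$ imagenet-console.py --model=cat_dog/resnet18.onnx \
      --input_blob=input_0 --output_blob=output_0 \
      --labels=/home/krsbi/datasets/cat_dog/labels.txt \
      /home/krsbi/datasets/cat_dog/test/dog/01.jpg output_dog_01.jpg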

junjunjansent commented 4 years ago

@dusty-nv

I have a similar problem where it fails to parse the ONNX model 'cat_dog/resnet18.onnx'. Similarly, if I download the cat_dog model trained for 100 epochs, there is no issue.

I followed your instructions to uninstall the torchvision package (using pip3 in my case) and re-install it from your fork with "git clone -bv0.3.0 https://github.com/dusty-nv/vision" (previously my torchvision version was '0.5.0a0+85b8fbf'), but it still did not work.

Hope you are able to advise, as I am using this to train a different model as well. I am wondering whether this is due to an updated TensorRT version.

Thank you so much.

I followed the imagenet commands provided exactly (https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-cat-dog.md):

~/jetson-inference/python/training/classification$ imagenet.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt $DATASET/test/cat/01.jpg cat.jpg

Output:

jetson.inference -- imageNet loading network using argv command line params

imageNet -- loading classification network model from:
         -- prototxt     (null)
         -- model        cat_dog/resnet18.onnx
         -- class_labels /home/jansen/datasets/cat_dog/labels.txt
         -- input_blob   'input_0'
         -- output_blob  'output_0'
         -- batch_size   1

[TRT]    TensorRT version 6.0.1
[TRT]    loading NVIDIA plugins...
[TRT]    Plugin Creator registration succeeded - GridAnchor_TRT
[TRT]    Plugin Creator registration succeeded - GridAnchorRect_TRT
[TRT]    Plugin Creator registration succeeded - NMS_TRT
[TRT]    Plugin Creator registration succeeded - Reorg_TRT
[TRT]    Plugin Creator registration succeeded - Region_TRT
[TRT]    Plugin Creator registration succeeded - Clip_TRT
[TRT]    Plugin Creator registration succeeded - LReLU_TRT
[TRT]    Plugin Creator registration succeeded - PriorBox_TRT
[TRT]    Plugin Creator registration succeeded - Normalize_TRT
[TRT]    Plugin Creator registration succeeded - RPROI_TRT
[TRT]    Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT]    Could not register plugin creator:  FlattenConcat_TRT in namespace: 
[TRT]    detected model format - ONNX  (extension '.onnx')
[TRT]    desired precision specified for GPU: FASTEST
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    native precisions detected for GPU:  FP32, FP16
[TRT]    selecting fastest native precision for GPU:  FP16
[TRT]    attempting to open engine cache file cat_dog/resnet18.onnx.1.1.6001.GPU.FP16.engine
[TRT]    cache file not found, profiling network model on device GPU
[TRT]    device GPU, loading /usr/bin/ cat_dog/resnet18.onnx
----------------------------------------------------------------
Input filename:   cat_dog/resnet18.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.3
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
While parsing node number 0 [Conv -> "123"]:
--- Begin node ---
input: "input_0"
input: "0.conv1.weight"
output: "123"
op_type: "Conv"
attribute {
  name: "dilations"
  ints: 1
  ints: 1
  type: INTS
}
attribute {
  name: "group"
  i: 1
  type: INT
}
attribute {
  name: "kernel_shape"
  ints: 7
  ints: 7
  type: INTS
}
attribute {
  name: "pads"
  ints: 3
  ints: 3
  ints: 3
  ints: 3
  type: INTS
}
attribute {
  name: "strides"
  ints: 2
  ints: 2
  type: INTS
}

--- End node ---
ERROR: ModelImporter.cpp:296 In function importModel:
[5] Assertion failed: tensors.count(input_name)
[TRT]    failed to parse ONNX model 'cat_dog/resnet18.onnx'
[TRT]    device GPU, failed to load cat_dog/resnet18.onnx
[TRT]    failed to load cat_dog/resnet18.onnx
[TRT]    imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
Traceback (most recent call last):
  File "/usr/local/bin/imagenet.py", line 55, in <module>
    net = jetson.inference.imageNet(opt.network, sys.argv)
Exception: jetson.inference -- imageNet failed to load network

dusty-nv commented 4 years ago

@officialjansent, which version of JetPack and PyTorch do you have installed?

If you upgrade to the latest JetPack, PyTorch 1.5, and torchvision 0.7.0 (upstream torchvision, not my fork), you shouldn't have any problems. On the latest versions you shouldn't need my torchvision fork.
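For example, to confirm which versions are actually installed for the interpreter you train with (a quick generic check, not specific to this repo):

$ python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
$ python3 -c "import torchvision; print(torchvision.__version__)"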

junjunjansent commented 4 years ago

@dusty-nv Jetpack 4.3, Pytorch 1.4, torchvision (now 0.3.0)
