NVIDIA-AI-IOT / nvidia-tao


Problem with running classification_tf1 gen_trt_engine #16

Open pazikk opened 9 months ago

pazikk commented 9 months ago

Hardware platform: GPU + docker

TAO version: 5.0

NVIDIA GPU Driver Version: (screenshot attached; version not transcribed)

Problem: classification_tf1 gen_trt_engine crashes for the VehicleTypeNet .etlt model.

Reproduce: run the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy container with:

docker container run --rm --net host --runtime=nvidia --gpus=1 -v /tmp/.X11-unix:/tmp/.X11-unix  -e DISPLAY=$DISPLAY -it nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy bash

Download the VehicleTypeNet .etlt model with the NGC CLI:

wget https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
unzip -u -q ngccli_cat_linux.zip
ngc-cli/ngc registry model download-version nvidia/tao/vehicletypenet:pruned_v1.0.1 --dest samples/models/
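Before moving on, it may be worth confirming the .etlt file actually downloaded and is non-empty, since a truncated download would make the later parse step fail. This is a small sketch with a hypothetical `check_model` helper; the model path is assumed from the NGC command above:

```shell
# Hypothetical helper: report whether a model file exists and is non-empty.
check_model() {
  [ -s "$1" ] && echo "ok: $1" || echo "missing or empty: $1"
}

# Path assumed from the download-version command above.
check_model samples/models/vehicletypenet_vpruned_v1.0.1/resnet18_vehicletypenet_pruned.etlt
```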

Clone nvidia-tao repo with:

git clone https://github.com/NVIDIA-AI-IOT/nvidia-tao.git

Generate the TensorRT engine with:

classification_tf1 gen_trt_engine -m vehicletypenet_vpruned_v1.0.1/resnet18_vehicletypenet_pruned.etlt \
  -k nvidia_tlt \
  -e nvidia-tao/tao_deploy/specs/VehicleTypeNet/VehicleTypeNet.txt \
  --data_type fp32 --batch_size 1 --max_batch_size 1 --batches 10 \
  --engine_file models/vehicletypenet/vehicle_type.engine \
  --results_dir vehicletypenet_vpruned_v1.0.1/

The program crashes with:

 Loading uff directly from the package source code
Loading uff directly from the package source code
2024-01-20 14:21:00,440 [TAO Toolkit] [INFO] root 174: Starting classification_tf1 gen_trt_engine.
2024-01-20 14:21:00,440 [TAO Toolkit] [INFO] root 61: The provided .etlt file is in UFF format.
2024-01-20 14:21:00,440 [TAO Toolkit] [INFO] root 62: Input name: b'input_1'
[01/20/2024-14:21:00] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 480 (MiB)
[01/20/2024-14:21:03] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 440, GPU 556 (MiB)
2024-01-20 14:21:03,279 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.classification_tf1.engine_builder 165: Parsing UFF model
[01/20/2024-14:21:03] [TRT] [W] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
2024-01-20 14:21:03,289 [TAO Toolkit] [INFO] root 174: Error parsing message with type 'uff.MetaGraph'
Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/classification_tf1/scripts/gen_trt_engine.py>", line 3, in <module>
  File "<frozen cv.classification_tf1.scripts.gen_trt_engine>", line 209, in <module>
  File "<frozen cv.common.decorators>", line 63, in _func
  File "<frozen cv.common.decorators>", line 48, in _func
  File "<frozen cv.classification_tf1.scripts.gen_trt_engine>", line 82, in main
  File "<frozen cv.classification_tf1.engine_builder>", line 173, in create_network
  File "<frozen cv.classification_tf1.engine_builder>", line 94, in get_uff_input_dims
google.protobuf.message.DecodeError: Error parsing message with type 'uff.MetaGraph'
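For context on the failure mode: `google.protobuf.message.DecodeError` is raised when the bytes handed to the parser are not a valid serialized protobuf message. A plausible cause here (an assumption, not confirmed) is that the .etlt payload reaching the UFF parser was not decoded correctly, e.g. a mismatched `-k` key or a corrupted download. The sketch below reproduces the same error class by parsing arbitrary bytes with a compiled protobuf type (`Struct` stands in for `uff.MetaGraph`; it requires the `protobuf` package):

```python
from google.protobuf.message import DecodeError
from google.protobuf.struct_pb2 import Struct  # any compiled message type works here


def looks_like_valid_protobuf(payload: bytes) -> bool:
    """Return True if payload parses as a serialized protobuf message."""
    msg = Struct()
    try:
        msg.ParseFromString(payload)
        return True
    except DecodeError:
        # Same exception class as the gen_trt_engine traceback above.
        return False


# Bytes that are not a serialized message fail with DecodeError.
print(looks_like_valid_protobuf(b"\xff\xff\xff\xffnot-a-protobuf"))
```

If the decoded .etlt payload failed such a check, that would point at the key or the file rather than at TensorRT itself.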