TexasInstruments / edgeai-tensorlab

Edge AI Model Development Tools
https://github.com/TexasInstruments/edgeai

Export and compile yolox-nano-ti-lite model #10

Open AbdulghaniAltaweel opened 1 month ago

AbdulghaniAltaweel commented 1 month ago

I am trying to run the 8190-YOLOX-Nano-TI-Lite model using the edgeai_ai_apps framework, and I've encountered a compilation issue when exporting the model from the PyTorch format. Here are the steps I followed:

  1. When the model is downloaded directly from within the app on the hardware, it runs successfully.

  2. On the host PC, I downloaded the model in ONNX format from https://github.com/TexasInstruments/edgeai-tensorlab/blob/r8.4/edgeai-modelzoo/models/vision/detection/coco/edgeai-yolox/yolox_nano_ti_lite_26p1_41p8.onnx.link and compiled it using edgeai-tidl-tools. It compiles and runs successfully on the target, but only if I overwrite the generated param.yaml with the original values from step 1.

  3. I tried to export the model from the PyTorch checkpoint https://github.com/TexasInstruments/edgeai-tensorlab/blob/r8.4/edgeai-modelzoo/models/vision/detection/coco/edgeai-yolox/yolox_nano_ti_lite_26p1_41p8_checkpoint.pth.link using edgeai-yolox. The export completes without any errors. However, when I try to compile the exported model using edgeai-tidl-tools, I get the following error:

```
Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
Running 1 Models - ['yolox-ti-nano-pretrained-selfexported']

Running_Model : yolox-ti-nano-pretrained-selfexported
TIDL Meta PipeLine (Proto) File : ../../../models/public/pretrained-selfexported-yolox-ti-nano.prototxt
yolox
yolox
Number of OD backbone nodes = 0
Size of odBackboneNodeIds = 0
Preliminary subgraphs created = 3
Final number of subgraphs created are : 1, - Offloaded Nodes - 196, Total Nodes - 252
Process Process-1:
Traceback (most recent call last):
  File "/home/aaltaweel/anaconda3/envs/tidl-tools-py36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/aaltaweel/anaconda3/envs/tidl-tools-py36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "onnxrt_ep.py", line 181, in run_model
    imgs, output, proc_time, sub_graph_time, height, width = infer_image(sess, input_images, config)
  File "onnxrt_ep.py", line 92, in infer_image
    input_data[:,ch,:,:] = ((input_data[:,ch,:,:]- mean) * scale)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')
```

I exported the model with:

```
python tools/export_onnx.py --output-name my_mode.onnx -f exps/default/yolox_nano_ti_lite.py -c tools/yolox_nano_ti_lite_26p1_41p8_checkpoint.pth --export-det
```

When compiling the model, I set the same config parameters for both cases 2 and 3:

```
'yolox-ti-nano-pretrained-selfexported' : {
    'model_path' : os.path.join(models_base_path, 'pretrained-selfexported-yolox-ti-nano.onnx'),
    'meta_layers_names_list' : os.path.join(models_base_path, 'pretrained-selfexported-yolox-ti-nano.prototxt'),
    'num_images' : numImages,
    'num_classes': 90,
    'mean': 'null',
    'std' : 'null',
    'model_type': 'od',
    'meta_arch_type' : 6,
    'od_type' : 'YoloV5',
    'framework' : '',
    'session_name' : 'onnxrt',
},
```

Thanks

mathmanu commented 1 month ago

Is there any difference between your exported model & prototxt and what is in the modelzoo? If you are not seeing a difference, could you attach them here?

AbdulghaniAltaweel commented 1 month ago

Thanks for the response. Here are the two files.

This is the version generated by exporting the model manually:

```
name: "yolox"
tidl_yolo {
  yolo_param {
    input: "/0/head/Concat_output_0"
    anchor_width: 8.0
    anchor_height: 8.0
  }
  yolo_param {
    input: "/0/head/Concat_3_output_0"
    anchor_width: 16.0
    anchor_height: 16.0
  }
  yolo_param {
    input: "/0/head/Concat_6_output_0"
    anchor_width: 32.0
    anchor_height: 32.0
  }
  detection_output_param {
    num_classes: 80
    share_location: true
    background_label_id: -1
    nms_param {
      nms_threshold: 0.65
      top_k: 500
    }
    code_type: CODE_TYPE_YOLO_X
    keep_top_k: 200
    confidence_threshold: 0.01
  }
  name: "yolox"
  in_width: 416
  in_height: 416
  output: "detections"
}
```

And this is the original one:

```
name: "yolo_v3"
tidl_yolo {
  name: "yolo_v3"
  in_width: 416
  in_height: 416
  yolo_param {
    input: "709"
    anchor_width: 8
    anchor_height: 8
  }
  yolo_param {
    input: "843"
    anchor_width: 16
    anchor_height: 16
  }
  yolo_param {
    input: "977"
    anchor_width: 32
    anchor_height: 32
  }
  detection_output_param {
    num_classes: 80
    share_location: true
    background_label_id: -1
    nms_param {
      nms_threshold: 0.65
      top_k: 500
    }
    code_type: CODE_TYPE_YOLO_X
    keep_top_k: 200
    confidence_threshold: 0.01
  }
  output: "detections"
}
```

mathmanu commented 1 month ago

This is the error that you reported:

```
File "onnxrt_ep.py", line 92, in infer_image
    input_data[:,ch,:,:] = ((input_data[:,ch,:,:]- mean) * scale)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')
```

This seems to be an error in the preprocessing in your Python/numpy code, not inside the model.
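For context, the `<U32` in the traceback is numpy's Unicode string dtype: the subtraction fails because both operands are strings rather than floats. A minimal sketch reproducing the same class of error (this is an illustration of the numpy behavior, not code from onnxrt_ep.py):

```python
import numpy as np

# numpy string arrays (dtype '<U...') have no arithmetic ufunc loops,
# which yields the same "did not contain a loop" error as in the log above.
a = np.array(['0.5', '0.5'])   # dtype is '<U3', not a float dtype
try:
    a - a
except TypeError as err:       # numpy's UFuncTypeError subclasses TypeError
    print('reproduced:', err)
```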

AbdulghaniAltaweel commented 1 month ago

Hello,

I did not make any code changes to the model or to any of the tools.

AbdulghaniAltaweel commented 1 month ago

Above I was using edgeai-tidl-tools version 08_02_00_05. Now, when compiling the manually exported model with version 09_02_06_00, I get a different error message:

```
Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
Running 1 Models - ['yolox-ti-nano-pretrained-selfexported']

Running_Model : yolox-ti-nano-pretrained-selfexported
Running shape inference on model ../../../models/public/pretrained-selfexported-yolox-ti-nano.onnx
yolox is meta arch name
yolox
Number of OD backbone nodes = 0
Size of odBackboneNodeIds = 0
free(): invalid pointer
```

AbdulghaniAltaweel commented 1 month ago

Hello again. In step 2 above, I compiled the original ONNX model from the zoo using edgeai-tidl-tools version 08_02_00_05. If I use version 09_02_06_00 instead, I get the same kind of error as when compiling the model I exported myself (step 3):

```
Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
Running 1 Models - ['yolox-ti-nano-pretrained-preexported']

Running_Model : yolox-ti-nano-pretrained-preexported
Running shape inference on model ../../../models/public/yolox_nano_ti_lite_26p1_41p8.onnx
yolo_v3 is meta arch name
yolo_v3
Number of OD backbone nodes = 190
Size of odBackboneNodeIds = 190
Preliminary subgraphs created = 1
Final number of subgraphs created are : 1, - Offloaded Nodes - 272, Total Nodes - 272
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aaltaweel/handguard_20/ti-edgeai-tidl-tools/examples/osrt_python/ort/onnxrt_ep.py", line 239, in run_model
    imgs, output, proc_time, sub_graph_time, height, width = infer_image(sess, input_images, config)
  File "/home/aaltaweel/handguard_20/ti-edgeai-tidl-tools/examples/osrt_python/ort/onnxrt_ep.py", line 122, in infer_image
    input_data[:,ch,:,:] = ((input_data[:,ch,:,:]- mean) * scale)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('float32'), dtype('<U1')) -> None
```
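The `dtype('<U1')` here is consistent with `mean` being the literal string `'null'` from the config: indexing a Python string per channel yields one-character strings. That reading of the code is an assumption (the relevant part of onnxrt_ep.py is not shown in this thread), but a small reproduction under that assumption:

```python
import numpy as np

# Assumption: the config's string 'null' reaches the preprocessing loop
# unchanged, so a per-channel index gives 'n' (a '<U1' value).
mean = 'null'
input_slice = np.zeros((2, 2), dtype=np.float32)
try:
    input_slice - mean[0]   # float32 array minus a one-character string
except TypeError as err:    # matches: (dtype('float32'), dtype('<U1')) -> None
    print('reproduced:', err)
```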

AbdulghaniAltaweel commented 1 month ago

Actually, edgeai-tidl-tools contains no example for compiling any of the YOLOX 2D-detection TI-lite models (models 8140 to 8190 in the zoo). Could you please provide the config parameters to compile one of these models? Thanks

mathmanu commented 1 month ago

This is an error in the Python code: `input_data[:,ch,:,:] = ((input_data[:,ch,:,:] - mean) * scale)`

It says that the subtraction has incompatible types. Most likely the mean value does not have the same dtype as input_data; if so, convert mean to the same type.
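That suggestion can be sketched as follows; the variable names mirror the snippet in the traceback, and the concrete mean/scale values here are hypothetical:

```python
import numpy as np

input_data = np.zeros((1, 3, 4, 4), dtype=np.float32)
mean, scale = '128', '0.5'   # values that arrived as strings

# Cast to the input's dtype before doing arithmetic, as suggested above.
mean = np.float32(mean)
scale = np.float32(scale)

for ch in range(3):
    input_data[:, ch, :, :] = (input_data[:, ch, :, :] - mean) * scale
print(input_data[0, 0, 0, 0])   # (0 - 128) * 0.5 = -64.0
```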

Also, to avoid further confusion, make sure the opt flag is False so that the following condition is not taken: `if model_source['opt'] == True:` at https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/common_utils.py#L202

mathmanu commented 1 month ago

Actually, we now compile the YOLOX models from edgeai-mmdetection rather than from edgeai-yolox. In this folder you can see a _config.yaml file for each model that we have compiled, and the compilation parameters are inside that _config.yaml file: https://github.com/TexasInstruments/edgeai-tensorlab/tree/main/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet

The parameters for the edgeai-yolox model should be quite similar to these.

AbdulghaniAltaweel commented 3 weeks ago

@mathmanu Hello,

I was able to successfully compile both models in case 2 and case 3 by setting the mean and std (or scale) to [0,0,0] and [1,1,1], respectively. Thank you for your help. I will close this issue as resolved.
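With mean [0,0,0] and scale [1,1,1], the per-channel normalization loop from the traceback reduces to an identity transform, i.e. the input tensor is passed through unchanged. A quick check of that reasoning (the loop is modeled on the snippet quoted earlier in this thread):

```python
import numpy as np

mean = [0.0, 0.0, 0.0]
scale = [1.0, 1.0, 1.0]
img = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
out = img.copy()

# Same per-channel normalization shape as in the quoted onnxrt_ep.py line.
for ch in range(3):
    out[:, ch, :, :] = (out[:, ch, :, :] - mean[ch]) * scale[ch]

print(np.array_equal(out, img))  # True: the preprocessing is an identity
```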

However, I noticed that the model graphs for cases 2 and 3 are not identical to the one in case 1. Additionally, the ONNX model (case 2) differs slightly from the model I exported myself from the PyTorch checkpoint (case 3). The differences show up in the graph, the output binary files, and the text files in the output artifacts.

Could you please confirm if these three models are indeed different?

  1. ONR-OD-8190-yolox-nano-ti-lite-coco-416x416 from model zoo.
  2. onnx model
  3. pth checkpoint

thanks again

AbdulghaniAltaweel commented 3 weeks ago

Hello, this model (and other models as well) can be compiled successfully with edgeai-tidl-tools (v08_02_00_05), but with the following warning:

```
WARNING: [TIDL_E_DATAFLOW_INFO_NULL] ti_cnnperfsim.out fails to allocate memory in MSMC. Please look into perfsim log. This model can only be used on PC emulation, it will get fault on target.
```

That is why I am not able to execute the model on the device. How can this be fixed?

Note: when I use version 09_02_06_00 of the tool, the model compiles without the warning, but it still cannot be executed on the device, because the OS on the device is old and requires version 08_02_00_05 of the compilation tool.