[r8.2] Can not compile ONNX models, segmentation fault occur

min0628 commented 2 years ago

Hi TI teams,

The Processor SDK for EdgeAI has been updated to 8.2, and a model that I compiled previously (it works with SDK 8.1) now fails to load:

root@tda4vm-sk:/opt/edge_ai_apps/apps_cpp# ./bin/Release/app_edgeai -n -v ../configs/vadas.yaml 
libtidl_onnxrt_EP loaded 0x277b01e0 
Final number of subgraphs created are : 1, - Offloaded Nodes - 985, Total Nodes - 0 
Invoke  : ERROR: Unable to open network file /opt/model_zoo/posco/ssd_lite_mobilenet_v2_fpn_custom_512x512_onnx/artifacts/_tidl_net.bin 
[01:36:20.000.000000]:ERROR:[getPreprocessImageConfig:0557] Mean value specification missing.
[01:36:20.000.000074]:ERROR:[initialize:0845] getPreprocessImageConfig() failed.
[01:36:20.000.000391]:ERROR:[makePostprocessImageObj:0085] Invalid post-processing task type.
[01:36:20.000.000432]:ERROR:[createPostprocCntxt:0980] PostprocessImage::makePostprocessImageObj() failed.
[01:36:20.000.000450]:ERROR:[initialize:1217] createPostprocCntxt() failed.

So I'm trying to recompile the model using the new version (r8.2) of EdgeAI benchmark.

I followed the same tutorial as before (tutorials/tutorial_detection.ipynb) for the compilation. The default TFLite model compiles fine. However, when I try to compile an ONNX model, a segmentation fault occurs. At first I thought the problem was in our custom model, so I tried one of the TI ModelZoo ONNX models, but the result is the same.

Here are the modified tutorial_detection.ipynb code and the resulting logs.

preproc_transforms = preprocess.PreProcessTransforms(settings)
postproc_transforms = postprocess.PostProcessTransforms(settings)
onnx_session_cfg = sessions.get_onnx_session_cfg(settings, work_dir=work_dir)
onnx_session_type = settings.get_session_type(constants.MODEL_TYPE_ONNX)

pipeline_configs = {
    'od-8020':dict(
        task_type='detection',
        calibration_dataset=calib_dataset,
        input_dataset=val_dataset,
        preprocess=preproc_transforms.get_transform_onnx((512,512), (512,512), backend='cv2'),
        session=onnx_session_type(**onnx_session_cfg,
            runtime_options=utils.dict_update(settings.runtime_options_onnx_np2(), {
                'object_detection:meta_arch_type': 3,
                'object_detection:meta_layers_names_list': f'{settings.models_path}/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.prototxt'}),
            model_path=f'{settings.models_path}/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx'),
        postprocess=postproc_transforms.get_transform_detection_mmdet_onnx(squeeze_axis=None, normalized_detections=False, formatter=postprocess.DetectionBoxSL2BoxLS()),
        metric=dict(label_offset_pred=datasets.coco_det_label_offset_80to90(label_offset=1)),
        model_info=dict(metric_reference={'accuracy_ap[.5:.95]%':25.1})
    )
}

tools.run_accuracy(settings, work_dir, pipeline_configs)

configs to run: ['od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx']
number of configs: 1
TASKS | 0%| || 0/1 [00:00<?, ?it/s]
TASKS                                                       |          |     0% 0/1| [< ]
INFO:20220427-042602: starting process on parallel_device - 0
INFO:20220427-042607: model_path - /edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
INFO:20220427-042607: model_file - /tmp/tmprirbgy8_/modelartifacts/8bits/od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx/model/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Downloading 1/1: /edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Download done for /edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Converted model is valid!

INFO:20220427-042607: running - od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx
INFO:20220427-042607: pipeline_config - {'task_type': 'detection', 'calibration_dataset': <jai_benchmark.datasets.coco_det.COCODetection object at 0x7f4b4dc240b8>, 'input_dataset': <jai_benchmark.datasets.coco_det.COCODetection object at 0x7f4b30b666d8>, 'preprocess': <jai_benchmark.preprocess.PreProcessTransforms object at 0x7f4b365cf400>, 'session': <jai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7f4b30b66da0>, 'postprocess': <jai_benchmark.postprocess.PostProcessTransforms object at 0x7f4b30b66e10>, 'metric': {'label_offset_pred': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31, 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43, 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56, 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72, 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85, 75: 86, 76: 87, 77: 88, 78: 89, 79: 90, 80: 91}}, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 25.1}}}
INFO:20220427-042607: import  - od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx

Segmentation fault: 11

Here are the installed onnx packages.

(benchmark) root@d209ebbae3cf:/workspace# pip3 list | grep onnx
onnx                 1.8.1
onnxruntime-tidl     1.7.0

I don't know why this problem occurs.

mathmanu commented 2 years ago

There was a silly bug in setup.sh in the download of tidl_tools. If tidl_tools.tar.gz was already present in the folder, the download would save the new archive under a different file name, but the subsequent extract would still use the previously present file.

Fixed it here: https://github.com/TexasInstruments/edgeai-benchmark/blob/master/setup.sh#L58

Please pull the latest code, run setup.sh, and then try again.
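To illustrate the failure mode described above, here is a minimal Python sketch. It does not reproduce the actual setup.sh. The `fake_download` helper, the `.1` suffix it uses for the alternate file name, and the `remove_stale` flag are all hypothetical stand-ins chosen for the illustration. The sketch only shows why extracting a fixed file name picks up the stale archive, and how clearing the old archive before downloading would avoid that.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_archive(path: Path, payload: bytes) -> None:
    """Write a tiny .tar.gz that holds a single file, version.txt."""
    info = tarfile.TarInfo(name="version.txt")
    info.size = len(payload)
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))


def fake_download(dest: Path, payload: bytes) -> Path:
    """Hypothetical downloader: never overwrites, saves to '<name>.1' instead."""
    target = dest
    if dest.exists():
        # Assumed naming scheme for the "different file name" in the report.
        target = dest.with_name(dest.name + ".1")
    make_archive(target, payload)
    return target


def read_version(archive: Path) -> str:
    """Return the contents of version.txt inside the archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.extractfile("version.txt").read().decode()


def install(folder: Path, payload: bytes, remove_stale: bool) -> str:
    """Download, then extract the fixed file name, as the setup step does."""
    archive = folder / "tidl_tools.tar.gz"
    if remove_stale and archive.exists():
        archive.unlink()  # the assumed fix: clear the old archive first
    fake_download(archive, payload)
    # The extract step always opens the fixed name, not the name
    # the downloader actually wrote to.
    return read_version(archive)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Leftover archive from an earlier setup run.
        make_archive(folder / "tidl_tools.tar.gz", b"old tools")
        buggy = install(folder, b"new tools", remove_stale=False)
        fixed = install(folder, b"new tools", remove_stale=True)
    return buggy, fixed


if __name__ == "__main__":
    print(demo())
```

In the first call the new archive lands next to the old one and the old contents are read back. In the second call the stale archive is removed first, so the fixed file name refers to the fresh download.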

min0628 commented 2 years ago

Thanks, it's working now.

TexasInstruments / edgeai-benchmark

[r8.2] Can not compile ONNX models, segmentation fault occur #6