Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Compilation Error for Tensorflow2 quantized model #233

Closed (metanav closed 2 years ago)

metanav commented 3 years ago

I am getting the error below during compilation. I am using the xilinx/vitis-ai-gpu:1.3 Docker image. All training and quantization were done in the same container.

(vitis-ai-tensorflow2) Vitis-AI /workspace > vai_c_tensorflow2 -m ./quantized_model.h5 -a ULTRA96V2.json -o output -n model_name


metanav commented 3 years ago

I used another model and the previous error did not appear but I am getting another error now:

[INFO] Namespace(inputs_shape=None, layout='NHWC', model_files=['quantized_model.h5'], model_type='tensorflow2', out_filename='output/test_model_org.xmodel', proto=None)
[INFO] tensorflow2 model: quantized_model.h5
/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/translator/tensorflow_translator.py:1752: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
  value = param.get(group).get(ds).value
[INFO] parse raw model : 2%|▉ | 2/109 [00:00<00:00, 5648.89it/s]
Traceback (most recent call last):
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/bin/xnnc-run", line 33, in <module>
    sys.exit(load_entry_point('xnnc==1.3.0', 'console_scripts', 'xnnc-run')())
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/main.py", line 194, in main
    normal_run(args)
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/main.py", line 178, in normal_run
    in_shapes=in_shapes if len(in_shapes) > 0 else None,
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/xconverter.py", line 131, in run
    xmodel = CORE.make_xmodel(model_files, model_type, _layout, in_shapes)
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/core.py", line 104, in make_xmodel
    model_files, layout, in_shapes=in_shapes, model_type=model_t
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/translator/tensorflow_translator.py", line 97, in to_xmodel
    model_name, raw_nodes, layout, in_shapes, model_fmt, model_type
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/translator/tensorflow_translator.py", line 163, in create_xmodel
    xmodel = cls.create_xmodel_from_tf2(name, layers, layout, in_shapes)
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/translator/tensorflow_translator.py", line 1529, in create_xmodel_from_tf2
    f" Unsupported op: type: {op_type}, name: {wrapper_name}"
ValueError: Unsupported op: type: BatchNormalization, name: quant_bn_data

I have used keras.backend.set_learning_phase(0) before quantization but it did not help.
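The traceback names a single layer, `quant_bn_data`, that the converter could not translate. One way to look for such layers before compiling is sketched below, under the assumption (not confirmed by the Vitis AI documentation in this thread) that a BatchNormalization layer is only handled when it directly follows a convolution. The layer description used here is a hypothetical, simplified structure, not the Keras or xnnc format, and the default set of accepted parent types is likewise an assumption.

```python
# Hypothetical, simplified layer description: each layer is a dict with a
# "name", a "type", and the names of the layers feeding it ("inputs").
# This is NOT the Keras or xnnc data model; it only illustrates the check.

def find_unfolded_batchnorm(layers, foldable_parents=frozenset({"Conv2D"})):
    """Return names of BatchNormalization layers whose inputs are not all
    of a type listed in foldable_parents (assumed: only Conv2D)."""
    by_name = {layer["name"]: layer for layer in layers}
    flagged = []
    for layer in layers:
        if layer["type"] != "BatchNormalization":
            continue
        parents = [by_name[name] for name in layer["inputs"]]
        # Flag the layer if it has no parent or any parent is not foldable.
        if not parents or any(p["type"] not in foldable_parents for p in parents):
            flagged.append(layer["name"])
    return flagged


example_layers = [
    {"name": "input_1", "type": "InputLayer", "inputs": []},
    {"name": "bn_data", "type": "BatchNormalization", "inputs": ["input_1"]},
    {"name": "conv1", "type": "Conv2D", "inputs": ["bn_data"]},
    {"name": "bn1", "type": "BatchNormalization", "inputs": ["conv1"]},
]

print(find_unfolded_batchnorm(example_layers))  # ['bn_data']
```

In this toy graph only `bn_data`, which sits directly after the input layer, is flagged; `bn1` follows a convolution and passes the check.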

metanav commented 3 years ago

OK, it seems that only BatchNormalization layers placed right after Conv layers are folded. I had a BatchNorm layer right after the InputLayer, which might be causing the issue. I used another model without that layer and it worked! But now I am seeing another error:

[INFO] Namespace(inputs_shape=None, layout='NHWC', model_files=['quantized_model.h5'], model_type='tensorflow2', out_filename='output/test_model_org.xmodel', proto=None)
[INFO] tensorflow2 model: ../notebooks/experiments/069/quantized_model.h5
/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/xnnc/translator/tensorflow_translator.py:1752: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
  value = param.get(group).get(ds).value
[INFO] parse raw model :100%|██████████████████████████████████████████████████| 135/135 [00:00<00:00, 24550.43it/s]
[INFO] infer shape (NHWC) :100%|██████████████████████████████████████████████████| 222/222 [00:00<00:00, 12077.77it/s]
[OPT] No optimization method available for xir-level optimization.
[INFO] generate xmodel :100%|██████████████████████████████████████████████████| 222/222 [00:00<00:00, 4300.90it/s]
[INFO] generate xmodel: /workspace/ouput/test_model_org.xmodel
[UNILOG][INFO] The compiler log will be dumped at "/tmp/vitis-ai-user/log/xcompiler-20201222-070210-21349"
[UNILOG][INFO] Target architecture: DPUCZDX8G
[UNILOG][FATAL][TARGET_FACTORY_UNREGISTERED_TARGET][Unregistered target!] Cannot find target with name DPUCZDX8G, valid names are: {
  DPUCAHX8H_ISA2=>0x20200000000002a,
  DPUCAHX8H_ISA2_ELP2=>0x20200000000002e,
  DPUCAHX8L_ISA0=>0x30000000000001d,
  DPUCVDX8G_ISA0_B16384C64B1=>0x600000076080812,
  DPUCVDX8G_ISA0_B8192C32B1=>0x600000076080811,
  DPUCVDX8G_ISA0_B8192C32B1_ELP4=>0x600000076040411,
  DPUCVDX8G_ISA0_B8192C32B3=>0x600000076080831,
  DPUCVDX8G_ISA0_B8192C32B3_DW=>0x6000000f6088831,
  DPUCVDX8G_ISA0_B8192C32B3_I4W8B2=>0x600000276080831,
  DPUCVDX8G_ISA0_B8192C32B3_I8W4B2=>0x600000376080831,
  DPUCVDX8G_ISA0_B8192C32B3_I8W8B2=>0x600000176080831,
  DPUCVDX8H_ISA0=>0x5000000000007ee,
  DPUCZDI4G_ISA0_B4096_DEMO_SSD=>0x400002003220206,
  DPUCZDI4G_ISA0_B8192D8_DEMO_SSD=>0x400002003220207,
  DPUCZDX8G_ISA0_B1024_MAX=>0x1000020f7014402,
  DPUCZDX8G_ISA0_B1024_MIN=>0x100002022010102,
  DPUCZDX8G_ISA0_B1152_MAX=>0x1000020f7012203,
  DPUCZDX8G_ISA0_B1152_MIN=>0x100002022010103,
  DPUCZDX8G_ISA0_B1600_MAX=>0x1000020f7014404,
  DPUCZDX8G_ISA0_B1600_MIN=>0x100002022010104,
  DPUCZDX8G_ISA0_B2304_MAX=>0x1000020f7014405,
  DPUCZDX8G_ISA0_B2304_MAX_BG2=>0x1000020f6014405,
  DPUCZDX8G_ISA0_B2304_MIN=>0x100002022010105,
  DPUCZDX8G_ISA0_B3136_MAX=>0x1000020f7014406,
  DPUCZDX8G_ISA0_B3136_MAX_BG2=>0x1000020f6014406,
  DPUCZDX8G_ISA0_B3136_MIN=>0x100002022010106,
  DPUCZDX8G_ISA0_B4096_MAX=>0x1000020f7014407,
  DPUCZDX8G_ISA0_B4096_MAX_BG2=>0x1000020f6014407,
  DPUCZDX8G_ISA0_B4096_MAX_EM=>0x1000030f7014407,
  DPUCZDX8G_ISA0_B4096_MIN=>0x100002022010107,
  DPUCZDX8G_ISA0_B512_MAX=>0x1000020f7012200,
  DPUCZDX8G_ISA0_B512_MIN=>0x100002022010100,
  DPUCZDX8G_ISA0_B800_MAX=>0x1000020f7012201,
  DPUCZDX8G_ISA0_B800_MIN=>0x100002022010101
}
Check failure stack trace:
This program has crashed!
Aborted (core dumped)

So what is the target name for Ultra96V2? I am using the Ultra96V2.json from the previous version. Is that the reason?
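The FATAL message lists every target name the compiler knows, so the rejected name can be compared against that list mechanically. The sketch below is a helper written for this thread, not part of Vitis AI: it extracts the `name=>fingerprint` pairs from the message text and returns the registered names that start with the rejected one. The function names and the shortened sample message are illustrative assumptions. Which candidate matches a given Ultra96V2 image depends on how its DPU was configured, which the thread does not state.

```python
import re


def parse_valid_targets(message):
    """Return {name: fingerprint} parsed from the 'valid names are: {...}'
    part of the compiler's error message (empty dict if not found)."""
    match = re.search(r"valid names are:\s*\{([^}]*)\}", message)
    if match is None:
        return {}
    targets = {}
    for entry in match.group(1).split(","):
        name, sep, fingerprint = entry.strip().partition("=>")
        if sep:  # skip anything that is not a name=>value pair
            targets[name] = int(fingerprint, 16)
    return targets


def candidates(requested, targets):
    """Registered names that extend the rejected name with a suffix."""
    return sorted(name for name in targets if name.startswith(requested + "_"))


# Shortened sample built from three entries of the log in this thread.
sample = (
    "Cannot find target with name DPUCZDX8G, valid names are: "
    "{DPUCAHX8H_ISA2=>0x20200000000002a,"
    "DPUCZDX8G_ISA0_B4096_MAX_BG2=>0x1000020f6014407,"
    "DPUCZDX8G_ISA0_B512_MIN=>0x100002022010100}"
)

targets = parse_valid_targets(sample)
print(candidates("DPUCZDX8G", targets))
```

With the sample above, the two `DPUCZDX8G_ISA0_...` entries are returned and the Alveo target `DPUCAHX8H_ISA2` is filtered out.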

Hikaru-Furuta commented 3 years ago

I think the target name must be 'arch.json', which is at /opt/vitis_ai/compiler/arch./arch.json. Search for arch.json. Don't you have a file like that? Although it is implemented in Python, my post may help solve your issue: https://github.com/Xilinx/Vitis-AI/issues/231
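For readers unsure what the file passed with `-a` contributes, the following sketch checks the name it carries against a set of registered names. It assumes the arch file is JSON with a top-level `"target"` key, which is consistent with the "Target architecture: DPUCZDX8G" line in the log but is not confirmed in this thread; the function name and the sample JSON strings are hypothetical.

```python
import json


def check_arch_target(arch_json_text, registered_names):
    """Report whether the 'target' value in an arch file (assumed layout:
    {"target": "<name>", ...}) is among the registered target names."""
    arch = json.loads(arch_json_text)
    target = arch.get("target")
    if target is None:
        return "no 'target' key found"
    if target in registered_names:
        return f"'{target}' is registered"
    return f"'{target}' is not registered"


# Two names copied from the list of valid names in the error message.
registered = {"DPUCZDX8G_ISA0_B4096_MAX_BG2", "DPUCZDX8G_ISA0_B2304_MAX_BG2"}

# Hypothetical file contents, for illustration only.
old_style = '{"target": "DPUCZDX8G"}'
new_style = '{"target": "DPUCZDX8G_ISA0_B2304_MAX_BG2"}'

print(check_arch_target(old_style, registered))
print(check_arch_target(new_style, registered))
```

The helper takes the file contents as a string so that it can be tried without access to the Docker image; in practice the text would be read from the arch file itself.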

metanav commented 3 years ago

@Hikaru-Furuta Are you using Ultra96V2 or other board?

Hikaru-Furuta commented 3 years ago

I'm using Alveo-U250 and have overcome the same error, but I am still stuck on another one.

wanghong4compiler commented 2 years ago

Hi @metanav, have you solved this issue? I saw that qianglin-xlnx has replied to you on #240.

qianglin-xlnx commented 2 years ago

Hi @metanav Since we haven't received your reply for a long time, we assume you have solved this issue and I'm going to close it. If you still have any questions, please feel free to reopen it. Thank you very much.

thallesmm commented 6 months ago

Hello @qianglin-xlnx

I am having a similar problem. I am using a ZCU104 with the DPUCZDX8G_ISA0_B4096_MAX_BG2 target. Following https://support.xilinx.com/s/article/DPU-fingerprint-ERROR?language=en_US, I changed the arch.json file from DPUCZDX8G_ISA1_B4096 to my actual target (DPUCZDX8G_ISA0_B4096_MAX_BG2), and now I am getting the problem shown below. The DPU architecture I am using doesn't even appear among the registered targets. I am following the Vitis AI Tutorial: https://github.com/Xilinx/Vitis-AI-Tutorials/tree/1.4/Introduction/03-Basic/Module_4. Should I add this architecture to the registry manually? Or should I change the DPU arch of the board to the one expected (DPUCZDX8G_ISA1_B4096)? How should I do that? I am new to the Xilinx tools, so I am a bit lost, and I would appreciate your help.
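The two names in this comment differ in more than one place, which is easier to see once they are split into parts. The sketch below assumes the names follow a `FAMILY_ISA<n>_<variant>` pattern, inferred only from the names quoted in this thread and not from an official naming specification; `TargetName`, `split_target`, and `differing_fields` are hypothetical helpers.

```python
import re
from typing import NamedTuple


class TargetName(NamedTuple):
    family: str   # e.g. "DPUCZDX8G"
    isa: int      # number following "ISA"
    variant: str  # everything after the ISA field, may be empty


# Assumed pattern, inferred from the names quoted in this thread.
PATTERN = re.compile(r"^(?P<family>[A-Z0-9]+)_ISA(?P<isa>\d+)(?:_(?P<variant>.+))?$")


def split_target(name):
    """Split a target name into its assumed parts, or return None."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return TargetName(m["family"], int(m["isa"]), m["variant"] or "")


def differing_fields(a, b):
    """List the fields in which two target names differ."""
    ta, tb = split_target(a), split_target(b)
    if ta is None or tb is None:
        raise ValueError("name does not follow the assumed pattern")
    return [f for f in TargetName._fields if getattr(ta, f) != getattr(tb, f)]


expected = "DPUCZDX8G_ISA1_B4096"
actual = "DPUCZDX8G_ISA0_B4096_MAX_BG2"

print(split_target(expected))
print(split_target(actual))
print(differing_fields(expected, actual))  # ['isa', 'variant']
```

Both names share the `DPUCZDX8G` family, but the ISA number (1 versus 0) and the variant suffix differ, so editing only the name in arch.json does not make the compiler and the board agree on the same architecture.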

[Image: Screenshot from 2024-03-08 17-25-04]