Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

pt_OFA-rcan_DIV2K_360_640_45.7G_2.5 can't export xmodel #892

Closed watchara-knot closed 2 years ago

watchara-knot commented 2 years ago

https://github.com/Xilinx/Vitis-AI/blob/master/model_zoo/model-list/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/model.yaml

From that link, in the GPU part, following the README, at this step:

echo "dump xmodel..." MODE='test'

python test.py --dir_data ${Data} --model RCAN_wo_CA --scale ${S} --n_resgroups ${R} --n_feats ${F} --data_test Set5+Set14+B100+Urban100 --float_model_path ${Model} --quant_mode ${MODE} --fast_finetune --dump_xmodel

I get an error like this:

[VAIQ_NOTE]: =>Doing weights equalization...

[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/Model.py)

[VAIQ_NOTE]: =>Get module with quantization.

Warning!!! The parameter/activation is not quantized: Model::input_0

Traceback (most recent call last):
  File "test.py", line 179, in <module>
    main()
  File "test.py", line 155, in main
    output = quant_model(input, 2)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "quantize_result/Model.py", line 54, in forward
    output_module_0 = self.module_1(output_module_0)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/conv.py", line 67, in forward
    device=self.bias.data.device))
RuntimeError: Could not infer dtype of NoneType

How do I solve it?

wangxd-xlnx commented 2 years ago

Hi @watchara-knot,

Thanks for your feedback. I tried the scripts, including run_qat.sh, following the steps in the README, and they work well. It is worth mentioning that 'pip install -r requirements.txt' in the README needs to be changed to 'pip install --user -r requirements.txt' to avoid warnings and potential failures when installing the dependency packages. I'm not sure if this is related to your issue. We will fix it and update the README.

For this issue, could you provide some more details? Am I right that what you used was run_qat.sh?

watchara-knot commented 2 years ago

Hi @wangxd-xlnx, you are correct that run_qat.sh works well, but running it does not produce a .xmodel to run on the target board.

Please help check this, or let me know whether you also get a .xmodel.

As for the error above, I ran run_quant.sh, but the dump step after the #echo "dump xmodel..." comment is commented out in the script; after I uncommented it and ran it, it did not work.

RCAN-S model

export CUDA_VISIBLE_DEVICES=1
Model=../float/
Data=../data/

S=2   # scale
R=1   # the number of groups
F=16  # output channel

Note: for this model, direct 8-bit quantization gives an accuracy that is not satisfactory, so we use the fast-finetune trick.

echo "Calibrating model quantization..." MODE='calib'

python test.py --dir_data ${Data} --model RCAN_wo_CA --scale ${S} --n_resgroups ${R} --n_feats ${F} --data_test Set5+Set14+B100+Urban100 --float_model_path ${Model} --quant_mode ${MODE} --fast_finetune --nndct_finetune_lr_factor 0.015

echo "Testing quantized model..." MODE='test'

python test.py --dir_data ${Data} --model RCAN_wo_CA --scale ${S} --n_resgroups ${R} --n_feats ${F} --data_test Set5+Set14+B100+Urban100 --float_model_path ${Model} --quant_mode ${MODE} --fast_finetune

echo "dump xmodel..."

MODE='test'

python test.py --dir_data ${Data} --model RCAN_wo_CA --scale ${S} --n_resgroups ${R} --n_feats ${F} --data_test Set5+Set14+B100+Urban100 --float_model_path ${Model} --quant_mode ${MODE} --fast_finetune --dump_xmodel
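
For reference, the dump step above ultimately goes through the same vai_q_pytorch API that appears in the traceback (pytorch_nndct.apis). Below is a minimal sketch of that flow, assuming a hypothetical build_float_model() helper in place of the model zoo's own checkpoint loading; test.py additionally handles --fast_finetune and the dataset options.

# Minimal PTQ dump sketch with vai_q_pytorch (hypothetical standalone version of
# what test.py does when --dump_xmodel is passed).
import torch
from pytorch_nndct.apis import torch_quantizer

model = build_float_model()              # hypothetical: load the float RCAN_wo_CA checkpoint
dummy = torch.randn(1, 3, 360, 640)      # dumping expects a single sample (batch size 1)

# quant_mode='test' reuses the quantization parameters produced by the 'calib' run
quantizer = torch_quantizer('test', model, (dummy,), output_dir='quantize_result')
quant_model = quantizer.quant_model

with torch.no_grad():
    quant_model(dummy)                   # one forward pass so the deploy graph can be recorded

quantizer.export_xmodel(output_dir='quantize_result', deploy_check=False)

If the forward pass itself fails, as in the "Could not infer dtype of NoneType" traceback above, export_xmodel is never reached and no .xmodel is produced.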

watchara-knot commented 2 years ago

@wangxd-xlnx For that one, I tried rerunning the code and got the error below; the only packages I installed were pip install scikit-image==0.17.2 imageio==2.9.0.

The Docker version is shown below; I will show the error when using 'pip install --user -r requirements.txt' in the next post.

Docker Image Version: 2.5.0.1260 (CPU)
Vitis AI Git Hash: 502703c
Build Date: 2022-06-12

For TensorFlow 1.15 Workflows do: conda activate vitis-ai-tensorflow
For PyTorch Workflows do: conda activate vitis-ai-pytorch
For TensorFlow 2.8 Workflows do: conda activate vitis-ai-tensorflow2
For WeGo Tensorflow 1.15 Workflows do: conda activate vitis-ai-wego-tf1
For WeGo Tensorflow 2.8 Workflows do: conda activate vitis-ai-wego-tf2
For WeGo Torch Workflows do: conda activate vitis-ai-wego-torch

Vitis-AI /workspace > conda activate vitis-ai-pytorch
(vitis-ai-pytorch) Vitis-AI /workspace > cd pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code/
(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code > sh run_qat.sh

Traceback (most recent call last):
  File "train_qat.py", line 17, in <module>
    import utility
  File "/workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code/utility.py", line 13, in <module>
    import imageio
ModuleNotFoundError: No module named 'imageio'

(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code > pip install scikit-image==0.17.2 imageio==2.9.0
Collecting scikit-image==0.17.2
  Downloading scikit_image-0.17.2-cp37-cp37m-manylinux1_x86_64.whl (12.5 MB)
Collecting imageio==2.9.0
  Downloading imageio-2.9.0-py3-none-any.whl (3.3 MB)
Requirement already satisfied: numpy>=1.15.1 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from scikit-image==0.17.2) (1.17.5)
Requirement already satisfied: pillow!=7.1.0,!=7.1.1,>=4.3.0 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from scikit-image==0.17.2) (8.1.0)
Requirement already satisfied: matplotlib!=3.0.0,>=2.0.0 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from scikit-image==0.17.2) (3.4.3)
Collecting networkx>=2.0
  Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB)
Collecting tifffile>=2019.7.26
  Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
Collecting PyWavelets>=1.1.1
  Downloading PyWavelets-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
Requirement already satisfied: scipy>=1.0.1 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from scikit-image==0.17.2) (1.3.1)
Requirement already satisfied: python-dateutil>=2.7 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.17.2) (2.8.2)
Requirement already satisfied: cycler>=0.10 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.17.2) (0.11.0)
Requirement already satisfied: pyparsing>=2.2.1 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.17.2) (3.0.9)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image==0.17.2) (1.4.2)
Requirement already satisfied: typing-extensions in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib!=3.0.0,>=2.0.0->scikit-image==0.17.2) (4.2.0)
Requirement already satisfied: six>=1.5 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from python-dateutil>=2.7->matplotlib!=3.0.0,>=2.0.0->scikit-image==0.17.2) (1.16.0)
Installing collected packages: tifffile, PyWavelets, networkx, imageio, scikit-image
Successfully installed PyWavelets-1.3.0 imageio-2.9.0 networkx-2.6.3 scikit-image-0.17.2 tifffile-2021.11.2

(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code > sh run_qat.sh
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

[VAIQ_NOTE]: Loading NNDCT kernels... Making model...

[VAIQ_NOTE]: Quant config file is empty, use default quant configuration

[VAIQ_NOTE]: Quantization test process start up...

[VAIQ_NOTE]: =>Quant Module is in 'cpu'.

[VAIQ_NOTE]: =>Parsing Model...

[VAIQ_NOTE]: Start to trace model...

[VAIQ_NOTE]: Finish tracing.

[VAIQ_NOTE]: Processing ops... ██████████████████████████████████████████████████| 46/46 [00:00<00:00, 241.48it/s, OpInfo: name = return_0, type = Return]

[VAIQ_NOTE]: =>Quantizable module is generated.(../snapshot/OFA_RCAN_QAT/model/qat_result/test/Model.py)

[VAIQ_NOTE]: =>Get module with quantization.

Evaluation:
[VAIQ_WARN]: The shape of input (torch.Size([3, 256, 256])) should be the same with that of dummy input ([3, 48, 48])
[VAIQ_WARN]: The shape of input (torch.Size([3, 144, 144])) should be the same with that of dummy input ([3, 48, 48])
[VAIQ_WARN]: The shape of input (torch.Size([3, 128, 128])) should be the same with that of dummy input ([3, 48, 48])
[VAIQ_WARN]: The shape of input (torch.Size([3, 140, 140])) should be the same with that of dummy input ([3, 48, 48])
[VAIQ_WARN]: The shape of input (torch.Size([3, 172, 114])) should be the same with that of dummy input ([3, 48, 48])
100%|█████████████████████████████████████████████| 5/5 [00:04<00:00, 1.03it/s]
[Set5 x2] PSNR: 22.5584 (PSNR: 22.5584 SSIM: 0.7789) (Best: 22.5584 @epoch 0)
Forward: 4.87s

Saving... Total: 4.87s

[VAIQ_NOTE]: Quant config file is empty, use default quant configuration

[VAIQ_NOTE]: Quantization test process start up...

[VAIQ_NOTE]: =>Quant Module is in 'cpu'.

[VAIQ_NOTE]: =>Parsing Model...

[VAIQ_NOTE]: Start to trace model...

[VAIQ_NOTE]: Finish tracing.

[VAIQ_NOTE]: Processing ops... ██████████████████████████████████████████████████| 46/46 [00:00<00:00, 777.00it/s, OpInfo: name = return_0, type = Return]

[VAIQ_NOTE]: =>Quantizable module is generated.(../snapshot/OFA_RCAN_QAT/model/qat_result/test/Model.py)

[VAIQ_NOTE]: =>Get module with quantization.

[VAIQ_NOTE]: =>Converting to xmodel ...

Traceback (most recent call last):
  File "train_qat.py", line 118, in <module>
    main()
  File "train_qat.py", line 115, in main
    quantizer.export_xmodel(output_dir)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/apis.py", line 127, in export_xmodel
    self.processor.export_xmodel(output_dir, deploy_check)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/base.py", line 248, in export_xmodel
    dump_xmodel(output_dir, deploy_check, self._lstm_app)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/base.py", line 323, in dump_xmodel
    xmodel_depoly_infos, dump_deploy_infos = compiler.get_xmodel_and_dump_infos(quantizer, deploy_graphs)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/nndct_shared/compile/xir_compiler.py", line 45, in get_xmodel_and_dump_infos
    graph_quant_info = XirCompiler.get_deloy_graph_infos(quantizer, deploy_graphs_list[0])
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/nndct_shared/compile/xir_compiler.py", line 69, in get_deloy_graph_infos
    if len(quant_groups[blob_name]) == 1:
KeyError: 'Model::Model/MeanShiftConv[model]/MeanShiftConv[sub_mean]/Conv2d[mean_conv]/input.2'

(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code >

watchara-knot commented 2 years ago

@wangxd-xlnx If I use 'pip install --user -r requirements.txt', I get the error below. Please help suggest how to solve it so that I can get a .xmodel to run on the target board.

Docker Image Version: 2.5.0.1260 (CPU)
Vitis AI Git Hash: 502703c
Build Date: 2022-06-12

For TensorFlow 1.15 Workflows do: conda activate vitis-ai-tensorflow
For PyTorch Workflows do: conda activate vitis-ai-pytorch
For TensorFlow 2.8 Workflows do: conda activate vitis-ai-tensorflow2
For WeGo Tensorflow 1.15 Workflows do: conda activate vitis-ai-wego-tf1
For WeGo Tensorflow 2.8 Workflows do: conda activate vitis-ai-wego-tf2
For WeGo Torch Workflows do: conda activate vitis-ai-wego-torch

Vitis-AI /workspace > conda activate vitis-ai-pytorch
(vitis-ai-pytorch) Vitis-AI /workspace > cd pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/
(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5 > pip install --user -r requirements.txt
Collecting certifi==2021.5.30
  Downloading certifi-2021.5.30-py2.py3-none-any.whl (145 kB)
Requirement already satisfied: cycler==0.11.0 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (0.11.0)
Collecting fonttools==4.29.1
  Downloading fonttools-4.29.1-py3-none-any.whl (895 kB)
Collecting imageio==2.9.0
  Downloading imageio-2.9.0-py3-none-any.whl (3.3 MB)
Requirement already satisfied: joblib==1.1.0 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (1.1.0)
Collecting kiwisolver==1.3.2
  Downloading kiwisolver-1.3.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
Collecting matplotlib==3.5.1
  Downloading matplotlib-3.5.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
Collecting networkx==2.6.3
  Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB)
Collecting ninja==1.10.2.3
  Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
Collecting numpy==1.17.2
  Downloading numpy-1.17.2-cp37-cp37m-manylinux1_x86_64.whl (20.3 MB)
Requirement already satisfied: packaging==21.3 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (21.3)
Collecting Pillow==9.0.1
  Downloading Pillow-9.0.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting pyparsing==3.0.7
  Downloading pyparsing-3.0.7-py3-none-any.whl (98 kB)
Requirement already satisfied: python-dateutil==2.8.2 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 14)) (2.8.2)
Collecting PyWavelets==1.1.1
  Downloading PyWavelets-1.1.1-cp37-cp37m-manylinux1_x86_64.whl (4.4 MB)
Collecting scikit-image==0.17.2
  Downloading scikit_image-0.17.2-cp37-cp37m-manylinux1_x86_64.whl (12.5 MB)
Collecting scikit-learn==1.0.2
  Downloading scikit_learn-1.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (24.8 MB)
Requirement already satisfied: scipy==1.3.1 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 18)) (1.3.1)
Requirement already satisfied: six==1.16.0 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 19)) (1.16.0)
Collecting sklearn==0.0
  Downloading sklearn-0.0.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: threadpoolctl==3.1.0 in /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages (from -r requirements.txt (line 21)) (3.1.0)
Collecting tifffile==2021.11.2
  Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
Collecting torch==1.7.1
  Downloading torch-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (776.8 MB)
Collecting tqdm==4.63.0
  Downloading tqdm-4.63.0-py2.py3-none-any.whl (76 kB)
Collecting typing-extensions==4.1.1
  Downloading typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Building wheels for collected packages: sklearn
  Building wheel for sklearn (setup.py) ... done
  Created wheel for sklearn: filename=sklearn-0.0-py2.py3-none-any.whl size=1310 sha256=42d64470131f94c2bd2367d4a88f61c5cbdf6952edadb49598016a240cf302ae
  Stored in directory: /home/vitis-ai-user/.cache/pip/wheels/46/ef/c3/157e41f5ee1372d1be90b09f74f82b10e391eaacca8f22d33e
Successfully built sklearn
Installing collected packages: ninja, certifi, typing-extensions, tqdm, pyparsing, Pillow, numpy, networkx, kiwisolver, fonttools, torch, tifffile, PyWavelets, imageio, scikit-learn, matplotlib, sklearn, scikit-image
WARNING: The script ninja is installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script tqdm is installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts f2py, f2py3 and f2py3.7 are installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts fonttools, pyftmerge, pyftsubset and ttx are installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts convert-caffe2-to-onnx and convert-onnx-to-caffe2 are installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts lsm2bin, tiff2fsspec, tiffcomment and tifffile are installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts imageio_download_bin and imageio_remove_bin are installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script skivi is installed in '/home/vitis-ai-user/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.11.2+cpu requires torch==1.10.1, but you have torch 1.7.1 which is incompatible.
pandas 1.3.1 requires numpy>=1.17.3, but you have numpy 1.17.2 which is incompatible.
Successfully installed Pillow-9.0.1 PyWavelets-1.1.1 certifi-2021.5.30 fonttools-4.29.1 imageio-2.9.0 kiwisolver-1.3.2 matplotlib-3.5.1 networkx-2.6.3 ninja-1.10.2.3 numpy-1.17.2 pyparsing-3.0.7 scikit-image-0.17.2 scikit-learn-1.0.2 sklearn-0.0 tifffile-2021.11.2 torch-1.7.1 tqdm-4.63.0 typing-extensions-4.1.1

(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5 > cd code/
(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code > sh run_qat.sh
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

Traceback (most recent call last):
  File "train_qat.py", line 24, in <module>
    from pytorch_nndct import nn as nndct_nn
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/__init__.py", line 14, in <module>
    from .apis import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/apis.py", line 25, in <module>
    from .qproc import TorchQuantProcessor
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/__init__.py", line 1, in <module>
    from .base import TorchQuantProcessor, dump_xmodel
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/base.py", line 30, in <module>
    from pytorch_nndct.quantization import TORCHQuantizer, FakeQuantizer
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/__init__.py", line 2, in <module>
    from .torch_qalgo import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/torch_qalgo.py", line 28, in <module>
    from pytorch_nndct.nn import fake_quantize_per_tensor, fake_quantize_per_channel
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/__init__.py", line 1, in <module>
    from pytorch_nndct.nn.modules import functional
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/__init__.py", line 16, in <module>
    from .sigmoid import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/sigmoid.py", line 26, in <module>
    from .fix_ops import NndctSigmoidTableLookup, NndctSigmoidSimulation
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/fix_ops.py", line 26, in <module>
    from ..load_kernels import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/load_kernels.py", line 31, in <module>
    torch.ops.load_library(lib_abspath)
  File "/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_

(vitis-ai-pytorch) Vitis-AI /workspace/pt_OFA-rcan_DIV2K_360_640_45.7G_2.5/code >

watchara-knot commented 2 years ago

@wangxd-xlnx For run_qat.sh, I think steps 1 and 2 finish well; the error two posts above comes from the following:

OFA-RCAN model

Data=../data
Model=../float/

S=2   # scale
R=1   # the number of groups
F=16  # output channel
SAVE=OFA_RCAN_QAT

CUDA_VISIBLE_DEVICES=0 python train_qat.py --qat_step 1 --dir_data ${Data} --model RCAN_wo_CA_QAT --lr 5e-3 --epochs 250 --decay 50-100-150-200-250 --data_test Set5 --save ${SAVE} --scale 2 --n_resgroups ${R} --n_feats ${F} --batch_size 16 --n_GPUs 1 \

CUDA_VISIBLE_DEVICES=0 python train_qat.py --qat_step 2 --dir_data ${Data} --model RCAN_wo_CA_QAT --data_test Set5 --save ${SAVE} --scale 2 --n_resgroups ${R} --n_feats ${F} --batch_size 16 --n_GPUs 1 \

CUDA_VISIBLE_DEVICES=0 python train_qat.py --qat_step 3 --dir_data ${Data} --model RCAN_wo_CA_QAT --data_test Set5+Set14+B100+Urban100 --save ${SAVE} --scale 2 --n_resgroups ${R} --n_feats ${F} --batch_size 16 --n_GPUs 1 --test_only \

CUDA_VISIBLE_DEVICES=1 python train_qat.py --qat_step 1 --dir_data ${Data} --model RCAN_wo_CA_QAT --lr 5e-3 --epochs 2 --decay 50-100-150-200-250 --data_test Set5 --save ${SAVE} --scale 2 --n_resgroups ${R} --n_feats ${F} --batch_size 16 --n_GPUs 1 \

CUDA_VISIBLE_DEVICES=1 python train_qat.py --qat_step 2 --dir_data ${Data} --model RCAN_wo_CA_QAT --data_test Set5 --save ${SAVE} --scale 2 --n_resgroups ${R} --n_feats ${F} --batch_size 16 --n_GPUs 1 \

CUDA_VISIBLE_DEVICES=1 python train_qat.py --qat_step 3 --dir_data ${Data} --model RCAN_wo_CA_QAT --data_test Set5 --save ${SAVE} --scale 2 --n_resgroups ${R} --n_feats ${F} --batch_size 16 --n_GPUs 1 --test_only

niuxjxlnx commented 2 years ago

@watchara-knot :

To avoid the KeyError you met, keep all of the following tools at the same version and do not change them: Vitis-AI, Python, PyTorch, and vai_q_pytorch. Once the version of any one of them changes, QAT/PTQ needs to be restarted from scratch (redo the QAT training or the PTQ calibration).
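
For example, one way to record the relevant versions before starting QAT/PTQ, so a later export can be checked against the same stack (a minimal sketch; pytorch_nndct may not expose a version string, so its install path is printed instead):

# Record the toolchain state before calibration/training so the export step can
# be verified to run against exactly the same versions.
import sys
import torch
import pytorch_nndct  # vai_q_pytorch

print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
print("nndct  :", pytorch_nndct.__file__)  # should resolve inside the conda env, not ~/.local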

Another tip: if the PyTorch version is changed, vai_q_pytorch needs to be recompiled. Please refer to the script https://github.com/Xilinx/Vitis-AI/blob/master/docker/dockerfiles/replace_pytorch.sh.

qianglin-xlnx commented 2 years ago

Closing since there has been no activity for more than a month. Please reopen if you still have questions. Thanks.