Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

pytorch YOLOX OS Error #990

Open skimo1st opened 2 years ago

skimo1st commented 2 years ago

Hello,

I'm trying to get YOLOX (pt_yolox_TT100K_640_640_73G_2.5) running in Vitis-AI 2.5 and re-train it for a custom MPSoC board. I followed the instructions in the README. Installing YOLOX in the conda env works, and I can also compile the pre-trained models for my board.

But when I call the run_train or run_demo scripts from the model directory, I always get the following error:

Successfully built yolox
Installing collected packages: yolox
Successfully installed yolox-0.1.0
(vitis-ai-pytorch) Vitis-AI /workspace/model_zoo/pt_yolox_TT100K_640_640_73G_2.5/code > bash run_demo.sh 
float model demo , you can test the float model  or QAT converted model
Traceback (most recent call last):
  File "tools/demo_sign.py", line 34, in <module>
    from yolox.exp import get_exp
  File "/workspace/model_zoo/pt_yolox_TT100K_640_640_73G_2.5/code/yolox/exp/__init__.py", line 23, in <module>
    from .yolox_base_deploy_qat import ExpDeployQat, ExpKITTIDeployQat
  File "/workspace/model_zoo/pt_yolox_TT100K_640_640_73G_2.5/code/yolox/exp/yolox_base_deploy_qat.py", line 19, in <module>
    from pytorch_nndct import QatProcessor
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/__init__.py", line 14, in <module>
    from .apis import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/apis.py", line 25, in <module>
    from .qproc import TorchQuantProcessor
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/__init__.py", line 1, in <module>
    from .base import TorchQuantProcessor, dump_xmodel
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/base.py", line 30, in <module>
    from pytorch_nndct.quantization import TORCHQuantizer, FakeQuantizer
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/__init__.py", line 2, in <module>
    from .torch_qalgo import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/torch_qalgo.py", line 28, in <module>
    from pytorch_nndct.nn import fake_quantize_per_tensor, fake_quantize_per_channel
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/__init__.py", line 1, in <module>
    from pytorch_nndct.nn.modules import functional
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/__init__.py", line 16, in <module>
    from .sigmoid import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/sigmoid.py", line 26, in <module>
    from .fix_ops import NndctSigmoidTableLookup, NndctSigmoidSimulation
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/fix_ops.py", line 26, in <module>
    from ..load_kernels import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/load_kernels.py", line 31, in <module>
    torch.ops.load_library(lib_abspath)
  File "/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_

I tried both the CPU and the GPU Docker containers and also rebuilt the images, unfortunately all without success.

Thank you.

wangxd-xlnx commented 1 year ago

Hi @skimo1st

Thanks for your feedback. We are trying to reproduce this issue; it seems to be related to the versions of the Docker image and the AI Quantizer. To work around it, you may need to use 'replace_pytorch.sh' to switch to a specific environment.

We will fix this issue and update the code and README.md for YOLOX. You will be notified when the update is complete.

Thanks!

wangxd-xlnx commented 1 year ago

Until the package is updated, you can follow these steps to switch environments as a temporary solution. We tried it and it works.
Step 1: Make sure you have the latest Vitis-AI.
Step 2: sh docker/dockerfiles/replace_pytorch.sh yolox-test
Step 3: conda activate yolox-test
Step 4: Run the training scripts.

skimo1st commented 1 year ago

Hello @wangxd-xlnx,

thank you very much for your feedback. I have tested the workaround. Unfortunately, the replace_pytorch.sh script fails on my system while building the pytorch_nndct-*.whl. I have attached the log file.

What I did today was re-pull the Vitis-AI repo and rebuild the GPU container. My test system consists of an NVIDIA RTX3060 and an AMD Ryzen 5900X. If necessary, detailed HW info can also be found in the log.

Thank you!

niuxjxlnx commented 1 year ago

@skimo1st :

The failure occurs because the package "cuda-toolkit-10-2" is not installed in your container. Can you run the following commands in the container and see what happens?

  sudo apt update -y
  sudo apt-get install -y cuda-toolkit-10-2

skimo1st commented 1 year ago

I just realized I never replied. @niuxjxlnx thanks for your feedback.

The install command alone was not enough for me. I also had to add the NVIDIA CUDA repository to my apt sources list, which was apparently missing from my image. After that, the installation worked.

While building the new conda environment, I got another error.

/opt/vitis_ai/conda/envs/yolox-test/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning: 
NVIDIA GeForce RTX 3060 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch installation supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/.

It seems that newer NVIDIA graphics cards of the 30 series are not supported by the CUDA version used here. I am now using another PC with a 2080, where I was able to create the env and work with it.

Is it correct that these graphics cards are not supported, or is there a workaround for this?

Thank you!
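The warning above compares the GPU's compute capability against the architectures the PyTorch build was compiled for. A minimal sketch of that comparison, assuming a simplified rule (a build's `sm_XY` binary runs on a device with the same major version and an equal or higher minor version) and ignoring any PTX (`compute_XY`) fallback. The function names are hypothetical, and the arch list is copied from the warning text:

```python
import re


def arch_to_capability(arch):
    """Convert a tag such as 'sm_86' or 'compute_37' into (major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)", arch)
    if match is None:
        raise ValueError(f"unrecognised arch tag: {arch!r}")
    digits = match.group(2)
    return int(digits[:-1]), int(digits[-1])


def has_matching_binary(device_capability, arch_list):
    """Simplified check: same major version, build minor <= device minor.

    Only 'sm_' entries (compiled binaries) are considered; 'compute_'
    entries (PTX) are deliberately ignored in this sketch.
    """
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue
        major, minor = arch_to_capability(arch)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Arch list taken from the warning message in this comment.
BUILD_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "compute_37"]

print(has_matching_binary((8, 6), BUILD_ARCHS))  # RTX 3060 Ti reports sm_86
print(has_matching_binary((7, 5), BUILD_ARCHS))  # a capability-7.5 card
```

In a live environment the two inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. Under the stated assumption, this build has no binary for any capability-8.x device, which would be consistent with the warning; a PyTorch build compiled for `sm_80` or `sm_86` would be needed to change the result.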

Shreyas-NR commented 1 year ago

Hi @wangxd-xlnx @niuxjxlnx @skimo1st

I'm also working on training the YOLOX model in the Vitis AI PyTorch conda env using the deployable scripts, and I'm facing a similar issue.


(vitis-ai-pytorch) Vitis-AI /workspace/code > python3 tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 8 -b 64 --fp16 -o -c weights/yolox_s.pth
2022-11-04 08:56:22.142 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 5 initialization finished.
2022-11-04 08:56:22.163 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 6 initialization finished.
2022-11-04 08:56:22.175 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 1 initialization finished.
2022-11-04 08:56:22.200 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 0 initialization finished.
2022-11-04 08:56:22.211 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 7 initialization finished.
2022-11-04 08:56:22.221 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 3 initialization finished.
2022-11-04 08:56:22.229 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 4 initialization finished.
2022-11-04 08:56:22.236 | INFO     | yolox.core.launch:_distributed_worker:130 - Rank 2 initialization finished.
2022-11-04 08:56:33.300 | INFO     | yolox.utils.setup_env:configure_omp:60 -
***************************************************************
We set `OMP_NUM_THREADS` for each process to 1 to speed up.
please further tune the variable for optimal performance.
***************************************************************
2022-11-04 08:56:33 | INFO     | yolox.core.trainer:140 - args: Namespace(batch_size=64, cache=False, ckpt='weights/yolox_s.pth', devices=8, dist_backend='nccl', dist_url=None, exp_file='exps/example/yolox_voc/yolox_voc_s.py', experiment_name='yolox_voc_s', fp16=True, machine_rank=0, name=None, num_machines=1, occupy=True, opts=[], resume=False, start_epoch=None)
2022-11-04 08:56:33 | INFO     | yolox.core.trainer:141 - exp value:
╒══════════════════╤════════════════════════════╕
│ keys             │ values                     │
╞══════════════════╪════════════════════════════╡
│ seed             │ None                       │
├──────────────────┼────────────────────────────┤
│ output_dir       │ './YOLOX_outputs'          │
├──────────────────┼────────────────────────────┤
│ print_interval   │ 10                         │
├──────────────────┼────────────────────────────┤
│ eval_interval    │ 10                         │
├──────────────────┼────────────────────────────┤
│ num_classes      │ 1                          │
├──────────────────┼────────────────────────────┤
│ depth            │ 0.33                       │
├──────────────────┼────────────────────────────┤
│ width            │ 0.5                        │
├──────────────────┼────────────────────────────┤
│ act              │ 'lrelu'                    │
├──────────────────┼────────────────────────────┤
│ data_num_workers │ 4                          │
├──────────────────┼────────────────────────────┤
│ input_size       │ (640, 640)                 │
├──────────────────┼────────────────────────────┤
│ multiscale_range │ 5                          │
├──────────────────┼────────────────────────────┤
│ data_dir         │ None                       │
├──────────────────┼────────────────────────────┤
│ train_ann        │ 'instances_train2017.json' │
├──────────────────┼────────────────────────────┤
│ val_ann          │ 'instances_val2017.json'   │
├──────────────────┼────────────────────────────┤
│ mosaic_prob      │ 1.0                        │
├──────────────────┼────────────────────────────┤
│ mixup_prob       │ 1.0                        │
├──────────────────┼────────────────────────────┤
│ hsv_prob         │ 1.0                        │
├──────────────────┼────────────────────────────┤
│ flip_prob        │ 0.5                        │
├──────────────────┼────────────────────────────┤
│ degrees          │ 10.0                       │
├──────────────────┼────────────────────────────┤
│ translate        │ 0.1                        │
├──────────────────┼────────────────────────────┤
│ mosaic_scale     │ (0.1, 2)                   │
├──────────────────┼────────────────────────────┤
│ mixup_scale      │ (0.5, 1.5)                 │
├──────────────────┼────────────────────────────┤
│ shear            │ 2.0                        │
├──────────────────┼────────────────────────────┤
│ enable_mixup     │ True                       │
├──────────────────┼────────────────────────────┤
│ warmup_epochs    │ 1                          │
├──────────────────┼────────────────────────────┤
│ max_epoch        │ 3                          │
├──────────────────┼────────────────────────────┤
│ warmup_lr        │ 0                          │
├──────────────────┼────────────────────────────┤
│ basic_lr_per_img │ 0.00015625                 │
├──────────────────┼────────────────────────────┤
│ scheduler        │ 'yoloxwarmcos'             │
├──────────────────┼────────────────────────────┤
│ no_aug_epochs    │ 15                         │
├──────────────────┼────────────────────────────┤
│ min_lr_ratio     │ 0.05                       │
├──────────────────┼────────────────────────────┤
│ ema              │ True                       │
├──────────────────┼────────────────────────────┤
│ weight_decay     │ 0.0005                     │
├──────────────────┼────────────────────────────┤
│ momentum         │ 0.9                        │
├──────────────────┼────────────────────────────┤
│ exp_name         │ 'yolox_voc_s'              │
├──────────────────┼────────────────────────────┤
│ test_size        │ (640, 640)                 │
├──────────────────┼────────────────────────────┤
│ test_conf        │ 0.01                       │
├──────────────────┼────────────────────────────┤
│ nmsthre          │ 0.65                       │
╘══════════════════╧════════════════════════════╛
2022-11-04 08:56:35 | ERROR    | yolox.core.launch:161 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (444), thread 'MainThread' (139887121491776):
Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
               │     └ 9
               └ <function _main at 0x7f3a00dbb050>
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/multiprocessing/spawn.py", line 118, in _main
    return self._bootstrap()
           │    └ <function BaseProcess._bootstrap at 0x7f3a00e6f950>
           └ <SpawnProcess(SpawnProcess-1, started)>
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7f3a00e68f80>
    └ <SpawnProcess(SpawnProcess-1, started)>
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <SpawnProcess(SpawnProcess-1, started)>
    │    │        │    └ (<function _distributed_worker at 0x7f3895d43ef0>, 0, (<function main at 0x7f3a00cca050>, 8, 8, 0, 'nccl', 'tcp://127.0.0.1:3...
    │    │        └ <SpawnProcess(SpawnProcess-1, started)>
    │    └ <function _wrap at 0x7f38a31c27a0>
    └ <SpawnProcess(SpawnProcess-1, started)>
  File "/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
    │  │   └ (<function main at 0x7f3a00cca050>, 8, 8, 0, 'nccl', 'tcp://127.0.0.1:37325', (╒══════════════════╤══════════════════════════...
    │  └ 0
    └ <function _distributed_worker at 0x7f3895d43ef0>

> File "/workspace/code/yolox/core/launch.py", line 161, in _distributed_worker
    main_func(*args)
    │          └ (╒══════════════════╤════════════════════════════╕
    │            │ keys             │ values                     │
    │            ╞══════════════════╪════...
    └ <function main at 0x7f3a00cca050>

  File "/workspace/code/tools/train.py", line 124, in main
    trainer.train()
    │       └ <function Trainer.train at 0x7f3895cde050>
    └ <yolox.core.trainer.Trainer object at 0x7f38949c5e10>

  File "/workspace/code/yolox/core/trainer.py", line 84, in train
    self.before_train()
    │    └ <function Trainer.before_train at 0x7f38945c8680>
    └ <yolox.core.trainer.Trainer object at 0x7f38949c5e10>

  File "/workspace/code/yolox/core/trainer.py", line 145, in before_train
    model = self.exp.get_model()
            │    │   └ <function ExpDeploy.get_model at 0x7f387adc4710>
            │    └ ╒══════════════════╤════════════════════════════╕
            │      │ keys             │ values                     │
            │      ╞══════════════════╪═════...
            └ <yolox.core.trainer.Trainer object at 0x7f38949c5e10>

  File "/workspace/code/yolox/exp/yolox_base_deploy.py", line 362, in get_model
    from yolox.models.yolox_deploy import YOLOX

  File "/workspace/code/yolox/models/__init__.py", line 19, in <module>
    from .darknet import CSPDarknet, Darknet

  File "/workspace/code/yolox/models/darknet.py", line 21, in <module>
    from .network_blocks import BaseConv, CSPLayer, DWConv, Focus, ResLayer, SPPBottleneck

  File "/workspace/code/yolox/models/network_blocks.py", line 19, in <module>
    import pytorch_nndct.nn.modules.functional as QF

  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/__init__.py", line 14, in <module>
    from .apis import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/apis.py", line 25, in <module>
    from .qproc import TorchQuantProcessor
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/__init__.py", line 1, in <module>
    from .base import TorchQuantProcessor, dump_xmodel
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/qproc/base.py", line 30, in <module>
    from pytorch_nndct.quantization import TORCHQuantizer, FakeQuantizer
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/__init__.py", line 2, in <module>
    from .torch_qalgo import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/quantization/torch_qalgo.py", line 28, in <module>
    from pytorch_nndct.nn import fake_quantize_per_tensor, fake_quantize_per_channel
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/__init__.py", line 1, in <module>
    from pytorch_nndct.nn.modules import functional
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/__init__.py", line 16, in <module>
    from .sigmoid import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/sigmoid.py", line 26, in <module>
    from .fix_ops import NndctSigmoidTableLookup, NndctSigmoidSimulation
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/modules/fix_ops.py", line 26, in <module>
    from ..load_kernels import *
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/load_kernels.py", line 31, in <module>
    torch.ops.load_library(lib_abspath)
    │     │   │            └ '/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu...
    │     │   └ <function _Ops.load_library at 0x7f38a325da70>
    │     └ <module 'torch.ops' from '/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/_ops.py'>
    └ <module 'torch' from '/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/__init__.py'>
  File "/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
    │      │    └ '/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu...
    │      └ <class 'ctypes.CDLL'>
    └ <module 'ctypes' from '/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/ctypes/__init__.py'>
  File "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
    │    │         │       │    │      └ 0
    │    │         │       │    └ '/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu...
    │    │         │       └ <CDLL '/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-lin...
    │    │         └ <built-in function dlopen>
    │    └ 0
    └ <CDLL '/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-lin...

OSError: /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_
(vitis-ai-pytorch) Vitis-AI /workspace/code >

thanks

Shreyas-NR commented 1 year ago

Hi all,

So I did some debugging for the error OSError: /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_

Below are the debugging steps

(vitis-ai-pytorch) Vitis-AI /workspace/code > ldd /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007fffd4dd9000)
        libc10.so => not found
        libtorch_cpu.so => not found
        libcudart.so.11.0 => /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/../../../../libcudart.so.11.0 (0x00007fcfd9b30000)
        libstdc++.so.6 => /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/../../../../libstdc++.so.6 (0x00007fcfd997c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcfd95de000)
        libgcc_s.so.1 => /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/../../../../libgcc_s.so.1 (0x00007fcfd9f73000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcfd91ed000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fcfd9dcd000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fcfd8fe9000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fcfd8dca000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fcfd8bc2000)

I see that two libraries could not be resolved: libc10.so => not found and libtorch_cpu.so => not found.
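A small sketch of how the "not found" entries could be pulled out of `ldd` output automatically. The function name is hypothetical and the sample text is abbreviated from the output above:

```python
def unresolved_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Dependency lines look like '<name> => <path or "not found">'.
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


SAMPLE = """\
        linux-vdso.so.1 (0x00007fffd4dd9000)
        libc10.so => not found
        libtorch_cpu.so => not found
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcfd95de000)
"""

print(unresolved_libraries(SAMPLE))
```

One caveat, stated as an assumption rather than a diagnosis: PyTorch extension modules are typically loaded after `import torch` has already brought `libc10.so` and `libtorch_cpu.so` into the process, so a "not found" from a standalone `ldd` run does not by itself mean the libraries are missing at runtime.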

I tried to demangle the symbol (ref. http://demangler.com/):

_ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_

I got the function name as

c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)

So this is something related to the libc10


(vitis-ai-pytorch) Vitis-AI /workspace/code > echo $LD_LIBRARY_PATH
/opt/xilinx/xrt/lib:/usr/lib:/usr/lib/x86_64-linux-gnu:/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib

Searched for libc10


(vitis-ai-pytorch) Vitis-AI /workspace/code > find / -name '*libc10.so*' -ls
find: ‘/var/cache/ldconfig’: Permission denied
find: ‘/var/cache/apt/archives/partial’: Permission denied
find: ‘/etc/ssl/private’: Permission denied
    57045    652 -rwxrwxrwx   1 1000     1000       665984 Dec 11  2021 /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so
   246443    652 -rwxrwxr-x   2 1000     1000       665984 Dec 11  2021 /opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so
    19842    700 -rwxr-xr-x   1 vitis-ai-user vitis-ai-group   715552 Nov  4 19:32 /home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so
find: ‘/root’: Permission denied
find: ‘/proc/tty/driver’: Permission denied

I appended one of these paths to LD_LIBRARY_PATH:

(vitis-ai-pytorch) Vitis-AI /workspace/code > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib

Now when I run ldd again, libc10 is resolved:

(vitis-ai-pytorch) Vitis-AI /workspace/code > ldd /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffc4e127000)
        libc10.so => /home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so (0x00007f5dbbea8000)
        libtorch_cpu.so => /home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so (0x00007f5dab5ac000)
        libcudart.so.11.0 => /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/../../../../libcudart.so.11.0 (0x00007f5dab30f000)
        libstdc++.so.6 => /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/../../../../libstdc++.so.6 (0x00007f5dab15b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5daadbd000)
        libgcc_s.so.1 => /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/../../../../libgcc_s.so.1 (0x00007f5dbc2d9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5daa9cc000)
        libgomp-7c85b1e2.so.1 => /home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libgomp-7c85b1e2.so.1 (0x00007f5daa7a2000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5daa583000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5dbc133000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5daa37b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5daa177000)
        libcudart-80664282.so.10.2 => /home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libcudart-80664282.so.10.2 (0x00007f5da9ef6000)

I went ahead with the training script, but still got the same error.
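The tracebacks in this thread show `torch` being imported from `/home/vitis-ai-user/.local/...` while `pytorch_nndct` lives in the conda env. A minimal sketch for classifying where a module file comes from; the function names are hypothetical and the paths are copied from the logs above:

```python
from pathlib import PurePosixPath


def _is_under(path, root):
    """True if `path` is located inside the directory `root`."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def install_origin(module_file, env_prefix, user_base):
    """Classify a module file as 'user-site', 'env', or 'other'."""
    path = PurePosixPath(module_file)
    if _is_under(path, PurePosixPath(user_base)):
        return "user-site"
    if _is_under(path, PurePosixPath(env_prefix)):
        return "env"
    return "other"


ENV_PREFIX = "/opt/vitis_ai/conda/envs/vitis-ai-pytorch"
USER_BASE = "/home/vitis-ai-user/.local"

TORCH_FILE = "/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/__init__.py"
NNDCT_FILE = ENV_PREFIX + "/lib/python3.7/site-packages/pytorch_nndct/__init__.py"

print("torch:", install_origin(TORCH_FILE, ENV_PREFIX, USER_BASE))
print("pytorch_nndct:", install_origin(NNDCT_FILE, ENV_PREFIX, USER_BASE))
```

In a live session the inputs could be `torch.__file__`, `sys.prefix`, and `site.getuserbase()`. A mismatch like this suggests, but does not prove, that the extension was built against a different PyTorch build than the one being imported. One possible experiment is starting Python with `-s` (or setting `PYTHONNOUSERSITE=1`) so the user site directory is skipped, and checking whether the import error changes.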

I tried to grep for the symbol in libc10.so and got this:

(vitis-ai-pytorch) Vitis-AI /workspace/code > nm -o /home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so|grep _ZN3c106detail
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003f830 t _ZN3c106detail12_str_wrapperIJPKcRKlS3_EE4callERKS3_S5_S8_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000042330 t _ZN3c106detail12_str_wrapperIJPKcRKlS3_S5_S3_S5_S3_EE4callERKS3_S5_S8_S5_S8_S5_S8_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003fcd0 t _ZN3c106detail12_str_wrapperIJPKcRKmEE4callERKS3_S5_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000023600 t _ZN3c106detail12_str_wrapperIJPKcRKmS3_RKiS3_RKPcS3_EE4callERKS3_S5_SD_S7_SD_SA_SD_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000263c0 t _ZN3c106detail12_str_wrapperIJPKcRKN6caffe28TypeMetaES3_EE4callERKS3_S7_SA_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000041000 t _ZN3c106detail12_str_wrapperIJPKcRKNS_12MemoryFormatEEE4callERKS3_S6_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000029220 t _ZN3c106detail12_str_wrapperIJPKcRKsEE4callERKS3_S5_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003a5e0 t _ZN3c106detail12_str_wrapperIJPKcRKsS3_EE4callERKS3_S5_S8_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000029450 t _ZN3c106detail12_str_wrapperIJPKcRKSsEE4callERKS3_S5_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000295b0 t _ZN3c106detail12_str_wrapperIJPKcRKSsS3_S5_S3_EE4callERKS3_S5_S8_S5_S8_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000026910 t _ZN3c106detail12_str_wrapperIJRKSsEE4callES3_
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000550a0 T _ZN3c106detail13deleteNothingEPv
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000052730 T _ZN3c106detail13StripBasenameERKSs
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000050cc0 T _ZN3c106detail21LogAPIUsageFakeReturnERKSs
/home/vitis-ai-user/.local/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003bb30 T _ZN3c106detail25getNonDeterministicRandomEb

Next, I tried to grep for the symbol in the other two paths where I found libc10.

/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so


(vitis-ai-pytorch) Vitis-AI /workspace/code > nm -o /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so|grep _ZN3c106detail
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000027ce0 t _ZN3c106detail12_str_wrapperIJPKcRKlS3_EE4callERKS3_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000027260 t _ZN3c106detail12_str_wrapperIJPKcRKlS3_S5_S3_S5_S3_EE4callERKS3_S5_S8_S5_S8_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003c100 t _ZN3c106detail12_str_wrapperIJPKcRKmEE4callERKS3_S5_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000001cde0 t _ZN3c106detail12_str_wrapperIJPKcRKmS3_RKiS3_RKPcS3_EE4callERKS3_S5_SD_S7_SD_SA_SD_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000051140 t _ZN3c106detail12_str_wrapperIJPKcRKN6caffe28TypeMetaES3_EE4callERKS3_S7_SA_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000222e0 t _ZN3c106detail12_str_wrapperIJPKcRKNS_10DeviceTypeES3_EE4callERKS3_S6_S9_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000028180 t _ZN3c106detail12_str_wrapperIJPKcRKS3_EE4callES5_S5_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000277c0 t _ZN3c106detail12_str_wrapperIJPKcRKS3_S3_EE4callES5_S5_S5_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000020840 t _ZN3c106detail12_str_wrapperIJPKcRKsS3_EE4callERKS3_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000001fab0 t _ZN3c106detail12_str_wrapperIJPKcRKSsEE4callERKS3_S5_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000001ff00 t _ZN3c106detail12_str_wrapperIJPKcRKSsS3_S5_S3_EE4callERKS3_S5_S8_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000050900 t _ZN3c106detail12_str_wrapperIJPKcS3_EE4callERKS3_S6_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000037e00 t _ZN3c106detail12_str_wrapperIJRKPKcRKSsEE4callES5_S7_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000038250 t _ZN3c106detail12_str_wrapperIJRKPKcS5_EE4callES5_S5_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000516a0 t _ZN3c106detail12_str_wrapperIJRKSsEE4callES3_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003c6c0 T _ZN3c106detail13deleteNothingEPv
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003a8b0 T _ZN3c106detail13StripBasenameERKSs
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036630 T _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036700 T _ZN3c106detail14torchCheckFailEPKcS2_jS2_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003a940 T _ZN3c106detail20ExcludeFileExtensionERKSs
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000039100 T _ZN3c106detail21LogAPIUsageFakeReturnERKSs
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036b80 T _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036bf0 T _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_S2_
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000021b20 T _ZN3c106detail25getNonDeterministicRandomEb

The same listing for the libc10.so in the vitis-ai-optimizer_pytorch environment (/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so):

(vitis-ai-pytorch) Vitis-AI /workspace/code > nm -o /opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so|grep _ZN3c106detail
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000027ce0 t _ZN3c106detail12_str_wrapperIJPKcRKlS3_EE4callERKS3_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000027260 t _ZN3c106detail12_str_wrapperIJPKcRKlS3_S5_S3_S5_S3_EE4callERKS3_S5_S8_S5_S8_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003c100 t _ZN3c106detail12_str_wrapperIJPKcRKmEE4callERKS3_S5_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000001cde0 t _ZN3c106detail12_str_wrapperIJPKcRKmS3_RKiS3_RKPcS3_EE4callERKS3_S5_SD_S7_SD_SA_SD_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000051140 t _ZN3c106detail12_str_wrapperIJPKcRKN6caffe28TypeMetaES3_EE4callERKS3_S7_SA_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000222e0 t _ZN3c106detail12_str_wrapperIJPKcRKNS_10DeviceTypeES3_EE4callERKS3_S6_S9_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000028180 t _ZN3c106detail12_str_wrapperIJPKcRKS3_EE4callES5_S5_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000277c0 t _ZN3c106detail12_str_wrapperIJPKcRKS3_S3_EE4callES5_S5_S5_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000020840 t _ZN3c106detail12_str_wrapperIJPKcRKsS3_EE4callERKS3_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000001fab0 t _ZN3c106detail12_str_wrapperIJPKcRKSsEE4callERKS3_S5_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000001ff00 t _ZN3c106detail12_str_wrapperIJPKcRKSsS3_S5_S3_EE4callERKS3_S5_S8_S5_S8_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000050900 t _ZN3c106detail12_str_wrapperIJPKcS3_EE4callERKS3_S6_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000037e00 t _ZN3c106detail12_str_wrapperIJRKPKcRKSsEE4callES5_S7_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000038250 t _ZN3c106detail12_str_wrapperIJRKPKcS5_EE4callES5_S5_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:00000000000516a0 t _ZN3c106detail12_str_wrapperIJRKSsEE4callES3_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003c6c0 T _ZN3c106detail13deleteNothingEPv
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003a8b0 T _ZN3c106detail13StripBasenameERKSs
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036630 T _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036700 T _ZN3c106detail14torchCheckFailEPKcS2_jS2_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:000000000003a940 T _ZN3c106detail20ExcludeFileExtensionERKSs
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000039100 T _ZN3c106detail21LogAPIUsageFakeReturnERKSs
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036b80 T _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000036bf0 T _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_S2_
/opt/vitis_ai/conda/envs/vitis-ai-optimizer_pytorch/lib/python3.7/site-packages/torch/lib/libc10.so:0000000000021b20 T _ZN3c106detail25getNonDeterministicRandomEb

Finally, we can see that the symbol is undefined in the pytorch_nndct kernel library, as indicated by the letter U:

(vitis-ai-pytorch) Vitis-AI /workspace/code > nm -o /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so|grep _ZN3c106detail12
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so:                 U _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_
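
The manual comparison above could be automated roughly as follows. This is only a sketch: the helper names `parse_nm_line` and `unresolved_symbols` are made up for illustration, the input is assumed to be the text printed by `nm -o`, and the rule "uppercase type letter other than U means exported" is a simplification of nm's type letters.

```python
def parse_nm_line(line):
    """Split one `nm -o` output line into (path, type_letter, symbol)."""
    # Assumption: the mangled symbol contains no ':', so the last colon
    # separates the file path from the rest of the line.
    path, _, rest = line.strip().rpartition(":")
    fields = rest.split()
    # Defined symbols print "address type name"; undefined ones omit the address.
    if len(fields) == 3:
        _, sym_type, symbol = fields
    elif len(fields) == 2:
        sym_type, symbol = fields
    else:
        raise ValueError(f"unrecognized nm line: {line!r}")
    return path, sym_type, symbol


def unresolved_symbols(consumer_lines, provider_lines):
    """Return symbols marked U in the consumer that the provider does not export."""
    exported = set()
    for line in provider_lines:
        _, sym_type, symbol = parse_nm_line(line)
        # Simplification: lowercase letters such as 't' are local symbols and
        # cannot satisfy an undefined reference from another library.
        if sym_type.isupper() and sym_type != "U":
            exported.add(symbol)
    wanted = set()
    for line in consumer_lines:
        _, sym_type, symbol = parse_nm_line(line)
        if sym_type == "U":
            wanted.add(symbol)
    return wanted - exported


# Sample lines copied from the nm output in this post.
ENV = "/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages"
provider = [
    ENV + "/torch/lib/libc10.so:000000000001fab0 t _ZN3c106detail12_str_wrapperIJPKcRKSsEE4callERKS3_S5_",
    ENV + "/torch/lib/libc10.so:000000000003c6c0 T _ZN3c106detail13deleteNothingEPv",
]
consumer = [
    ENV + "/pytorch_nndct/nn/_kernels.cpython-37m-x86_64-linux-gnu.so:"
    "                 U _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_",
]

missing = unresolved_symbols(consumer, provider)
print(sorted(missing))
```

In practice the two lists would hold the full `nm -o` output of each file. The same check could be repeated against the other shared objects under torch/lib, since the symbol need not live in libc10.so.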

Looking for help!

Thanks