Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0
1.49k stars 630 forks

Cannot build Docker Image(Type : GPU FrameWork : PyTorch) #1413

Open TeppeiT opened 8 months ago

TeppeiT commented 8 months ago

Hi. When I build the Vitis AI 3.0 Docker image (using the command $ ./docker_build.sh -t gpu -f pytorch), I get errors like the ones below:

0.315 WARNING: cannot verify www.xilinx.com's certificate, issued by ‘C=SK,O=ESET\\, spol. s r. o.,CN=ESET SSL Filter CA’:
0.315   Unable to locally verify the issuer's authority.
0.315 HTTP request sent, awaiting response... 503 Service Unavailable
1.568 2024-03-11 02:51:35 ERROR 503: Service Unavailable.

It seems the build cannot access https://www.xilinx.com/bin/public/openDownload?filename=conda-channel-3.0.tar.gz

I also tried accessing the URL from a web browser, but an error occurred and I could not download the file:

An error occurred while processing your request.
Reference #30.ad622c17.1710148651.1c0c3d7b

Can you please give me a solution to this? Thank you
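
A quick way to tell a server-side outage from a problem in the build scripts is to request the download URL outside of Docker first. The following is a minimal Python sketch, not part of Vitis AI; the helper name probe_url is hypothetical, and it only reports what the server (or the network path to it) answers.

```python
import urllib.error
import urllib.request


def probe_url(url, timeout=10.0):
    """Return (reachable, detail) for a single GET request to `url`.

    The response body is not read; only the status line is inspected.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, TLS verification failure, timeout.
        reason = getattr(err, "reason", err)
        return False, f"connection failed: {reason}"
```

With this sketch, a 503 response is reported as (False, "HTTP 503"), while a certificate problem such as the one in the warning above would surface as a "connection failed" result instead, because Python verifies certificates by default.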

vaibuild commented 8 months ago

Hi @TeppeiT, thanks for letting us know. We are checking internally and will update later.

vaibuild commented 8 months ago

Hi @TeppeiT, I've received feedback that the issue with our download site has been solved. Sorry for the inconvenience.

TeppeiT commented 8 months ago

@vaibuild Thanks for your prompt reply! I have successfully downloaded the file.

I then tried building the Docker image again, but another error occurred: sudo: conda: command not found

19.89 + export VAI_CONDA_CHANNEL=file:///scratch/conda-channel
19.89 + VAI_CONDA_CHANNEL=file:///scratch/conda-channel
19.89 + sudo chmod -R 777 /scratch/
19.93 + sudo ln -s /opt/conda /opt/vitis_ai/conda
19.93 + . /opt/vitis_ai/conda/etc/profile.d/conda.sh
19.93 ./install_torch.sh: line 16: /opt/vitis_ai/conda/etc/profile.d/conda.sh: No such file or directory
19.93 + true
19.93 + sudo conda config --env --append channels file:///scratch/conda-channel
19.94 sudo: conda: command not found
------
vitis-ai-cpu.Dockerfile:32
--------------------
  30 |     ADD conda/banner.sh /etc/
  31 |     ADD conda/${DOCKER_TYPE}_conda/bashrc /etc/bash.bashrc
  32 | >>> RUN if [[ -n "${TARGET_FRAMEWORK}" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi
  33 |     USER root
  34 |     RUN mkdir -p ${VAI_ROOT}/conda/pkgs && chmod 777 ${VAI_ROOT}/conda/pkgs && ./install_vairuntime.sh && rm -fr ./*
--------------------
ERROR: failed to solve: process "/bin/bash -c if [[ -n \"${TARGET_FRAMEWORK}\" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi" did not complete successfully: exit code: 1

I tried the following, but the situation remains unchanged:
・Clearing the Docker build cache
・Changing the Python version
・Updating the Vitis AI version (3.0 -> 3.5)

If there's a solution available, could you please provide it?

Sincerely
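
In the log above, the final message (sudo: conda: command not found) reads as a consequence rather than the cause: an earlier line already reports that /opt/vitis_ai/conda/etc/profile.d/conda.sh is missing, and the "+ true" that follows suggests the script tolerated that failure and carried on. A minimal Python sketch for pulling the earliest such line out of a long build log follows. The function name and the two patterns are illustrative assumptions, not part of the Vitis AI scripts.

```python
import re

# Messages that typically mark a missing file or missing executable
# in shell output. This list is an assumption and is not exhaustive.
FAILURE_PATTERNS = (
    re.compile(r"No such file or directory"),
    re.compile(r"command not found"),
)


def first_failure(log_text):
    """Return the first log line matching a failure pattern, or None."""
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            return line.strip()
    return None
```

Applied to the log in this comment, the function returns the conda.sh "No such file or directory" line, which points at the conda installation step rather than at the framework install script.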

0o0x0o0 commented 8 months ago

I have the same problem as you, have you solved it yet?

427.3 + export VAI_CONDA_CHANNEL=file:///scratch/conda-channel
427.3 + VAI_CONDA_CHANNEL=file:///scratch/conda-channel
427.3 + sudo chmod -R 777 /scratch/
427.3 + sudo ln -s /opt/conda /opt/vitis_ai/conda
427.3 + . /opt/vitis_ai/conda/etc/profile.d/conda.sh
427.3 ./install_torch.sh: line 16: /opt/vitis_ai/conda/etc/profile.d/conda.sh: No such file or directory
427.3 + true
427.3 + sudo conda config --env --append channels file:///scratch/conda-channel
427.3 sudo: conda: command not found
------
vitis-ai-cpu.Dockerfile:32
--------------------
  30 |     ADD conda/banner.sh /etc/
  31 |     ADD conda/${DOCKER_TYPE}_conda/bashrc /etc/bash.bashrc
  32 | >>> RUN if [[ -n "${TARGET_FRAMEWORK}" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi
  33 |     USER root
  34 |     RUN mkdir -p ${VAI_ROOT}/conda/pkgs && chmod 777 ${VAI_ROOT}/conda/pkgs && ./install_vairuntime.sh && rm -fr ./*
--------------------
ERROR: failed to solve: process "/bin/bash -c if [[ -n \"${TARGET_FRAMEWORK}\" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi" did not complete successfully: exit code: 1

TeppeiT commented 8 months ago

Hi @0o0x0o0, thanks for sharing the similar error.

Not yet. I'm still waiting for a reply from @vaibuild

vaibuild commented 8 months ago

Hi @0o0x0o0, could you please try with the latest commit from the master branch? I cannot reproduce the issue you encountered. Could you also remove the base image first so that the build starts clean? The base image is the one whose name looks like xilinx/xxx-base.
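
One way to find candidate base images is sketched below in Python. It assumes the placeholder xilinx/xxx-base means "a repository under xilinx/ whose name ends in -base"; that reading, and both helper names, are assumptions rather than anything documented by Vitis AI. The sketch only lists images and deletes nothing.

```python
import subprocess


def select_base_images(image_refs):
    """Keep refs whose repository looks like xilinx/<something>-base."""
    selected = []
    for ref in image_refs:
        # Split off the tag; the repository is everything before the last colon.
        repository = ref.rsplit(":", 1)[0]
        if repository.startswith("xilinx/") and repository.endswith("-base"):
            selected.append(ref)
    return selected


def list_local_images():
    """Ask the Docker CLI for local images as repository:tag strings."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Calling select_base_images(list_local_images()) prints nothing by itself; the returned names can be reviewed and then removed by hand with docker rmi before rebuilding.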

0o0x0o0 commented 8 months ago

Hi @vaibuild, following your suggestion, I removed the local image and tried again with the latest commit from the master branch. I finally succeeded in building the Docker image. Thank you very much for your suggestion.

TeppeiT commented 7 months ago

@vaibuild @0o0x0o0 Thanks! I'll try it later. I'm not in an environment where I can test it right now...

But if I want to set up a Docker environment for Vitis AI 3.0, what should I do? I think it will become Vitis AI 3.5 if I use the latest commit on the master branch.

The reason I want to use Vitis AI 3.0 is to compile for the Zynq UltraScale+ MPSoC with DPU (DPUCZDX8G) support and to run the Vitis AI 3.0 Runtime environment on the device.

TeppeiT commented 6 months ago

@vaibuild

Sorry for my late reply.

I have tried the following, but the error still occurs:
・Updating to the latest commit of the Vitis AI GitHub repository
・Deleting xilinx/vitis-ai-xx-gpu

I can't figure out how to fix the problem any further and would appreciate your advice.

112.2 + export VAI_CONDA_CHANNEL=file:///scratch/conda-channel
112.2 + VAI_CONDA_CHANNEL=file:///scratch/conda-channel
112.2 + sudo mkdir -p /opt/vitis_ai/compiler
112.2 + [[ gpu != \c\p\u ]]
112.2 + arch_type=_gpu
112.2 + conda_channel=file:///scratch/conda-channel
112.2 + [[ gpu == \r\o\c\m ]]
112.2 + tensorflow_ver='tensorflow==2.12 keras==2.12'
112.2 + [[ gpu == \c\p\u ]]
112.2 + [[ gpu == \r\o\c\m ]]
112.2 + . /opt/vitis_ai/conda/etc/profile.d/conda.sh
112.2 ./install_tf2.sh: line 81: /opt/vitis_ai/conda/etc/profile.d/conda.sh: No such file or directory
112.2 + true
112.2 + mamba env create -f /scratch/gpu_conda/vitis-ai-tensorflow2.yml
112.2 ./install_tf2.sh: line 87: mamba: command not found
------
vitis-ai-cpu.Dockerfile:32
--------------------
  30 |     ADD conda/banner.sh /etc/
  31 |     ADD conda/${DOCKER_TYPE}_conda/bashrc /etc/bash.bashrc
  32 | >>> RUN if [[ -n "${TARGET_FRAMEWORK}" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi
  33 |     USER root
  34 |     RUN mkdir -p ${VAI_ROOT}/conda/pkgs && chmod 777 ${VAI_ROOT}/conda/pkgs && ./install_vairuntime.sh && rm -fr ./*
--------------------
ERROR: failed to solve: process "/bin/bash -c if [[ -n \"${TARGET_FRAMEWORK}\" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi" did not complete successfully: exit code: 127

TeppeiT commented 3 months ago

Hi, @vaibuild

Forgive my additional comments.

Is it possible that different versions of the Host PC OS are the cause?

The PC that is causing this error is running Ubuntu 22.04, but I have confirmed that I can build on a PC running Ubuntu 18.04 without any problems.

I would be happy to hear back from you if you would be so kind.

C8Costa commented 3 months ago

Hi, I'm having a similar error when trying to build the gpu-tensorflow2 docker on Ubuntu 22.04. The PyTorch docker works. Could it be the OS version that is causing this error?

71.6 + pip install -r /scratch/pip_requirements.txt
971.8 Requirement already satisfied: setuptools in /opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.8/site-packages (from -r /scratch/pip_requirements.txt (line 1)) (65.7.0)
972.0 Collecting ck (from -r /scratch/pip_requirements.txt (line 2))
972.3 Downloading ck-2.6.3.tar.gz (1.0 MB)
972.5 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 4.7 MB/s eta 0:00:00
972.6 Preparing metadata (setup.py): started
972.7 Preparing metadata (setup.py): finished with status 'error'
972.7 error: subprocess-exited-with-error
972.7
972.7 × python setup.py egg_info did not run successfully.
972.7 │ exit code: 1
972.7 ╰─> [1 lines of output]
972.7 ERROR: Can not execute setup.py since setuptools is not available in the build environment.
972.7 [end of output]
972.7
972.7 note: This error originates from a subprocess, and is likely not a problem with pip.
972.7 error: metadata-generation-failed
972.7
972.7 × Encountered error while generating package metadata.
972.7 ╰─> See above for output.
972.7
972.7 note: This is an issue with the package mentioned above, not pip.
972.7 hint: See above for details.
vitis-ai-cpu.Dockerfile:32
  30 |     ADD conda/banner.sh /etc/
  31 |     ADD conda/${DOCKER_TYPE}_conda/bashrc /etc/bash.bashrc
  32 | >>> RUN if [[ -n "${TARGET_FRAMEWORK}" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi
  33 |     USER root
  34 |     RUN mkdir -p ${VAI_ROOT}/conda/pkgs && chmod 777 ${VAI_ROOT}/conda/pkgs && ./install_vairuntime.sh && rm -fr ./*
ERROR: failed to solve: process "/bin/bash -c if [[ -n \"${TARGET_FRAMEWORK}\" ]]; then  bash ./install_${TARGET_FRAMEWORK}.sh; fi" did not complete successfully: exit code: 1

fcwindpass commented 3 months ago

./install_tf2.sh: line 81: /opt/vitis_ai/conda/etc/profile.d/conda.sh: No such file or directory

I got the same error as above and solved it by modifying install_conda.sh:16 (in docker/common). The error occurs when the download from the URL on that line fails. I set up a small web server, downloaded Mambaforge-4.10.3-5-Linux-x86_64.sh onto it, and changed the URL in install_conda.sh:16 to point at my server (my URL is http://192.168.88.112:8080/Mambaforge-4.10.3-5-Linux-x86_64.sh; change it to your own link). Then I deleted the previously built Docker images and rebuilt, and the error disappeared!

This error is caused by the network environment: some file downloads fail. I replaced every download link with a URL on my own web server.
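
The workaround above can be sketched in Python under a few assumptions: the installer has already been downloaded into a local directory, the mirror host and port are whatever suits your network, and the URL being replaced is copied from your own checkout of install_conda.sh rather than guessed. Nothing here is part of Vitis AI, and both helper names are hypothetical.

```python
import functools
import http.server
import threading


def serve_directory(directory, host="0.0.0.0", port=8080):
    """Serve `directory` over HTTP in a background thread; return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def replace_url(script_text, old_url, new_url):
    """Return script_text with old_url swapped for new_url; fail if absent."""
    if old_url not in script_text:
        raise ValueError(f"URL not found in script: {old_url}")
    return script_text.replace(old_url, new_url)
```

Because the server thread is a daemon, the calling process has to stay alive for the duration of the docker build, and the build container must be able to reach the chosen host address. Raising an error when the old URL is absent avoids silently rebuilding with an unmodified script.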