ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

Downloading the aarch64 Docker image always fails #14593

Open QiTianDaShengDaShi opened 2 years ago

QiTianDaShengDaShi commented 2 years ago

Downloading the aarch64 Docker image always fails for me.

My setup: Xavier, JetPack 4.5, Ubuntu 18.04, Apollo 6.0. I run the following command: sudo bash docker/scripts/dev_start.sh

Apollo_6/apollo$ sudo bash docker/scripts/dev_start.sh
[INFO] Use default GeoLocation settings
[INFO] Start pulling docker image apolloauto/apollo:dev-aarch64-18.04-20200915_0106 ...
dev-aarch64-18.04-20200915_0106: Pulling from apolloauto/apollo
Digest: sha256:26003e6be3cc6c322fa17b10915a597862fd1896c5ff03a6833492342a35a7cc
Status: Image is up to date for apolloauto/apollo:dev-aarch64-18.04-20200915_0106
docker.io/apolloauto/apollo:dev-aarch64-18.04-20200915_0106
[INFO] Check and remove existing Apollo dev container ...
[INFO] Determine whether host GPU is available ...
[INFO] USE_GPU_HOST: 0
[WARNING] No CAN device named /dev/can0.
[INFO] Starting mounting map volumes ...
[INFO] Load map sunnyvale_big_loop from image: apolloauto/apollo:map_volume-sunnyvale_big_loop-aarch64-latest
[INFO] Restart volume apollo_map_volume-sunnyvale_big_loop_root from image: apolloauto/apollo:map_volume-sunnyvale_big_loop-aarch64-latest
[INFO] Start pulling docker image apolloauto/apollo:map_volume-sunnyvale_big_loop-aarch64-latest ...
map_volume-sunnyvale_big_loop-aarch64-latest: Pulling from apolloauto/apollo
Digest: sha256:275f66d9200b1e947ba27ae7e78ee0cdc455203d9f1e35e3f32b02e63fa5361e
Status: Image is up to date for apolloauto/apollo:map_volume-sunnyvale_big_loop-aarch64-latest
docker.io/apolloauto/apollo:map_volume-sunnyvale_big_loop-aarch64-latest
e9eb005c8ec3b68ac90202fc0b3b50de2169666d761ebf3c1dbde5a44603d453
[INFO] Load map sunnyvale_loop from image: apolloauto/apollo:map_volume-sunnyvale_loop-aarch64-latest
[INFO] Restart volume apollo_map_volume-sunnyvale_loop_root from image: apolloauto/apollo:map_volume-sunnyvale_loop-aarch64-latest
[INFO] Start pulling docker image apolloauto/apollo:map_volume-sunnyvale_loop-aarch64-latest ...
map_volume-sunnyvale_loop-aarch64-latest: Pulling from apolloauto/apollo
Digest: sha256:fe449650dd5634b7c4ed29255d2537a41a34f7f2551cc1bb18013deeb70da447
Status: Image is up to date for apolloauto/apollo:map_volume-sunnyvale_loop-aarch64-latest
docker.io/apolloauto/apollo:map_volume-sunnyvale_loop-aarch64-latest
1c7b38a54d47b7c3b763e6c72244426c826f3d4e8b08471ea4bd97ef9056c77d
[INFO] Load map sunnyvale_with_two_offices from image: apolloauto/apollo:map_volume-sunnyvale_with_two_offices-aarch64-latest
[INFO] Restart volume apollo_map_volume-sunnyvale_with_two_offices_root from image: apolloauto/apollo:map_volume-sunnyvale_with_two_offices-aarch64-latest
[INFO] Start pulling docker image apolloauto/apollo:map_volume-sunnyvale_with_two_offices-aarch64-latest ...
map_volume-sunnyvale_with_two_offices-aarch64-latest: Pulling from apolloauto/apollo
Digest: sha256:13ae92aac07a2aeca873356cf525c26e0e316d642ad6f811cc15b4aa2946ebc7
Status: Image is up to date for apolloauto/apollo:map_volume-sunnyvale_with_two_offices-aarch64-latest
docker.io/apolloauto/apollo:map_volume-sunnyvale_with_two_offices-aarch64-latest
f76559c184a8e41e4f06af19c8014f1facf1c5274c8979d4169d30e787b73d34
[INFO] Load map san_mateo from image: apolloauto/apollo:map_volume-san_mateo-aarch64-latest
[INFO] Restart volume apollo_map_volume-san_mateo_root from image: apolloauto/apollo:map_volume-san_mateo-aarch64-latest
[INFO] Start pulling docker image apolloauto/apollo:map_volume-san_mateo-aarch64-latest ...
map_volume-san_mateo-aarch64-latest: Pulling from apolloauto/apollo
Digest: sha256:23de05457d602e35605ebbefba5f26f18b50c79502072f44af2b605a0c249d5a
Status: Image is up to date for apolloauto/apollo:map_volume-san_mateo-aarch64-latest
docker.io/apolloauto/apollo:map_volume-san_mateo-aarch64-latest
ae8aeb94e5e0d68dfd47d5d78e928dfd3a5da18935ca093d77ad3e5a81de8207
[INFO] Mount other volumes ...
[INFO] Restart volume apollo_localization_volume_root from image: apolloauto/apollo:localization_volume-aarch64-latest
[INFO] Start pulling docker image apolloauto/apollo:localization_volume-aarch64-latest ...
localization_volume-aarch64-latest: Pulling from apolloauto/apollo
Digest: sha256:89d49ba98af5fac553753234ce96d68e33dba9dc1a3481a6c7df458822bc9097
Status: Image is up to date for apolloauto/apollo:localization_volume-aarch64-latest
docker.io/apolloauto/apollo:localization_volume-aarch64-latest
e5146d42a5956b7b1233ffc3d96ef0edaf761915e8a44aa03aecebe21e0e35c4
[INFO] Restart volume apollo_audio_volume_root from image: apolloauto/apollo:data_volume-audio_model-latest
[INFO] Start pulling docker image apolloauto/apollo:data_volume-audio_model-latest ...
data_volume-audio_model-latest: Pulling from apolloauto/apollo
Digest: sha256:1fea613cb775ef3f1b649e3838e643bd3bd2c849983cfd51534d711011fb3cd2
Status: Image is up to date for apolloauto/apollo:data_volume-audio_model-latest
docker.io/apolloauto/apollo:data_volume-audio_model-latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
1c6011f655c752d0a65b9df17f74f8e0d84c4fe43aa760be750d886cd344c0c1
[INFO] Restart volume apollo_yolov4_volume_root from image: apolloauto/apollo:yolov4_volume-emergency_detection_model-latest
[INFO] Start pulling docker image apolloauto/apollo:yolov4_volume-emergency_detection_model-latest ...
yolov4_volume-emergency_detection_model-latest: Pulling from apolloauto/apollo
Digest: sha256:45022ce4954a1839b6bcc892805629393e8046a9977a1e93dbc2e68c77d1cbca
Status: Image is up to date for apolloauto/apollo:yolov4_volume-emergency_detection_model-latest
docker.io/apolloauto/apollo:yolov4_volume-emergency_detection_model-latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
12cd069318afb70f4c17817e11596a457a33fd93d05f4d6a67286f6bedfb4cd8
[INFO] Restart volume apollo_faster_rcnn_volume_root from image: apolloauto/apollo:faster_rcnn_volume-traffic_light_detection_model-latest
[INFO] Start pulling docker image apolloauto/apollo:faster_rcnn_volume-traffic_light_detection_model-latest ...
faster_rcnn_volume-traffic_light_detection_model-latest: Pulling from apolloauto/apollo
Digest: sha256:c57fb57aba718ad948d27d98e50115c6955fbebf054e99c269aa4d1830384846
Status: Image is up to date for apolloauto/apollo:faster_rcnn_volume-traffic_light_detection_model-latest
docker.io/apolloauto/apollo:faster_rcnn_volume-traffic_light_detection_model-latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
91101c235702708fe0f228567b6939be86fb71f54ae673e98e4209d55868ec27
[INFO] Starting docker container "apollo_dev_root" ...

QiTianDaShengDaShi commented 2 years ago

I also tested with

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu18.04 nvidia-smi

and it fails as well:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: malformed line 1, expected at least 2 tokens: unknown.

My guess is that this is an NVIDIA Docker problem. How did your developers solve it?

wangtuo0820 commented 2 years ago

I ran into the same problem. Judging from the log:

docker.io/apolloauto/apollo:faster_rcnn_volume-traffic_light_detection_model-latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker.io/apolloauto/apollo:yolov4_volume-emergency_detection_model-latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker.io/apolloauto/apollo:data_volume-audio_model-latest
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

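It looks like the official project does not provide aarch64 images for faster_rcnn_volume-traffic_light_detection_model, yolov4_volume-emergency_detection_model, and data_volume-audio_model; I could not find them on Docker Hub either. So in the end the apollo_dev_root container never started.

Check your environment: sudo docker images should list the images above, but sudo docker ps -a shows no corresponding containers.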
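The platform mismatch described above can be checked directly. Here is a minimal sketch, assuming the images have already been pulled locally: it compares each image's `Architecture` field from `docker image inspect` with the host architecture. The function names `to_docker_arch` and `check_image_arch` are made up for this example and are not part of Apollo's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Apollo scripts.

# Map a `uname -m` machine name to the architecture name Docker reports.
to_docker_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    *)             echo "$1" ;;
  esac
}

# Print one line per image saying whether its architecture matches the host.
# Images that are not present locally are skipped (docker prints its own error).
check_image_arch() {
  local host_arch image image_arch
  host_arch="$(to_docker_arch "$(uname -m)")"
  for image in "$@"; do
    image_arch="$(docker image inspect --format '{{.Architecture}}' "$image")" || continue
    if [ "$image_arch" = "$host_arch" ]; then
      echo "OK       $image ($image_arch)"
    else
      echo "MISMATCH $image ($image_arch, host is $host_arch)"
    fi
  done
}
```

For example, `check_image_arch apolloauto/apollo:data_volume-audio_model-latest apolloauto/apollo:localization_volume-aarch64-latest` would be expected to flag the first image on a Xavier, going by the warnings in the log.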


My workaround was to remove the affected --volumes-from options and also to remove --gpus all (related to your second comment), i.e. run the following command:

sudo docker run -it --privileged --name apollo_dev_root -e DISPLAY=:1 -e DOCKER_USER=root -e USER=root -e DOCKER_USER_ID=0 -e DOCKER_GRP=root -e DOCKER_GRP_ID=0 -e DOCKER_IMG=apolloauto/apollo:dev-aarch64-18.04-20200915_0106 -e USE_GPU_HOST=1 -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,graphics,utility --volumes-from apollo_map_volume-sunnyvale_big_loop_root --volumes-from apollo_map_volume-sunnyvale_loop_root --volumes-from apollo_localization_volume_root  -v /home/nvidia/apollo:/apollo -v /dev:/dev -v /media:/media -v /tmp/.X11-unix:/tmp/.X11-unix:rw -v /etc/localtime:/etc/localtime:ro -v /usr/src:/usr/src -v /lib/modules:/lib/modules --net host -w /apollo --add-host in-dev-docker:127.0.0.1 --add-host ubuntu:127.0.0.1 --hostname in-dev-docker --shm-size 2G --pid=host -v /dev/null:/dev/raw1394 apolloauto/apollo:dev-aarch64-18.04-20200915_0106 /bin/bash

With this I can currently build some modules, for example control.

Hope this helps. Additions are welcome, thx.
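The workaround above removes the `--volumes-from` options by hand. As a rough sketch of the same idea, the helper below emits `--volumes-from NAME` only for volume containers that actually exist. It assumes, as described above, that the volume containers that failed do not show up in `docker ps -a`. The name `volumes_from_args` is hypothetical and does not come from `dev_start.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of docker/scripts/dev_start.sh.
# $1:   newline-separated names of existing containers,
#       e.g. the output of: docker ps -a --format '{{.Names}}'
# $2..: candidate volume container names
# Prints "--volumes-from NAME " for each candidate found in the list.
volumes_from_args() {
  local existing="$1" name
  shift
  for name in "$@"; do
    # -F fixed string, -x whole-line match, -q quiet
    if printf '%s\n' "$existing" | grep -Fxq -- "$name"; then
      printf -- '--volumes-from %s ' "$name"
    fi
  done
}
```

A possible use is `args="$(volumes_from_args "$(sudo docker ps -a --format '{{.Names}}')" apollo_localization_volume_root apollo_audio_volume_root)"`, then passing `$args` unquoted to `docker run`. Any model data that lives in a skipped volume will still be missing inside the container.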

QiTianDaShengDaShi commented 2 years ago

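Yes, I did the same as you and went on to remove --volumes-from apollo_map_volume-sunnyvale_big_loop_root --volumes-from apollo_map_volume-sunnyvale_loop_root --volumes-from apollo_localization_volume_root as well. With that I do get into the Docker environment, but a lot of functionality is presumably missing this way. What I want to run right now is perception. @wangtuo0820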

QiTianDaShengDaShi commented 2 years ago

I enter Docker with the following command:

sudo docker run -it --privileged --name apollo_dev_root -e DISPLAY=:0 -e DOCKER_USER=root -e USER=root -e DOCKER_USER_ID=0 -e DOCKER_GRP=root -e DOCKER_GRP_ID=0 -e DOCKER_IMG=apolloauto/apollo:dev-aarch64-18.04-20200915_0106 -e USE_GPU_HOST=1 -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,graphics,utility -v /mnt/ssd/Apollo_6/apollo:/apollo -v /dev:/dev -v /media:/media -v /tmp/.X11-unix:/tmp/.X11-unix:rw -v /etc/localtime:/etc/localtime:ro -v /usr/src:/usr/src -v /lib/modules:/lib/modules --net host -w /apollo --add-host in-dev-docker:127.0.0.1 --add-host movex:127.0.0.1 --hostname in-dev-docker --shm-size 2G --pid=host -v /dev/null:/dev/raw1394 apolloauto/apollo:dev-aarch64-18.04-20200915_0106 /bin/bash

Then I build the perception camera module:

root@in-dev-docker:/apollo# sudo bash apollo.sh build_cpu drivers/camera
[INFO] Apollo Environment Settings:
[INFO] APOLLO_ROOT_DIR: /apollo
[INFO] APOLLO_CACHE_DIR: /apollo/.cache
[INFO] APOLLO_IN_DOCKER: true
[INFO] APOLLO_VERSION: HEAD-2020-09-21-e79f9d6765
[INFO] DOCKER_IMG:
[INFO] APOLLO_ENV: STAGE=dev USE_ESD_CAN=false
[INFO] USE_GPU: USE_GPU_HOST= USE_GPU_TARGET=0
[INFO] Configure .apollo.bazelrc in non-interactive mode
Cannot find bazel. Please install bazel first.
[ OK ] Successfully configured .apollo.bazelrc in non-interactive mode.
[ OK ] Running CPU build on aarch64 platform.
[INFO] Build Overview:
[INFO] USE_GPU: 0 [ 0 for CPU, 1 for GPU ]
[INFO] Bazel Options: --config=cpu
[INFO] Build Targets: //modules/drivers/camera/...
[INFO] Disabled:
/apollo/scripts/apollo_build.sh: line 247: bazel: command not found
root@in-dev-docker:/apollo#

The build fails because bazel cannot be found. So although I can get into the container this way, it does not seem to be usable.
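A quick way to confirm this kind of failure before starting a build is to check that the required tools are on `PATH` inside the container. This is a minimal sketch; `require_tools` is a hypothetical helper, not something provided by `apollo.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper, not part of apollo.sh.
# Returns 0 if every named tool is on PATH; otherwise lists the
# missing ones on stderr and returns 1.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the container, `require_tools bazel && bash apollo.sh build_cpu drivers/camera` would then stop early with `missing: bazel` instead of failing partway through the build script.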

wangtuo0820 commented 2 years ago

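On my Xavier, bazel is found once I am inside the container, and the control module built successfully. Which part of the perception module do you want to run? If no GPU is found, the perception module will not compile. Also, for reasons I don't know, many unit tests of the perception module were removed in Apollo 6.0. I suggest running Apollo 5.5, where the unit tests can still be run.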

xmuchen commented 1 year ago

On the NVIDIA JetPack platform, if you want to use the GPU in Docker, you must use NVIDIA's L4T base image for JetPack. You can't use nvidia/cuda:11.0.3-base-ubuntu18.04 as the base image and start Docker with the --gpus all option; that produces an error. So the solution is to replace the base image: https://gitlab.com/nvidia/container-images/l4t-base .
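To pick an L4T base image that matches the installed JetPack, one option is to derive the tag from the L4T release on the device. The sketch below rests on an assumption that should be verified on the actual board: that the first line of `/etc/nv_tegra_release` has the form `# R32 (release), REVISION: 5.0, GCID: ...`. The helper name `l4t_tag_from_release` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read L4T release text on stdin and print a tag
# such as "r32.5.0".
# ASSUMPTION: the first line looks like
#   "# R32 (release), REVISION: 5.0, GCID: ..."
# Prints nothing if the line does not match that shape.
l4t_tag_from_release() {
  sed -n '1s/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*$/r\1.\2/p'
}
```

On a Jetson, `l4t_tag_from_release < /etc/nv_tegra_release` gives a candidate tag, which could then be tried with something like `sudo docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-base:<tag> ls /usr/local/cuda`. Whether an image was published for that exact tag has to be checked in the registry.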

daohu527 commented 1 year ago

We will prioritize support for NVIDIA Orin this year; it will be released in June 2023 or later.

xmuchen commented 1 year ago

I also tested with

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu18.04 nvidia-smi

and it fails as well:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: malformed line 1, expected at least 2 tokens: unknown.

My guess is that this is an NVIDIA Docker problem. How did your developers solve it?

On the JetPack platform, the base image must be NVIDIA's L4T image; only then can CUDA be used inside the container.