autowarefoundation / autoware

Autoware - the world's leading open-source software project for autonomous driving
https://www.autoware.org/
Apache License 2.0

Docker container not building due to missing S3 artifact #4544

Open stefanAMB opened 3 months ago

stefanAMB commented 3 months ago

Description

I am trying to build the Autoware Docker images, but the build consistently fails because an S3 artifact cannot be downloaded. The logs say:

221.7 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
222.3 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 0, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}

Trying to manually get the asset (wget or Chrome) also fails saying address unreachable. Further logs are below.

Expected behavior

I expect to be able to run ./docker/build.sh without any issues.

Actual behavior

The actual behavior is shown above. The task [autoware.dev_env.artifacts] fails with:

#24 221.7 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
#24 222.3 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 0, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}
#24 222.3
#24 222.3 PLAY RECAP *********************************************************************
#24 222.3 localhost                  : ok=9    changed=6    unreachable=0    failed=1    skipped=60   rescued=0    ignored=0
#24 222.3
#24 222.4 Failed.
#24 ERROR: process "/bin/bash -o pipefail -c ./setup-dev-env.sh -y --module all ${SETUP_ARGS} --download-artifacts --no-cuda-drivers --runtime openadk   && pip uninstall -y ansible ansible-core   && mkdir src   && vcs import src < autoware.repos   && rosdep update   && DEBIAN_FRONTEND=noninteractive rosdep install -y --dependency-types=exec --ignore-src --from-paths src --rosdistro \"$ROS_DISTRO\"   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \"$HOME\"/.cache   && find /usr/lib/$LIB_DIR-linux-gnu -name \"*.a\" -type f -delete   && find / -name \"*.o\" -type f -delete   && find / -name \"*.h\" -type f -delete   && find / -name \"*.hpp\" -type f -delete   && rm -rf /autoware/src /autoware/ansible /autoware/autoware.repos     /root/.local/pipx /opt/ros/\"$ROS_DISTRO\"/include /etc/apt/sources.list.d/cuda*.list     /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/nvidia-docker.list     /usr/include /usr/share/doc /usr/lib/gcc /usr/lib/jvm /usr/lib/llvm*" did not complete successfully: exit code: 1

Steps to reproduce

Assuming you have cloned the repo, do:

  1. git checkout main
  2. git fetch
  3. cd docker
  4. ./build.sh

Versions

No response

Possible causes

I assume the host s3.ap-northeast-2.wasabisys.com is simply offline.
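One way to narrow this down is to replay the fetch the same way `ansible.builtin.get_url` does, i.e. through Python's urllib (the `<urlopen error ...>` message in the log comes from there). This is a minimal sketch, assuming Python 3 is available on the host; the URL in the commented example is the one from the failing task:

```python
import urllib.request
import urllib.error


def check_url(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Try to open `url` and report whether it is reachable.

    Mirrors what ansible.builtin.get_url does under the hood
    (urllib), so a failure here reproduces the task failure
    without running the whole playbook.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"opened (status {resp.getcode()})"
    except urllib.error.URLError as exc:
        return False, f"Request failed: {exc.reason}"


# Example (the URL from the failing task):
# ok, msg = check_url(
#     "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/"
#     "136_road-segmentation-adas-0001/resources.tar.gz"
# )
```

If this fails with `[Errno 101] Network is unreachable` outside of Docker too, the problem is the host network (or its resolver), not the build itself.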

Additional context

No response

oguzkaganozt commented 1 month ago

Is this still reproducible? I could not reproduce it. @stefanAMB

stefanAMB commented 1 month ago

Hi @oguzkaganozt,

I am afraid it's still the same for me. The issue is still the artifact download. Here are the logs.

#25 192.4 changed: [localhost]
#25 192.5 
#25 192.5 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
#25 198.1 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 5, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}
#25 198.1 
#25 198.1 PLAY RECAP *********************************************************************
#25 198.1 localhost                  : ok=45   changed=19   unreachable=0    failed=1    skipped=29   rescued=0    ignored=0   
#25 198.1 
#25 198.2 Failed.
#25 ERROR: process "/bin/bash -o pipefail -c ./setup-dev-env.sh -y --module all ${SETUP_ARGS} --download-artifacts --no-cuda-drivers --runtime openadk   && pip uninstall -y ansible ansible-core   && mkdir src   && vcs import src < autoware.repos   && rosdep update   && DEBIAN_FRONTEND=noninteractive rosdep install -y --dependency-types=exec --ignore-src --from-paths src --rosdistro \"$ROS_DISTRO\"   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \"$HOME\"/.cache   && find /usr/lib/$LIB_DIR-linux-gnu -name \"*.a\" -type f -delete   && find / -name \"*.o\" -type f -delete   && find / -name \"*.h\" -type f -delete   && find / -name \"*.hpp\" -type f -delete   && rm -rf /autoware/src /autoware/ansible /autoware/autoware.repos     /root/.local/pipx /opt/ros/\"$ROS_DISTRO\"/include /etc/apt/sources.list.d/cuda*.list     /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/nvidia-docker.list     /usr/include /usr/share/doc /usr/lib/gcc /usr/lib/jvm /usr/lib/llvm*" did not complete successfully: exit code: 1

#24 [devel prebuilt 1/3] RUN --mount=type=ssh   ./setup-dev-env.sh -y --module all  --no-cuda-drivers openadk   && pip uninstall -y ansible ansible-core   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* "$HOME"/.cache   && find / -name 'libcu*.a' -delete   && find / -name 'libnv*.a' -delete
------
 > [runtime runtime 2/7] RUN --mount=type=ssh   ./setup-dev-env.sh -y --module all  --download-artifacts --no-cuda-drivers --runtime openadk   && pip uninstall -y ansible ansible-core   && mkdir src   && vcs import src < autoware.repos   && rosdep update   && DEBIAN_FRONTEND=noninteractive rosdep install -y --dependency-types=exec --ignore-src --from-paths src --rosdistro "humble"   && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* "$HOME"/.cache   && find /usr/lib/x86_64-linux-gnu -name "*.a" -type f -delete   && find / -name "*.o" -type f -delete   && find / -name "*.h" -type f -delete   && find / -name "*.hpp" -type f -delete   && rm -rf /autoware/src /autoware/ansible /autoware/autoware.repos     /root/.local/pipx /opt/ros/"humble"/include /etc/apt/sources.list.d/cuda*.list     /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/nvidia-docker.list     /usr/include /usr/share/doc /usr/lib/gcc /usr/lib/jvm /usr/lib/llvm*:
192.3 TASK [autoware.dev_env.artifacts : Create yabloc_pose_initializer directory inside /root/autoware_data] ***
192.4 changed: [localhost]
192.5 
192.5 TASK [autoware.dev_env.artifacts : Download yabloc_pose_initializer/resources.tar.gz] ***
198.1 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/autoware_data/yabloc_pose_initializer/resources.tar.gz", "elapsed": 5, "msg": "Request failed: <urlopen error [Errno 101] Network is unreachable>", "url": "https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz"}
198.1 
198.1 PLAY RECAP *********************************************************************
198.1 localhost                  : ok=45   changed=19   unreachable=0    failed=1    skipped=29   rescued=0    ignored=0   
198.1 
198.2 Failed.
------

As you can see, the error is still a failing connection to an S3 store. I dug into it too. Here's the outcome:

wasabisys.com :white_check_mark:

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> wasabisys.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61557
;; flags: qr rd ra; QUERY: 1, ANSWER: 28, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;wasabisys.com.                 IN      A

;; ANSWER SECTION:
wasabisys.com.          120     IN      A       38.27.106.124
wasabisys.com.          120     IN      A       38.27.106.16
wasabisys.com.          120     IN      A       38.27.106.27
wasabisys.com.          120     IN      A       38.27.106.29
wasabisys.com.          120     IN      A       38.27.106.32
wasabisys.com.          120     IN      A       38.27.106.100
wasabisys.com.          120     IN      A       38.27.106.126
wasabisys.com.          120     IN      A       38.27.106.102
wasabisys.com.          120     IN      A       38.27.106.15
wasabisys.com.          120     IN      A       38.27.106.24
wasabisys.com.          120     IN      A       38.27.106.23
wasabisys.com.          120     IN      A       38.27.106.19
wasabisys.com.          120     IN      A       38.27.106.106
wasabisys.com.          120     IN      A       38.27.106.101
wasabisys.com.          120     IN      A       38.27.106.33
wasabisys.com.          120     IN      A       38.27.106.125
wasabisys.com.          120     IN      A       38.27.106.21
wasabisys.com.          120     IN      A       38.27.106.31
wasabisys.com.          120     IN      A       38.27.106.107
wasabisys.com.          120     IN      A       38.27.106.14
wasabisys.com.          120     IN      A       38.27.106.26
wasabisys.com.          120     IN      A       38.27.106.12
wasabisys.com.          120     IN      A       38.27.106.25
wasabisys.com.          120     IN      A       38.27.106.13
wasabisys.com.          120     IN      A       38.27.106.103
wasabisys.com.          120     IN      A       38.27.106.22
wasabisys.com.          120     IN      A       38.27.106.30
wasabisys.com.          120     IN      A       38.27.106.123

;; Query time: 292 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri May 10 09:16:49 CEST 2024
;; MSG SIZE  rcvd: 490

ap-northeast-2.wasabisys.com :white_check_mark:

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> ap-northeast-2.wasabisys.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11056
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;ap-northeast-2.wasabisys.com.  IN      A

;; ANSWER SECTION:
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.231
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.230
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.232
ap-northeast-2.wasabisys.com. 120 IN    A       219.164.248.233

;; Query time: 164 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri May 10 09:18:07 CEST 2024
;; MSG SIZE  rcvd: 121

s3.ap-northeast-2.wasabisys.com :x:

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> s3.ap-northeast-2.wasabisys.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34605
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;s3.ap-northeast-2.wasabisys.com. IN    A

;; ANSWER SECTION:
s3.ap-northeast-2.wasabisys.com. 10 IN  CNAME   malware.demo.spsredir.dnsfilters.com.
malware.demo.spsredir.dnsfilters.com. 493 IN A  23.200.237.238

;; Query time: 16 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri May 10 09:19:29 CEST 2024
;; MSG SIZE  rcvd: 123

Consequently, the issue persists with curl and wget:

$> wget -v https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
--2024-05-10 09:58:50--  https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
Resolving s3.ap-northeast-2.wasabisys.com (s3.ap-northeast-2.wasabisys.com)... 23.200.237.238
Connecting to s3.ap-northeast-2.wasabisys.com (s3.ap-northeast-2.wasabisys.com)|23.200.237.238|:443... failed: Network is unreachable.
$> curl -v https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
*   Trying 23.200.237.238:443...
* connect to 23.200.237.238 port 443 failed: Network is unreachable
* Failed to connect to s3.ap-northeast-2.wasabisys.com port 443 after 1210 ms: Network is unreachable
* Closing connection 0
curl: (7) Failed to connect to s3.ap-northeast-2.wasabisys.com port 443 after 1210 ms: Network is unreachable

The network I am using is located in Thuringia, Germany. I was able to connect to the host from a different network (in Berlin, Germany). I'll have to investigate whether there's something in the local network that prevents it once the network manager is back in (next Monday) and report.
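For what it's worth, the dig output above already hints at the cause: on the affected network, `s3.ap-northeast-2.wasabisys.com` resolves through a CNAME to `malware.demo.spsredir.dnsfilters.com`, which looks like a DNS filtering appliance rewriting the answer. A quick, resolver-library-free way to surface such rewriting from Python (a sketch; `socket.gethostbyname_ex` returns the canonical name a CNAME chain ends at):

```python
import socket


def resolve_with_aliases(hostname: str) -> tuple[str, list[str], list[str]]:
    """Resolve `hostname` via the system resolver and return
    (canonical_name, aliases, ipv4_addresses).

    If a local DNS filter injects a CNAME, the canonical name will
    differ from the hostname that was queried.
    """
    return socket.gethostbyname_ex(hostname)


# Hypothetical example on an affected network:
# canonical, aliases, ips = resolve_with_aliases("s3.ap-northeast-2.wasabisys.com")
# `canonical` might come back as a dnsfilters.com name instead of a
# wasabisys.com one, confirming the rewrite seen in the dig output.
```

Comparing this against a query sent to a public resolver (e.g. `dig @1.1.1.1 s3.ap-northeast-2.wasabisys.com`) would show whether only the local resolver is affected.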

stefanAMB commented 1 month ago

Ok, I investigated a bit more. See the following traceroutes:

 traceroute to s3.ap-northeast-2.wasabisys.com (23.200.237.238), 30 hops max, 60 byte packets
 1  fritz.box (192.168.178.1)  0.498 ms  0.548 ms
 2  ber1001fihr001.versatel.de (62.214.63.105)  3.157 ms  2.819 ms
 3  vlan213.100M.flensburg1.distribution.komtel.net (62.214.0.77)  2.429 ms  2.328 ms
 4  versatel-ic-326760.ip.twelve99-cust.net (213.155.129.191)  2.381 ms  2.492 ms
 5  * 212.162.40.37 (212.162.40.37)  2.992 ms
 6  be3341.rcr71.ber01.atlas.cogentco.com (154.54.60.1)  14.528 ms  14.385 ms
 7  hbg-bb3-link.ip.twelve99.net (62.115.137.40)  9.518 ms be3141.ccr41.ham01.atlas.cogentco.com (130.117.49.137)  14.635 ms
 8  4.53.31.70 (4.53.31.70)  155.526 ms !N *

versus

traceroute to ap-northeast-2.wasabisys.com (219.164.248.230), 30 hops max, 60 byte packets
 1  fritz.box (192.168.178.1)  0.386 ms  0.491 ms
 2  * *
 3  vlan213.100M.flensburg1.distribution.komtel.net (62.214.0.77)  1.847 ms  1.976 ms
 4  versatel-ic-326760.ip.twelve99-cust.net (213.155.129.191)  2.076 ms 149.11.163.26 (149.11.163.26)  2.638 ms
 5  * 80.156.161.25 (80.156.161.25)  3.193 ms
 6  ae2.11.edge1.mln1.neo.colt.net (171.75.9.108)  21.549 ms f-ed12-i.F.DE.NET.DTAG.DE (217.5.67.162)  78.135 ms
 7  62.157.249.186 (62.157.249.186)  14.937 ms be3141.ccr41.ham01.atlas.cogentco.com (130.117.49.137)  14.085 ms
 8  be2816.ccr42.ams03.atlas.cogentco.com (154.54.38.209)  13.983 ms  14.010 ms
 9  ae-14.r21.londen12.uk.bb.gin.ntt.net (129.250.3.12)  21.920 ms ae-3.r20.frnkge13.de.bb.gin.ntt.net (129.250.3.22)  16.958 ms
10  ae-13.r24.asbnva02.us.bb.gin.ntt.net (129.250.6.6)  107.803 ms *
11  * be2806.ccr41.dca01.atlas.cogentco.com (154.54.40.106)  95.401 ms
12  * be3084.ccr41.iad02.atlas.cogentco.com (154.54.30.66)  101.598 ms
13  ae-1.a02.osakjp02.jp.bb.gin.ntt.net (129.250.4.232)  267.338 ms ae-21.a08.asbnva02.us.bb.gin.ntt.net (129.250.8.121)  110.058 ms
14  ae-3.r22.chcgil09.us.bb.gin.ntt.net (129.250.2.166)  118.235 ms ae-7.r26.dllstx14.us.bb.gin.ntt.net (129.250.4.152)  133.497 ms
15  ae-7.r26.dllstx14.us.bb.gin.ntt.net (129.250.4.152)  141.293 ms ae-4.r32.tokyjp05.jp.bb.gin.ntt.net (129.250.5.55)  255.862 ms
16  211.6.15.190 (211.6.15.190)  267.001 ms ae-2.r24.lsanca07.us.bb.gin.ntt.net (129.250.7.69)  154.067 ms
17  219.164.248.230 (219.164.248.230)  263.239 ms ae-1.a02.osakjp02.jp.bb.gin.ntt.net (129.250.4.232)  260.841 ms

Fiddling around a bit, I found that the s3 subdomain isn't even needed. Therefore the following diff fixes the issue I face:

--- a/ansible/roles/artifacts/tasks/main.yaml
+++ b/ansible/roles/artifacts/tasks/main.yaml
@@ -8,7 +8,7 @@
 - name: Download yabloc_pose_initializer/resources.tar.gz
   become: true
   ansible.builtin.get_url:
-    url: https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
+    url: https://ap-northeast-2.wasabisys.com/pinto-model-zoo/136_road-segmentation-adas-0001/resources.tar.gz
     dest: "{{ data_dir }}/yabloc_pose_initializer/resources.tar.gz"
     mode: "644"
     checksum: sha256:1f660e15f95074bade32b1f80dbf618e9cee1f0b9f76d3f4671cb9be7f56eb3a
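As a side note, if the archive has to be fetched by hand (e.g. from an unaffected network and copied over), the checksum from the task above can be verified before placing the file under the data directory. A minimal sketch, assuming Python 3; the expected digest is the one from the playbook, and the file path in the commented example is illustrative:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large tarballs don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Digest taken from the Ansible task above:
EXPECTED = "1f660e15f95074bade32b1f80dbf618e9cee1f0b9f76d3f4671cb9be7f56eb3a"

# Example:
# assert sha256_of("resources.tar.gz") == EXPECTED, "corrupt or wrong artifact"
```

This mirrors the check that `ansible.builtin.get_url` performs via its `checksum:` parameter, so a manually placed file that passes it should be equivalent to one downloaded by the playbook.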

I just checked and all images are built as expected. Not sure whether this is worth a PR, as it might be related to some network issue that isn't really the concern here.