canonical / colcon-in-container

Colcon extension to build a colcon workspace in a container
GNU General Public License v3.0

Failed to run cloud-init with error: 1. #22

Closed aosmw closed 3 months ago

aosmw commented 3 months ago
mkdir tryit
cd tryit
python3 -mvenv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install git+https://github.com/canonical/colcon-in-container
mkdir src
cd src
git clone myrosrepo
cd ..
colcon --log-level=info release-in-container --ros-distro humble --bloom-generator rosdebian --debug 
INFO:colcon.colcon_core.location:Using log path 'log/release-in-container_2024-06-10_14-41-44'
[0.123s] INFO:colcon.colcon-in-container:Downloading the image then creating the LXD instance
[1.571s] INFO:colcon.colcon-in-container:Waiting for ROS 2 to be installed
[470.152s] ERROR:colcon.colcon-in-container:Failed to run cloud-init with error: 1.
[470.152s] WARNING:colcon.colcon-in-container:Debug was selected, entering the instance
root@colcon-in-container:/ws#
root@colcon-in-container:/ws# cloud-init status
status: error
root@colcon-in-container:/ws# cloud-init collect-logs
version:  /usr/bin/cloud-init 24.1.3-0ubuntu1~22.04.4

Wrote /ws/cloud-init.tar.gz
root@colcon-in-container:/ws# tar -zxvf cloud-init.tar.gz 
cd cloud-init
root@colcon-in-container:/ws/cloud-init-logs-2024-06-10# cat journal.txt  | grep -C3 ERROR
Jun 10 04:42:28.906350 colcon-in-container snapd[175]: overlord.go:515: Released state lock file
Jun 10 04:42:28.906350 colcon-in-container snapd[175]: daemon stop requested to wait for socket activation
Jun 10 04:42:28.907474 colcon-in-container systemd[1]: snapd.service: Deactivated successfully.
Jun 10 04:44:52.956665 colcon-in-container cloud-init[276]: 2024-06-10 04:44:52,955 - gpg.py[ERROR]: Failed to obtain gpg key C1CF 6E31 E6BA DE88 68B1 72B4 F42E D6FB AB17 C654
Jun 10 04:44:52.956665 colcon-in-container cloud-init[276]: Traceback (most recent call last):
Jun 10 04:44:52.956665 colcon-in-container cloud-init[276]:   File "/usr/lib/python3/dist-packages/cloudinit/gpg.py", line 101, in recv_key
Jun 10 04:44:52.956665 colcon-in-container cloud-init[276]:     naplen = next(sleeps)
--
Jun 10 04:46:46.483889 colcon-in-container systemd[1]: Finished Download data for packages that failed at package install time.
Jun 10 04:46:53.886977 colcon-in-container snapd[2046]: api_snaps.go:427: Installing snap "git" revision unset
Jun 10 04:47:33.947019 colcon-in-container snapd[2046]: api_snaps.go:427: Installing snap "python3-pip" revision unset
Jun 10 04:47:54.005176 colcon-in-container python3[1716]: ["2024-06-10T04:47:54.001", "ERROR", "ubuntupro.http", "_readurl_urllib", 171, "timed out", {"exc_info": "Traceback (most recent call last):\n  File \"/usr/lib/python3.10/urllib/request.py\", line 1348, in do_open\n    h.request(req.get_method(), req.selector, req.data, headers,\n  File \"/usr/lib/python3.10/http/client.py\", line 1283, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1329, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1278, in endheaders\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1038, in _send_output\n    self.send(msg)\n  File \"/usr/lib/python3.10/http/client.py\", line 976, in send\n    self.connect()\n  File \"/usr/lib/python3.10/http/client.py\", line 1448, in connect\n    super().connect()\n  File \"/usr/lib/python3.10/http/client.py\", line 942, in connect\n    self.sock = self._create_connection(\n  File \"/usr/lib/python3.10/socket.py\", line 845, in create_connection\n    raise err\n  File \"/usr/lib/python3.10/socket.py\", line 833, in create_connection\n    sock.connect(sa)\nTimeoutError: timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/lib/python3/dist-packages/uaclient/http/__init__.py\", line 167, in _readurl_urllib\n    resp = request.urlopen(req, timeout=timeout)\n  File \"/usr/lib/python3.10/urllib/request.py\", line 216, in urlopen\n    return opener.open(url, data, timeout)\n  File \"/usr/lib/python3.10/urllib/request.py\", line 519, in open\n    response = self._open(req, data)\n  File \"/usr/lib/python3.10/urllib/request.py\", line 536, in _open\n    result = self._call_chain(self.handle_open, protocol, protocol +\n  File 
\"/usr/lib/python3.10/urllib/request.py\", line 496, in _call_chain\n    result = func(*args)\n  File \"/usr/lib/python3.10/urllib/request.py\", line 1391, in https_open\n    return self.do_open(http.client.HTTPSConnection, req,\n  File \"/usr/lib/python3.10/urllib/request.py\", line 1351, in do_open\n    raise URLError(err)\nurllib.error.URLError: <urlopen error timed out>"}]
Jun 10 04:47:54.006043 colcon-in-container python3[1716]: ["2024-06-10T04:47:54.005", "ERROR", "ubuntupro.lib.esm_cache", "main", 17, "Error updating the cache: Failed to connect to https://contracts.canonical.com/v1/resources?architecture=amd64&kernel=6.5.0-1023-oem&series=jammy&virt=lxc\ntimed out\n", {}]
Jun 10 04:47:54.032409 colcon-in-container systemd[1]: esm-cache.service: Deactivated successfully.
Jun 10 04:47:54.032713 colcon-in-container systemd[1]: Finished Update the local ESM caches.
Jun 10 04:48:13.990177 colcon-in-container snapd[2046]: api_snaps.go:427: Installing snap "build-essential" revision unset
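
To narrow a journal like the one above down to the keys cloud-init could not fetch, a small filter can help. This is only a sketch: the function name `failed_gpg_keys` is made up for illustration, and it assumes the message format shown in the log line above ("Failed to obtain gpg key ...").

```shell
# Hypothetical helper: read journal text on stdin and print only the
# fingerprints from "Failed to obtain gpg key <fingerprint>" lines.
failed_gpg_keys() {
  sed -n 's/.*Failed to obtain gpg key \(.*\)$/\1/p'
}
```

Inside the extracted log directory it could be run as `failed_gpg_keys < journal.txt`.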
aosmw commented 3 months ago

Exit the container, return to the host, and check the installed version of pylxd:

lsb_release -a && python3 -c "import pylxd;print('pylxd:', pylxd.__version__)"
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
pylxd: 2.3.4

Attempt to upgrade pylxd in the virtual environment to see if that helps:

cd tryit
source venv/bin/activate

pip install --upgrade git+https://github.com/lxc/pylxd 
Collecting git+https://github.com/lxc/pylxd
  Cloning https://github.com/lxc/pylxd to /tmp/pip-req-build-bzm9buzr
  Running command git clone --filter=blob:none --quiet https://github.com/lxc/pylxd /tmp/pip-req-build-bzm9buzr
  Resolved https://github.com/lxc/pylxd to commit 3fdca7e3d8424b5feaea6c3722d0f2c6d775a500
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: cryptography>=3.2 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (3.4.8)
Requirement already satisfied: python-dateutil>=2.4.2 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (2.9.0.post0)
Requirement already satisfied: requests<2.32.0,>=2.20.0 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (2.31.0)
Requirement already satisfied: requests-toolbelt>=0.8.0 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (1.0.0)
Requirement already satisfied: requests-unixsocket>=0.1.5 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (0.3.0)
Requirement already satisfied: urllib3<2 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (1.26.18)
Requirement already satisfied: ws4py!=0.3.5,>=0.3.4 in ./venv/lib/python3.10/site-packages (from pylxd==2.3.4) (0.5.1)
Requirement already satisfied: cffi>=1.12 in ./venv/lib/python3.10/site-packages (from cryptography>=3.2->pylxd==2.3.4) (1.16.0)
Requirement already satisfied: six>=1.5 in ./venv/lib/python3.10/site-packages (from python-dateutil>=2.4.2->pylxd==2.3.4) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.10/site-packages (from requests<2.32.0,>=2.20.0->pylxd==2.3.4) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.10/site-packages (from requests<2.32.0,>=2.20.0->pylxd==2.3.4) (3.7)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.10/site-packages (from requests<2.32.0,>=2.20.0->pylxd==2.3.4) (2024.6.2)
Requirement already satisfied: pycparser in ./venv/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=3.2->pylxd==2.3.4) (2.22)

python3 -c "import pylxd;print('pylxd:', pylxd.__version__)"
pylxd: 2.3.4
aosmw commented 3 months ago

Ensuring cloud-init is up to date.

sudo apt update
sudo apt install --only-upgrade cloud-init
cloud-init --version
/usr/bin/cloud-init 24.1.3-0ubuntu1~22.04.4
aosmw commented 3 months ago

Inside the LXD container, the network is not available. Neither of these gets a response:

ping 8.8.8.8
nc keyserver.ubuntu.com 80
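
Both commands can hang for a long time when egress is blocked, so a probe with a timeout may be more convenient. A minimal sketch, assuming the OpenBSD `nc` shipped with Ubuntu (which supports `-z` and `-w`); the function name `check_tcp` is hypothetical:

```shell
# Hypothetical helper: probe a TCP endpoint with a 5 second timeout
# and report whether the connection could be opened.
check_tcp() {
  host="$1"
  port="$2"
  if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
    echo "ok: $host:$port"
  else
    echo "unreachable: $host:$port"
  fi
}
```

For example, `check_tcp keyserver.ubuntu.com 80` run inside the container should report `unreachable` while the firewall problem is present.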

Docker is installed on the host and is interfering with LXD container egress. Note the FORWARD chain policy of DROP in the output below:

sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0
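
The telling detail in that listing is the `Chain FORWARD (policy DROP)` line. As a rough sketch, the policy can be pulled out of `iptables -nL` output with awk; the function name `forward_policy` is an assumption for illustration:

```shell
# Hypothetical helper: read `iptables -nL` output on stdin and print
# the policy of the FORWARD chain (e.g. ACCEPT or DROP).
forward_policy() {
  awk '$1 == "Chain" && $2 == "FORWARD" { gsub(/[()]/, ""); print $4 }'
}
```

On the host this would be used as `sudo iptables -nL | forward_policy`; a result of `DROP` on a machine with Docker installed is a hint that forwarded LXD traffic may be affected.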

Allow lxdbr0 to reach the internet (see https://documentation.ubuntu.com/lxd/en/latest/howto/network_bridge_firewalld/):

sudo iptables -I DOCKER-USER -i lxdbr0 -j ACCEPT
sudo iptables -I DOCKER-USER -o lxdbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
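
Running the two commands above more than once inserts duplicate rules. A possible way to avoid that, sketched here with a hypothetical wrapper `add_rule_once`, is to test for the rule with `iptables -C` before inserting it. The `IPTABLES` variable is also an assumption, added only so the command can be swapped out for a dry run.

```shell
# IPTABLES can be overridden (e.g. for a dry run); defaults to the real command.
IPTABLES="${IPTABLES:-sudo iptables}"

# Hypothetical helper: insert a rule only if an identical one is not
# already present. Arguments are "<chain> <rule spec...>".
add_rule_once() {
  if $IPTABLES -C "$@" 2>/dev/null; then
    # -C (check) succeeded, so the rule already exists.
    echo "already present: $*"
  else
    $IPTABLES -I "$@" && echo "inserted: $*"
  fi
}
```

The two rules would then be added with `add_rule_once DOCKER-USER -i lxdbr0 -j ACCEPT` and `add_rule_once DOCKER-USER -o lxdbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT`. Rules added this way are not saved across reboots unless they are persisted separately.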

Back in the LXD container, the previously failing command now works when run manually:

root@colcon-in-container:/ws# gpg  --keyserver=keyserver.ubuntu.com --recv-keys 'C1CF 6E31 E6BA DE88 68B1 72B4 F42E D6FB AB17 C654'
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key F42ED6FBAB17C654: public key "Open Robotics <info@osrfoundation.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1
aosmw commented 3 months ago

So it is now working.

Another useful tip: you can access the container while it is doing its work.

lxc exec colcon-in-container -- bash
# See how cloud-init went
cat /var/log/cloud-init.log
# See whats going on
journalctl -f
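
For a quick pass/fail check instead of reading the whole log, the output of `cloud-init status` (which printed `status: error` earlier in this thread) can be tested directly. A minimal sketch; the function name `cloud_init_done` is hypothetical:

```shell
# Hypothetical helper: read `cloud-init status` output on stdin and
# succeed only if it reports "status: done".
cloud_init_done() {
  grep -q '^status: done'
}
```

Inside the container this could be used as `cloud-init status | cloud_init_done && echo finished`. `cloud-init status --wait` can be used first to block until cloud-init has stopped running.
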
Guillaumebeuzeboc commented 3 months ago

Hello, thank you for reporting this issue. Indeed, Docker and LXD interfere with each other network-wise. I will mention it in the README. Also, you can access the container while it is running with:

lxc shell colcon-in-container
artivis commented 3 months ago

Closing since #26 was merged