canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

apt update fails to connect but ping works in container #5629

Closed foveann closed 5 years ago

foveann commented 5 years ago

Required information

Issue description

I created an Ubuntu 18.04 container on a Fedora 29 host and cannot run 'apt update' as root. I can ping 8.8.8.8 and archive.ubuntu.com without any packet loss. I also ran 'ip link set dev eth0 up' to make sure the interface is up.

Error for 'apt update':

Err:1 http://archive.ubuntu.com/ubuntu bionic InRelease
  Could not connect to archive.ubuntu.com:80 (91.189.88.161), connection timed out
  Could not connect to archive.ubuntu.com:80 (91.189.91.23), connection timed out
  Could not connect to archive.ubuntu.com:80 (91.189.88.149), connection timed out
  Could not connect to archive.ubuntu.com:80 (91.189.88.162), connection timed out
  Could not connect to archive.ubuntu.com:80 (91.189.91.26), connection timed out
  Could not connect to archive.ubuntu.com:80 (91.189.88.152), connection timed out
  ....

Info for 'ip a':

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:ee:d3:74 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.230.159.32/24 brd 10.230.159.255 scope global dynamic eth0
       valid_lft 3348sec preferred_lft 3348sec
    inet6 fe80::216:3eff:feee:d374/64 scope link
       valid_lft forever preferred_lft forever

Info for 'lxc network list':

+--------+----------+---------+-------------+---------+
|  NAME  |   TYPE   | MANAGED | DESCRIPTION | USED BY |
+--------+----------+---------+-------------+---------+
| lxdbr0 | bridge   | YES     |             | 1       |
+--------+----------+---------+-------------+---------+
| virbr0 | bridge   | NO      |             | 0       |
+--------+----------+---------+-------------+---------+
| wlp2s0 | physical | NO      |             | 0       |
+--------+----------+---------+-------------+---------+

Info for 'lxc profile show default':

config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:

Steps to reproduce

  1. lxc exec robust-orca -- /bin/bash -c 'apt update -y'

stgraber commented 5 years ago

Can you show ip link show on the host?

This smells like an MTU issue.

I'm assuming you also tried to download from other servers? The Ubuntu archive mirrors were rather swamped and slow yesterday.

foveann commented 5 years ago

Haven't tried other servers yet. Here is the ip link show on the host:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether 98:5f:d3:e9:d7:83 brd ff:ff:ff:ff:ff:ff
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:3a:3b:a1 brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:3a:3b:a1 brd ff:ff:ff:ff:ff:ff
5: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 100
    link/none 
6: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fe:cc:24:ae:25:56 brd ff:ff:ff:ff:ff:ff
8: vethPCJF22@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP mode DEFAULT group default qlen 1000
    link/ether fe:cc:24:ae:25:56 brd ff:ff:ff:ff:ff:ff link-netnsid 0

stgraber commented 5 years ago

Ok, that looks fine.

Internally, I'm seeing this warning: "Active/known issues: archive servers slowness / bandwidth peak | US servers added to archive.u.c". So I'm going to assume that the archive servers are completely swamped, which would explain why ping works but downloads don't.

You may have some luck updating your /etc/apt/sources.list to use another mirror for the time being.
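
One way to switch mirrors is sketched below. This is only an illustration: the function name switch_mirror is hypothetical, and it assumes the entries in /etc/apt/sources.list use the http://archive.ubuntu.com/ubuntu prefix seen in the error output above. Entries for other hosts (for example security.ubuntu.com) are left untouched.

```shell
#!/bin/sh
# Rewrite every archive.ubuntu.com entry in an APT sources file to another
# mirror. sed keeps a backup of the original next to it with a .bak suffix.
# Assumption: the mirror URL contains no '|' or '&' characters.
switch_mirror() {
    sources_file="$1"
    new_mirror="$2"
    sed -i.bak "s|http://archive.ubuntu.com/ubuntu|${new_mirror}|g" "$sources_file"
}
```

Inside the container, as root, this could be used as switch_mirror /etc/apt/sources.list http://mirror.cs.pitt.edu/ubuntu/archive followed by 'apt update'. Moving the .bak file back over the original undoes the change.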

foveann commented 5 years ago

I tried a different mirror and got this error message :

Err:1 http://mirror.cs.pitt.edu/ubuntu/archive bionic InRelease
  Could not connect to mirror.cs.pitt.edu:80 (136.142.23.206), connection timed out

foveann commented 5 years ago

Thanks for all your help. I'll try again tomorrow.

stgraber commented 5 years ago

Hmm, now that's a bit odd, that mirror answers immediately here. So there can be two things I can think of that would explain this:

Can you show iptables -L -n -v on your host as well as the output of tracepath archive.ubuntu.com (also from the host)?

stgraber commented 5 years ago

The iptables output looks odd: it shows no packets having gone through any of the rules in the FORWARD chain, which doesn't make much sense given that you're sending traffic out of the container...

foveann commented 5 years ago

I am very grateful for your time on this issue. Not sure how to troubleshoot further. All I know is that the first time I set up a container I was able to run 'apt update' and even 'apt install nginx' successfully but after rebooting and restarting my host, this issue started. I tried uninstalling lxd then reinstalling lxd but nothing has changed. During this whole time, I have not made any changes to the iptables rules nor installed anything so I can't think of what it could be.

stgraber commented 5 years ago

Can you try doing lxc network set lxdbr0 bridge.mtu 1280 and then restart your container and try again?

If there's some kind of MTU issue, that should unblock it.
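
To confirm whether the setting took effect, the configured value can be compared with what the kernel reports for the interface. The following is a rough sketch rather than an official procedure: the helper names mtu_of and report_bridge_mtu are hypothetical, and it assumes lxc and ip are on the PATH and that the bridge is called lxdbr0.

```shell
#!/bin/sh
# Print the value that follows "mtu" in `ip link show` output read on stdin.
mtu_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Show the MTU stored in the LXD network config next to the MTU the kernel
# reports for the bridge. The bridge name defaults to lxdbr0 (an assumption).
report_bridge_mtu() {
    bridge="${1:-lxdbr0}"
    echo "configured: $(lxc network get "$bridge" bridge.mtu)"
    echo "effective:  $(ip link show dev "$bridge" | mtu_of)"
}
```

The same parser can be pointed at the container side, for example lxc exec robust-orca -- ip link show dev eth0 | mtu_of, since the container's eth0 is the interface that actually sends the HTTP traffic.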

foveann commented 5 years ago

I ran that command and it didn't complain but when I checked the tracepath again, the MTU size reverted back to the default 1500. Output same as first tracepath.

stgraber commented 5 years ago

What's odd is that ICMP would go through but the rest is blocked...

You could temporarily try (as root on the host):

That should mostly rule out firewall problems. To undo that change, run:

foveann commented 5 years ago

I tried it but nothing has changed unfortunately.

stgraber commented 5 years ago

Ok, I suspect you're at the point now where you'll need to use tcpdump to follow the traffic coming out of the container, hitting the bridge then hitting the outside interface to see where things go wrong.

You can find the interface name of the container in lxc info. Dump that one first, then dump lxdbr0, then finally dump your upstream link (wlan0 or whatever it is called on your system), and make sure you see all packets on all of them. If they disappear somewhere, that should give you a clue as to what's going on.

foveann commented 5 years ago

First time doing the tcpdump on my side. I installed it and can't figure out which interface to run the command on...this is what I have when I run 'nmcli device status' on my host:

DEVICE      TYPE      STATE      CONNECTION
wlp2s0      wifi      connected  NETWORKNAME
lxdbr0      bridge    connected  lxdbr0
virbr0      bridge    connected  virbr0
tun0        tun       connected  tun0
lxdbr0-mtu  dummy     unmanaged  --
vethIVB9H4  ethernet  unmanaged  --
lo          loopback  unmanaged  --
virbr0-nic  tun       unmanaged  --

I am guessing the lxdbr0 and the wlp2s0 but what am I looking for?

stgraber commented 5 years ago

tcpdump -ni vethIVB9H4 port 80 should show you the http traffic, then repeat on lxdbr0 and then on wlp2s0, that's the path your http traffic should be taking out and back.
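
The three captures could be run back to back with a small loop like the one below. It is a sketch under assumptions: the function name trace_http_path is hypothetical, the 20 second limit and 10 packet count are arbitrary, and it needs root plus the coreutils timeout command on the host.

```shell
#!/bin/sh
# Capture a few HTTP packets on each interface in turn, in the order given.
trace_http_path() {
    for ifc in "$@"; do
        echo "== $ifc =="
        # -n: no name resolution; -c 10: stop after 10 packets.
        # timeout ends the capture after 20 seconds if nothing shows up.
        timeout 20 tcpdump -n -i "$ifc" -c 10 'tcp port 80' \
            || echo "no packets captured on $ifc (or the capture failed)"
    done
}
```

For the interfaces in this thread that would be trace_http_path vethIVB9H4 lxdbr0 wlp2s0, run on the host while 'apt update' is retried in the container from another terminal. The first interface that shows no packets marks where the traffic is being dropped.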

foveann commented 5 years ago

Thanks a lot! I ran all three and it is the wlp2s0 that yields nothing.

stgraber commented 5 years ago

Ok, so the suspects there are that ip_forward is disabled somehow, firewalling (though that didn't seem to be the case earlier), or something odd going on with the nat tables or ebtables.

May all be useful to track down what configuration is causing this problem.
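
A possible way to look at each of those suspects from the host is sketched here. These are not necessarily the commands originally suggested in this thread; the function names forwarding_state and check_forwarding are hypothetical, and the sketch assumes root access and an iptables-based firewall.

```shell
#!/bin/sh
# Translate the value of net.ipv4.ip_forward into a readable state.
forwarding_state() {
    case "$1" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Print the forwarding state, the NAT POSTROUTING rules and, when the tool
# is installed, the ebtables rules.
check_forwarding() {
    echo "ip_forward: $(forwarding_state "$(cat /proc/sys/net/ipv4/ip_forward)")"
    # With NAT enabled on the bridge, a MASQUERADE rule for its subnet
    # (10.230.159.0/24 in this thread) would be expected here.
    iptables -t nat -L POSTROUTING -n -v
    if command -v ebtables >/dev/null 2>&1; then
        ebtables -L
    fi
}
```

If forwarding shows as disabled, or no masquerading rule covers the bridge subnet, that would match the symptom of packets reaching lxdbr0 but never appearing on wlp2s0.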

foveann commented 5 years ago

There seems to be some discussion about port forwarding between host and container but in my case I don't even have 'apt update' so I can't possibly have a web server running. Someone in this thread suggests connecting a web server on port 9090 with this command: 'lxc config device add c1 proxy0 listen=tcp:127.0.0.1:80 connect=tcp:127.0.0.1:9090' How would I modify this if I don't have a web server running within the container to begin with? Or maybe I misunderstand the problem.

stgraber commented 5 years ago

The proxy device is completely unrelated to your issue.

U-236 commented 2 years ago

Try using a cloud image. I had the same problem on an openSUSE Leap 15.4 host (LXD installed from the snap) and fixed it by using a cloud image such as ubuntu/jammy/cloud.