Open Smithx10 opened 5 years ago
I just tested centos:latest and yum update -y worked fine. I believe this is just something to do with the new ubuntu images.
Alpine seems to work also:
[Mon 19/01/07 17:48 EST][pts/5][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh/2 2765 [130] % td run -d -it --name=alpinelatest -p 8081:8081 -m 1gb alpine:latest /bin/sh
Unable to find image 'alpine:latest' locally
latest: Pulling from alpine (req 119d4462-30a0-4cfc-8d0e-836f49d9b5cd)
cd784148e348: Pull complete
Digest: sha256:3d2e482b82608d153a374df3357c0291589a61cc194ec4a9ca2381073a17f58e
Status: Downloaded newer image for alpine:latest
64592b20829ec8879047e06f72c6da6fb5e2b460810042359088eea51e4a8e19
[Mon 19/01/07 17:49 EST][pts/5][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh/2 2766 % td ps | grep alpine
64592b20829e alpine:latest "/bin/sh" 28 seconds ago Up 19 seconds 0.0.0.0:8081->8081/tcp alpinelatest
[Mon 19/01/07 17:49 EST][pts/5][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh/2 2767 % td exec alpinelatest apk update
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
v3.8.2-13-g106f36ecbb [http://dl-cdn.alpinelinux.org/alpine/v3.8/main]
v3.8.2-8-g684f341f68 [http://dl-cdn.alpinelinux.org/alpine/v3.8/community]
OK: 9545 distinct packages available
Could you show "ip addr" output? Could you ping any host via IP address only (exclude DNS issue)?
@ad-m :( Sadly there are no net tools in the image, not even ping :( . I think I can copy over a binary from a different instance running 16.04 and see...
zsh 2706 % ls
[Mon 19/01/07 19:30 EST][pts/6][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh/2 2724 [130] % td exec fervent_lamarr which ping
/bin/ping
[Mon 19/01/07 19:30 EST][pts/6][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh/2 2725 % td exec fervent_lamarr ping 8.8.8.8
[Mon 19/01/07 19:30 EST][pts/6][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh/2 2726 [127] %
Looks like nothing :(
@jasonbking from IRC stated the following:
<jbk> This space for rent could be sendmmsg
6:56 PM IIRC some newer glibcs are using it for dns
6:56 PM (and isn't supported w/ lx yet)
@Smithx10 you should be able to use the native networking tools in /native/*/bin.
A while back one of the cloud-init devs was working on some changes to cloud-init that were specific to lx. It could be something with that. I've not looked at how we normally plumb up networking for lx/docker, so it is quite possible that cloud-init is always out of the picture for lx networking.
@mgerdts
Yes, looks like that is working. But the default behaviour of apt-get in the container isn't working.
I'll step through the newer versions from 16.04 and see when we hit the issue.
[Mon 19/01/07 21:14 EST][pts/0][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh 2710 [25] % td exec -it fervent_lamarr /native/usr/sbin/ping google.com
google.com is alive
It seems like this behaviour arrived in the docker image ubuntu:17.10.
All the versions up until this ran apt-get update just fine.
[Mon 19/01/07 21:23 EST][pts/0][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
<smith@arch-nix:~>
zsh 2718 % td exec -it ubuntu1710 apt-get update -y
Err:1 http://security.ubuntu.com/ubuntu artful-security InRelease
Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu artful InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu artful-updates InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu artful-backports InRelease
Temporary failure resolving 'archive.ubuntu.com'
Reading package lists... Done
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/artful/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/artful-updates/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/artful-backports/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/artful-security/InRelease Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
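The version walk described above could be scripted roughly as follows. This is only a sketch under assumptions: `docker` stands in for the `td` wrapper used in this thread, `first_failing_tag` is a made-up helper name, `--rm` is standard Docker and may or may not be honoured by the Triton endpoint, and `getent` is assumed to be present in each image. It calls `getent ahosts` instead of `apt-get update` because that lookup goes through getaddrinfo() and exits non-zero when the name does not resolve.

```shell
#!/bin/sh
# Sketch: print the first Ubuntu tag whose userland cannot resolve a name.
# DOCKER is an assumption; the thread itself uses a local `td` wrapper.
DOCKER=${DOCKER:-docker}

first_failing_tag() {
    for tag in "$@"; do
        # getent ahosts resolves through getaddrinfo() inside the container.
        if ! "$DOCKER" run --rm "ubuntu:$tag" \
                getent ahosts archive.ubuntu.com >/dev/null 2>&1; then
            echo "$tag"
            return 0
        fi
    done
    return 1    # every tag resolved the name
}

# Example call (tags listed oldest first):
#   first_failing_tag 16.04 16.10 17.04 17.10 18.04
```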
So looking at this https://wiki.ubuntu.com/ArtfulAardvark/ReleaseNotes , it looks like there are a few things that stand out.
Most likely being
Network configuration
ifupdown has been deprecated in favor of netplan and is no longer present on new installs. The installer will generate a configuration file for netplan in /etc/netplan, which will set up the system to configure the network via systemd-networkd or NetworkManager. Desktop users will see their system fully managed via NetworkManager as it has been the case in previous releases, but Server users now have their network devices managed via systemd-networkd on new installs. This only applies to new installations.
Given that ifupdown is no longer installed by default, its commands will not be present: ifup and ifdown are thus unavailable, replaced by ip link set $device up and ip link set $device down.
The networkctl command is also available for users to see a summary of the network devices. networkctl status will display the current global state of IP addresses on the system; and networkctl status $device can display the details specific to a network device.
For more information about netplan, please refer to the manual page using the man 5 netplan command.
@mgerdts I don't believe sdc-docker is using cloud-init... so the following probably doesn't apply... but I will note it here for easier reference.
cloud-init
The version was updated to 17.1. Notable new features include:
Python 3.6 support
Ec2 support for IPv6 instance configuration
Expedited boot time through cloud-id optimization
Support for netplan yaml in cloud-init
Add cloud-init subcommands collect-logs, analyze and schema for developers
Apport integration from cloud-init via ‘ubuntu-bug cloud-init’
Significant unittest and integration test coverage improvements
While checking the docker-init process, the interface is definitely being plumbed correctly on the illumos side; I don't see any issues. The fact that the /native/ tools can route packets means this is most likely an Ubuntu userspace issue, probably DNS, if I had to guess.
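One way to check that guess from inside the zone is to run the same lookup through both stacks. The sketch below is an assumption-laden illustration, not something posted in the thread: `check_dns` is a hypothetical helper, it relies on the illumos ping at /native/usr/sbin/ping exiting non-zero when the host does not answer, and it assumes `getent` exists in the image.

```shell
#!/bin/sh
# Sketch: compare the native (illumos) path with the Linux userland resolver.
# Both command paths are overridable; the defaults are assumptions.
NATIVE_PING=${NATIVE_PING:-/native/usr/sbin/ping}
GETENT=${GETENT:-getent}

check_dns() {
    host=$1
    # Native tools: resolve and reach the host via the illumos stack.
    if "$NATIVE_PING" "$host" >/dev/null 2>&1; then native=ok; else native=fail; fi
    # Linux userland: resolve via getaddrinfo() only.
    if "$GETENT" ahosts "$host" >/dev/null 2>&1; then linux=ok; else linux=fail; fi
    echo "native=$native linux=$linux"
}

# Example call:
#   check_dns archive.ubuntu.com
```

A result of `native=ok linux=fail` would point at the Linux resolver path rather than at routing.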
Log from 17.10
[root@00-0c-29-1e-ac-7c (us-east-1) /zones/851d0dc2-7fdd-ed8a-d079-fc6f33b47b63/root/var/log]# cat sdc-dockerinit.log
2019-01-08T02:23:20.226Z MDATA sdc:brand=lx
2019-01-08T02:23:20.226Z MOUNT /dev/shm (shm)
2019-01-08T02:23:20.227Z REPLACE /etc/mtab
2019-01-08T02:23:20.227Z INFO setting up networking
2019-01-08T02:23:20.227Z INFO started ipmgmtd[75784]
2019-01-08T02:23:20.234Z INFO ipmgmtd[75784] exited: 0
2019-01-08T02:23:20.235Z PLUMB lo0
2019-01-08T02:23:20.235Z RAISE[lo0] addr=127.0.0.1, netmask=255.0.0.0
2019-01-08T02:23:20.239Z MDATA sdc:nics=[{"interface":"eth0","mac":"90:b8:d0:a9:9b:66","vlan_id":2,"nic_tag":"sdc_overlay/9501526","gateway":"192.168.128.1","gateways":["192.168.128.1"],"netmask":"255.255.252.0","ip":"192.168.128.143","ips":["192.168.128.143/22"],"network_uuid":"4b609af0-4310-4177-975e-e27f353992e2","mtu":8500},{"interface":"eth1","mac":"90:b8:d0:0f:58:89","vlan_id":10,"nic_tag":"external","gateway":"10.1.10.1","gateways":["10.1.10.1"],"netmask":"255.255.255.0","ip":"10.1.10.100","ips":["10.1.10.100/24"],"network_uuid":"50c48e19-a55b-4af8-9f06-c430f96c37ed","mtu":1500,"primary":true}]
2019-01-08T02:23:20.240Z PLUMB eth0
2019-01-08T02:23:20.242Z RAISE[eth0] addr=192.168.128.143, netmask=255.255.252.0
2019-01-08T02:23:20.751Z PLUMB eth1
2019-01-08T02:23:20.753Z RAISE[eth1] addr=10.1.10.100, netmask=255.255.255.0
2019-01-08T02:23:21.260Z ROUTE[eth1] gw=10.1.10.1, dst=0.0.0.0
2019-01-08T02:23:21.260Z MDATA sdc:routes=[]
2019-01-08T02:23:21.261Z MDATA docker:noipmgmtd=true
2019-01-08T02:23:21.261Z INFO ipmgmtd PID is 75786
2019-01-08T02:23:21.261Z KILLED ipmgmtd[75786]
2019-01-08T02:23:21.277Z INFO network setup complete
2019-01-08T02:23:21.278Z INFO no metadata for 'docker:nfsvolumes'
2019-01-08T02:23:21.278Z No docker:nfsvolumes, nothing to mount
2019-01-08T02:23:21.278Z MDATA sdc:hostname=851d0dc27fdd
2019-01-08T02:23:21.278Z INFO setting hostname = '851d0dc27fdd'
2019-01-08T02:23:21.278Z INFO no metadata for 'docker:user'
2019-01-08T02:23:21.279Z INFO passwd.pw_name: root
2019-01-08T02:23:21.279Z INFO passwd.pw_uid: 0
2019-01-08T02:23:21.279Z INFO passwd.pw_gid: 0
2019-01-08T02:23:21.279Z INFO passwd.pw_dir: /root
2019-01-08T02:23:21.279Z INFO group.gr_name: root
2019-01-08T02:23:21.279Z INFO group.gr_gid: 0
2019-01-08T02:23:21.279Z INFO no metadata for 'docker:workdir'
2019-01-08T02:23:21.279Z WORKDIR '/'
2019-01-08T02:23:21.279Z MDATA docker:linkEnv=[]
2019-01-08T02:23:21.280Z MDATA docker:env=["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"]
2019-01-08T02:23:21.280Z ENV[0] TERM=xterm
2019-01-08T02:23:21.280Z ENV[1] HOME=/root
2019-01-08T02:23:21.280Z ENV[2] PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2019-01-08T02:23:21.280Z ENV[3] HOSTNAME=851d0dc27fdd
2019-01-08T02:23:21.280Z MDATA docker:entrypoint=[]
2019-01-08T02:23:21.281Z MDATA docker:cmd=["/bin/sh"]
2019-01-08T02:23:21.281Z ARGV[0]:CMD "/bin/sh"
2019-01-08T02:23:21.281Z MDATA docker:tty=true
2019-01-08T02:23:21.281Z INFO zfd_ready() took 0 loops
2019-01-08T02:23:21.281Z MDATA docker:open_stdin=true
2019-01-08T02:23:21.281Z SWITCHING TO /dev/zfd/*
2019-01-08T02:23:21.282Z INFO open(/dev/zfd/0) SUCCESS on attempt 0
2019-01-08T02:23:21.282Z INFO open(/dev/zfd/0) SUCCESS on attempt 0
2019-01-08T02:23:21.282Z INFO open(/dev/zfd/0) SUCCESS on attempt 0
2019-01-08T02:23:21.282Z MDATA docker:logdriver=json-file
2019-01-08T02:23:21.282Z INFO logdriver json-file
2019-01-08T02:23:21.282Z INFO no metadata for 'docker:wait_for_attach'
2019-01-08T02:23:21.282Z EXECNAME "/bin/sh"
2019-01-08T02:23:21.282Z DROP PRIVS
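To read a log like the one above more quickly, the interface and route events can be filtered out. This is a small sketch; `net_events` is a hypothetical helper name and it assumes the second whitespace-separated field is the event tag, as in the lines shown.

```shell
#!/bin/sh
# Sketch: keep only the PLUMB / RAISE[...] / ROUTE[...] lines of
# an sdc-dockerinit.log, i.e. the steps that configure interfaces and routes.
net_events() {
    awk '$2 ~ /^(PLUMB|RAISE|ROUTE)/' "$1"
}

# Example call:
#   net_events /var/log/sdc-dockerinit.log
```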
Looks like this may affect more than just Triton...
For the record.... debian:latest is working just fine. So I believe this is only an ubuntu thing.
[Mon 19/01/07 22:04 EST][pts/0][x86_64/linux-gnu/4.19.2-arch1-1-ARCH][5.6.2]
smith@arch-nix:/git/scratch/illumos-joyent/usr/src/lib/brand/lx/zone
zsh 2735 (git)-[master]-% td exec -it deblatest apt-get update -y
Ign:1 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Get:2 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease [91.0 kB]
Get:3 http://security-cdn.debian.org/debian-security stretch/updates InRelease [94.3 kB]
Get:4 http://cdn-fastly.deb.debian.org/debian stretch Release [118 kB]
Get:5 http://cdn-fastly.deb.debian.org/debian stretch Release.gpg [2434 B]
Get:6 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 Packages [5152 B]
Get:7 http://security-cdn.debian.org/debian-security stretch/updates/main amd64 Packages [464 kB]
Get:8 http://cdn-fastly.deb.debian.org/debian stretch/main amd64 Packages [7089 kB]
Fetched 7864 kB in 2s (3590 kB/s)
Don't know if this is related or not, but andyf on IRC mentioned the following issue in OmniOS.
Just to state the obvious (for the search engine); this issue currently applies to 18.04 as well.
It seems that DNS is broken inside the zone for the ubuntu tools.
For a workaround, I hard coded the apt ip addresses:
echo 91.189.88.149 security.ubuntu.com archive.ubuntu.com >> /etc/hosts
which allowed apt update to work, but I was unable to perform apt upgrade:
# apt upgrade
...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Segmentation fault (core dumped)
Segmentation fault (core dumped)
dpkg: error processing package libc-bin (--configure):
installed libc-bin package post-installation script subprocess returned error exit status 139
Errors were encountered while processing:
libc-bin
E: Sub-process /usr/bin/dpkg returned an error code (1)
Native DNS is working correctly inside the zone:
# /native/usr/sbin/ping www.google.com
www.google.com is alive
Even if I run ldconfig, it crashes:
# ldconfig
Segmentation fault (core dumped)
So it seems some internal library (or libraries) like libc are not working correctly - I would hazard a guess that it's due to a difference in the LX implementation for certain system call(s).
Tim Classic from IRC ran into this issue on NixOS and suggested trying the following.
It's been a while since I figured this out, but IIRC the problem was IPv6-related, and I think options single-request is a workaround that papers over the underlying issue by way of causing two sequential requests instead of two in parallel.
These changes (committed to OmniOS) may well fix this issue: https://github.com/omniosorg/illumos-omnios/pull/443
https://github.com/omniosorg/illumos-omnios/pull/443 looks promising!
Interestingly, I recently dug through my own deployment code and found that I left myself the following comment in a resolv.conf destined for a SmartOS Docker container:
# The single-request option works around the lack of sendmmsg() syscall
# support in SmartOS's lx-brand ABI emulation--otherwise, getaddrinfo()
# would try to use it.
options single-request timeout:2 attempts:2 ndots:2
My apologies for not finding this the last time I commented here.
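For anyone wanting to try the single-request workaround on a running container, a minimal sketch follows. `add_single_request` is a hypothetical helper, not part of any tool mentioned here. It only appends the bare option (not the timeout/attempts/ndots values quoted above), and it assumes the file is not regenerated when the container restarts, which has not been verified for sdc-docker.

```shell
#!/bin/sh
# Sketch: append "options single-request" to a resolv.conf unless an
# options line already carries that exact option. The path is a parameter
# so the change can be tried on a copy first.
add_single_request() {
    conf=${1:-/etc/resolv.conf}
    # Match single-request as a whole word; single-request-reopen is a
    # different option and must not count as already present.
    if grep -Eq '^options([[:space:]]+[^[:space:]]+)*[[:space:]]+single-request([[:space:]]|$)' \
            "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'options single-request\n' >> "$conf"
}

# Example call:
#   add_single_request /etc/resolv.conf
```

Running it twice leaves a single options line, so it should be safe to call from a provisioning script.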
I've opened OS-7754 to track this in Jira.
Hi guys,
Is there a fix to this problem?
While attempting to use the ubuntu:latest docker image, the following doesn't work. I believe something in the networking does not work. 16.04 does work.