Closed dragorosson closed 1 year ago
@dragorosson, see https://play.golang.org/p/3USaOszeMu for a basic TLS echo server.
You can then run openssl s_client -tls1 -connect localhost:12345 to connect to it.
@bradfitz Thanks. Turns out tls.ListenAndServe doesn't exhibit the problem. I was able to reproduce it by duplicating the server setup in https://github.com/kubernetes/kubernetes/blob/3803fee9724ae02e4019c64275c959d308b3a74d/pkg/genericapiserver/genericapiserver.go
I'm now working on a fully-automated script to reproduce the problem.
I've finished the script (OpenStack heat orchestration template) to reproduce the bug. It creates a server, installs devstack, installs docker, builds golang containers with the good and bad SHA, compiles the simple TLS server on both of them, and then creates a heat stack (within devstack) that includes an HAProxy LB and a server for the TLS server to go on.
In all, it takes about an hour to complete. Using a flavor with more RAM, or even a bare metal server, should make it go faster. If you do try a bare metal server, at least on the Rackspace cloud, you'll need to uncomment the PUBLIC_INTERFACE=... line.
NOTE: This template was developed using the Rackspace public cloud. It should work on other OpenStack clouds after changing only the image and flavor params, which are Rackspace-specific. To use non-OpenStack clouds, create a server and run the commands from the (huge) user_data section, replacing %mysql_pass%, %admin_pass%, %rabbit_pass%, and %service_pass% with passwords of your choice.
Here are the manual steps:
Save the heat template in the next post as tlsstack.yaml.
Create the devstack stack:
heat stack-create tls_stack -f tlsstack.yaml
heat stack-list
STACK_NAME=tls_stack
heat output-show --format raw $STACK_NAME ssh_key_private > ~/.ssh/${STACK_NAME}_id_rsa
chmod 400 ~/.ssh/${STACK_NAME}_id_rsa
ssh -i ~/.ssh/${STACK_NAME}_id_rsa root@$(heat output-show --format raw $STACK_NAME server_ip) tail -f userdata.log
Wait until the log shows "EVERYTHING FINISHED":
tail -F userdata.log
source /opt/stack/devstack/openrc admin admin
heat stack-list
Log in to the node and wait until its cloud-init log shows "LB STACK COMPLETE":
eval `ssh-agent`
ssh-add ~/.ssh/id_rsa
STACK_NAME=tls3
NODE_IP=$(heat output-show $STACK_NAME node_fip_ip | sed -e 's/"//g')
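# Retry until the node accepts SSH connections (the 5-second pause is an arbitrary choice)
until ssh -o StrictHostKeyChecking=no fedora@$NODE_IP; do sleep 5; done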
# Within the node
tail -F /var/log/cloud-init-output.log
Exit out of the node and copy the two TLS server binaries into the node:
GOOD_SHA=f6c0241999bffe0fe52e8b7f5bbcc8f9e02edbdf
BAD_SHA=2a8c81ffaadc69add6ff85b241691adb7f9f24ff
scp -o StrictHostKeyChecking=no tlsserver_$GOOD_SHA fedora@$NODE_IP:/home/fedora
scp -o StrictHostKeyChecking=no tlsserver_$BAD_SHA fedora@$NODE_IP:/home/fedora
ssh fedora@$NODE_IP 'sudo ./tlsserver_'$BAD_SHA
# From within the devstack server
source /opt/stack/devstack/openrc admin admin
STACK_NAME=tls3
NODE_IP=$(heat output-show $STACK_NAME node_fip_ip | sed -e 's/"//g')
LB_IP=$(heat output-show $STACK_NAME lb_fip_ip | sed -e 's/"//g')
curl -k https://${NODE_IP}:443 # Should print "Hello, world!"
curl -k https://${LB_IP}:443 # Should hang
With the good-SHA TLS server running instead, requests to the LB floating IP should work.
heat_template_version: 2013-05-23
description: |
  Creates a DevStack server.
  To access/see Devstack progress:
    STACK_NAME=<stack_name>
    heat output-show --format raw $STACK_NAME ssh_key_private > ~/.ssh/${STACK_NAME}_id_rsa
    chmod 400 ~/.ssh/${STACK_NAME}_id_rsa
    ssh -i ~/.ssh/${STACK_NAME}_id_rsa root@$(heat output-show --format raw $STACK_NAME server_ip) tail -f userdata.log
parameters:
  flavor:
    description: Rackspace Cloud Server flavor
    type: string
    #default: onmetal-general2-small
    default: 15GB Standard Instance
    constraints:
      - allowed_values:
          - onmetal-general2-small
          - 30GB Standard Instance
          - 15GB Standard Instance
        description: must be a valid Rackspace OnMetal Server flavor large enough to run devstack
  image:
    type: string
    description: Server image id to use
    #default: OnMetal - Ubuntu 14.04 LTS (Trusty Tahr)
    default: Ubuntu 14.04 LTS (Trusty Tahr) (PVHVM)
    constraints:
      - allowed_values:
          - OnMetal - Ubuntu 14.04 LTS (Trusty Tahr)
          - Ubuntu 14.04 LTS (Trusty Tahr) (PVHVM)
        description: must be a Devstack-supported distro
  devstack_url:
    default: "https://git.openstack.org/openstack-dev/devstack"
    description: Devstack URL to clone from
    type: string
  devstack_branch:
    default: "stable/newton"
    description: Devstack branch to clone
    type: string
resources:
  admin_pass:
    type: OS::Heat::RandomString
  mysql_pass:
    type: OS::Heat::RandomString
  rabbit_pass:
    type: OS::Heat::RandomString
  service_pass:
    type: OS::Heat::RandomString
  ssh_key:
    type: OS::Nova::KeyPair
    properties:
      name: { get_param: "OS::stack_name" }
      save_private_key: true
  devstack_server:
    type: OS::Nova::Server
    properties:
      flavor: { get_param: flavor }
      image: { get_param: image }
      name: { get_param: "OS::stack_name" }
      key_name: { get_resource: ssh_key }
      user_data_format: RAW
      config_drive: "true"
      user_data:
        str_replace:
          template: |
            #!/bin/bash -x
            # userdata script debug log (stdout and stderr share one file descriptor)
            exec >/root/userdata.log 2>&1
            # Wait until DNS resolution works
            pings=0
            while [[ $pings -lt 300 ]]; do
                if ping -c1 mirror.rackspace.com; then
                    break
                fi
                sleep 1
                ((pings++))
            done
            # Fix networking issue with security.ubuntu.com
            # http://askubuntu.com/a/787491
            echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
            # Install requirements
            apt-get update
            apt-get install -y git emacs nmap
            pip install pdbpp
            # Configure and install Devstack
            groupadd stack
            useradd -g stack -s /bin/bash -d /opt/stack -m stack
            echo "stack ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
            # Create the devstack install script
            cat >~stack/install-devstack.sh<<EOF
            #!/bin/bash -xe
            cd ~stack
            git clone %devstack_url% devstack -b "%devstack_branch%"
            cd devstack
            if [[ -n "%admin_pass%" ]]; then
                echo "ADMIN_PASSWORD=%admin_pass%" >> localrc
            fi
            if [[ -n "%mysql_pass%" ]]; then
                echo "MYSQL_PASSWORD=%mysql_pass%" >> localrc
            fi
            if [[ -n "%rabbit_pass%" ]]; then
                echo "RABBIT_PASSWORD=%rabbit_pass%" >> localrc
            fi
            if [[ -n "%service_pass%" ]]; then
                echo "SERVICE_PASSWORD=%service_pass%" >> localrc
            fi
            echo "SERVICE_TOKEN=$(openssl rand -hex 10)" >> localrc
            echo "SWIFT_HASH=$(openssl rand -hex 10)" >> localrc
            echo "ENABLED_SERVICES=c-api,c-bak,c-sch,c-vol,cinder,dstat,g-api,g-reg,h-api,h-api-cfn,h-api-cw,h-eng,heat,key,mysql,n-api,n-cond,n-cpu,n-crt,n-obj,n-sch,q-agt,q-dhcp,q-fwaas,q-l3,q-lbaas,q-meta,q-metering,q-svc,q-vpn,quantum,rabbit,s-account,s-container,s-object,s-proxy" >> localrc
            echo "" >> localrc
            #echo "PUBLIC_INTERFACE=$(route | grep default | grep -oE "bond.*$")" >> localrc
            echo "VOLUME_BACKING_FILE_SIZE=50G" >> localrc
            echo "SWIFT_LOOPBACK_DISK_SIZE=6G" >> localrc
            echo "SWIFT_MAX_FILE_SIZE=5368709122" >> localrc
            echo "enable_plugin neutron-lbaas https://github.com/openstack/neutron-lbaas.git stable/newton" >> localrc
            echo "enable_plugin octavia https://github.com/openstack/octavia.git stable/newton" >> localrc
            echo "ENABLED_SERVICES+=,q-lbaasv2" >> localrc
            echo "ENABLED_SERVICES+=,octavia,o-cw,o-hk,o-hm,o-api" >> localrc
            sudo -i -u stack bash -c "~stack/devstack/stack.sh"
            echo "SUCCESS!"
            EOF
            # Allow access to Horizon
            iptables -I INPUT -p tcp --dport 80 -j ACCEPT
            # Disable requiretty in /etc/sudoers so sudo command below will work
            sed -i 's/\(Defaults.*requiretty\)/#\1/' /etc/sudoers
            #
            # Pre-generate octavia image because it keeps throwing an error during devstack install
            #
            apt-get install -y qemu kpartx python-setuptools
            easy_install pip
            pip install diskimage-builder dib-utils
            cd /opt/stack/
            git clone -b stable/newton https://github.com/openstack/octavia.git
            git clone -b stable/newton https://github.com/openstack/tripleo-image-elements.git
            git clone https://github.com/openstack/diskimage-builder.git
            cd ~
            bash -x /opt/stack/octavia/diskimage-create/diskimage-create.sh -s 2 -o /opt/stack/octavia/diskimage-create/amphora-x64-haproxy.qcow2
            #
            # Install devstack (takes ~15-20 minutes)
            #
            chmod +x ~stack/install-devstack.sh
            sudo -u stack ~stack/install-devstack.sh
            ssh-keygen -f /root/.ssh/id_rsa -t rsa -N ''
            # Add the key to nova
            set +x
            . /opt/stack/devstack/openrc admin admin
            set -x
            nova keypair-add default --pub-key /root/.ssh/id_rsa.pub
            #
            # Pull fedora cloud image and add it to glance
            #
            curl -L https://download.fedoraproject.org/pub/fedora/linux/releases/25/CloudImages/x86_64/images/Fedora-Cloud-Base-25-1.3.x86_64.qcow2 -o fedora25.qcow2
            glance image-create --name fedora25 --visibility public --disk-format=qcow2 --container-format=bare --file=fedora25.qcow2
            #
            # Install docker
            #
            # -y keeps the unattended script from stopping at a confirmation prompt
            apt-get remove -y docker docker-engine
            apt-get update
            apt-get install -y linux-image-extra-$(uname -r) \
                linux-image-extra-virtual
            apt-get install -y apt-transport-https \
                ca-certificates \
                curl \
                software-properties-common
            curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
            apt-key fingerprint 0EBFCD88
            add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
                $(lsb_release -cs) \
                stable"
            apt-get update
            apt-get install -y docker-ce
            #
            # Write template and scripts
            #
            cat>tlsserver_stack.yaml<<'EOTEMPLATE'
            heat_template_version: 2014-10-16
            parameters:
              ssh_key_name:
                type: string
                default: default
              external_network:
                type: string
                description: uuid/name of a network to use for floating ip addresses
                default: public
              server_image:
                type: string
                description: glance image used to boot the server
                default: fedora25
              master_flavor:
                type: string
                default: ds2G
                description: flavor to use when booting the server for master nodes
              fixed_network_cidr:
                type: string
                description: network range for fixed ip network
                default: 10.0.0.0/24
              portal_network_cidr:
                type: string
                description: >
                  address range used by kubernetes for service portals
                default: 10.254.0.0/16
              loadbalancing_protocol:
                type: string
                default: HTTPS
            resources:
              fixed_network:
                type: OS::Neutron::Net
                properties:
                  name: private
              fixed_subnet:
                type: OS::Neutron::Subnet
                properties:
                  cidr: {get_param: fixed_network_cidr}
                  network: {get_resource: fixed_network}
                  dns_nameservers:
                    - 8.8.8.8
              extrouter:
                type: OS::Neutron::Router
                properties:
                  external_gateway_info:
                    network: {get_param: external_network}
              extrouter_inside:
                type: OS::Neutron::RouterInterface
                properties:
                  router_id: {get_resource: extrouter}
                  subnet: {get_resource: fixed_subnet}
              secgroup:
                type: OS::Neutron::SecurityGroup
                properties:
                  rules:
                    - protocol: icmp
                    - protocol: tcp
                      port_range_min: 22
                      port_range_max: 22
                    - protocol: tcp
                      port_range_min: 443
                      port_range_max: 443
              lb:
                type: OS::Neutron::LBaaS::LoadBalancer
                properties:
                  vip_subnet: {get_resource: fixed_subnet}
              api_listener:
                type: OS::Neutron::LBaaS::Listener
                properties:
                  loadbalancer: {get_resource: lb}
                  protocol: {get_param: loadbalancing_protocol}
                  protocol_port: 443
              api_pool:
                type: OS::Neutron::LBaaS::Pool
                properties:
                  lb_algorithm: ROUND_ROBIN
                  listener: {get_resource: api_listener}
                  protocol: {get_param: loadbalancing_protocol}
              lb_floating:
                type: OS::Neutron::FloatingIP
                depends_on:
                  - extrouter_inside
                properties:
                  floating_network: {get_param: external_network}
                  port_id: {get_attr: [lb, vip_port_id]}
              node_floating:
                type: OS::Neutron::FloatingIP
                depends_on:
                  - extrouter_inside
                properties:
                  floating_network: {get_param: external_network}
                  port_id: {get_resource: node_eth0}
              node:
                type: OS::Nova::Server
                depends_on:
                  - extrouter_inside
                properties:
                  name:
                    str_replace:
                      template: "stack"
                      params:
                        stack: {get_param: "OS::stack_name"}
                  image: {get_param: server_image}
                  flavor: ds2G
                  key_name: {get_param: ssh_key_name}
                  user_data_format: RAW
                  user_data: {get_resource: node_init}
                  networks:
                    - port: {get_resource: node_eth0}
              node_eth0:
                type: OS::Neutron::Port
                properties:
                  network: {get_resource: fixed_network}
                  security_groups:
                    - {get_resource: secgroup}
                  fixed_ips:
                    - subnet: {get_resource: fixed_subnet}
                  allowed_address_pairs:
                    - ip_address: {get_param: portal_network_cidr}
                  replacement_policy: AUTO
              api_pool_member:
                type: OS::Neutron::LBaaS::PoolMember
                properties:
                  pool: {get_resource: api_pool}
                  address: {get_attr: [node_eth0, fixed_ips, 0, ip_address]}
                  subnet: {get_resource: fixed_subnet}
                  protocol_port: 443
              node_init:
                type: OS::Heat::SoftwareConfig
                properties:
                  group: ungrouped
                  config:
                    str_replace:
                      template: |
                        #!/bin/bash -x
                        systemctl disable firewalld.service
                        systemctl stop firewalld.service
                        #
                        # Set up CA
                        #
                        mkdir -p /home/fedora/ca
                        cd /home/fedora/ca
                        CAROOT=$(pwd)
                        mkdir -p ${CAROOT}/ca.db.certs # Signed certificates storage
                        touch ${CAROOT}/ca.db.index # Index of signed certificates
                        echo 01 > ${CAROOT}/ca.db.serial # Next (sequential) serial number
                        # Configuration
                        cat>${CAROOT}/ca.conf<<'EOF'
                        [ ca ]
                        default_ca = ca_default
                        [ ca_default ]
                        dir = REPLACE_LATER
                        certs = $dir
                        new_certs_dir = $dir/ca.db.certs
                        database = $dir/ca.db.index
                        serial = $dir/ca.db.serial
                        RANDFILE = $dir/ca.db.rand
                        certificate = $dir/ca.crt
                        private_key = $dir/ca.key
                        default_days = 365
                        default_crl_days = 30
                        default_md = md5
                        preserve = no
                        policy = generic_policy
                        [ generic_policy ]
                        countryName = optional
                        stateOrProvinceName = optional
                        localityName = optional
                        organizationName = optional
                        organizationalUnitName = optional
                        commonName = supplied
                        emailAddress = optional
                        EOF
                        sed -i "s|REPLACE_LATER|${CAROOT}|" ${CAROOT}/ca.conf
                        cd ${CAROOT}
                        dnf install -y openssl
                        # Generate CA private key
                        openssl genrsa -out ca.key 1024
                        # Create Certificate Signing Request
                        openssl req -new -key ca.key \
                            -subj "/C=XX/ST=X/L=X/O=X/CN=X" \
                            -out ca.csr
                        # Create self-signed certificate
                        openssl x509 -req -days 10000 \
                            -in ca.csr \
                            -out ca.crt \
                            -signkey ca.key
                        #
                        # Generate server cert
                        #
                        cd /home/fedora
                        cat>/home/fedora/server.conf<<'EOF'
                        [req]
                        distinguished_name = req_distinguished_name
                        req_extensions = req_ext
                        prompt = no
                        [req_distinguished_name]
                        CN = test
                        [req_ext]
                        subjectAltName = IP:LB_FIP_IP,IP:LB_IP,IP:NODE_FIP_IP,IP:NODE_IP,IP:127.0.0.1
                        extendedKeyUsage = clientAuth,serverAuth
                        EOF
                        openssl genrsa -out "server.key" 4096
                        chmod 400 server.key
                        openssl req -new -days 1000 -key "server.key" -out "server.csr" \
                            -reqexts req_ext -config "server.conf"
                        yes | openssl ca -config ca/ca.conf -in server.csr -cert ca/ca.crt \
                            -keyfile ca/ca.key -out server.crt
                        echo "LB STACK COMPLETE"
                      params:
                        # LB_FIP_IP is the LB's floating IP and LB_IP its VIP; both end up in the SAN list
                        LB_FIP_IP: {get_attr: [lb_floating, floating_ip_address]}
                        LB_IP: {get_attr: [lb, vip_address]}
                        NODE_FIP_IP: {get_attr: [node_floating, floating_ip_address]}
                        NODE_IP: {get_attr: [node_eth0, fixed_ips, 0, ip_address]}
            outputs:
              lb_fip_ip:
                value: {get_attr: [lb_floating, floating_ip_address]}
              node_fip_ip:
                value: {get_attr: [node_floating, floating_ip_address]}
            EOTEMPLATE
            cat>Dockerfile<<'EOF'
            # latest possible stable go for bootstrapping new go (or just ":latest")
            FROM golang:1.6.3
            # SHA of commit to build
            ENV GOLANG_BUILD_SHA WILL_BE_REPLACED
            # Last stable version prior to this commit
            ENV GOLANG_BASE_VERSION 1.6.3
            ENV GOLANG_BUILD_VERSION $GOLANG_BASE_VERSION-nightly-$GOLANG_BUILD_SHA
            # gcc for cgo
            RUN apt-get update && apt-get install -y --no-install-recommends \
                    g++ \
                    gcc \
                    libc6-dev \
                    make \
                && rm -rf /var/lib/apt/lists/*
            ENV GOLANG_DOWNLOAD_URL https://github.com/golang/go/archive/$GOLANG_BUILD_SHA.tar.gz
            ENV GOSRC /usr/local/go-$GOLANG_BUILD_SHA
            ENV GOROOT $GOSRC
            ENV GOPATH /go
            ENV GOROOT_BOOTSTRAP /usr/local/go
            ENV GOBUILD $GOSRC/src
            RUN curl -fsSL "$GOLANG_DOWNLOAD_URL" \
                | tar -C /usr/local -xz
            RUN echo $GOLANG_BUILD_VERSION > "$GOROOT/VERSION"
            WORKDIR $GOBUILD
            RUN ./make.bash
            # let the newly built go take precedence over the old go used for bootstrapping it
            ENV PATH=$GOSRC/bin:$PATH
            EOF
            cat>build-nightly-image.sh<<'EOF'
            #!/bin/bash -x
            SHA=$1
            NIGHTLY_IMAGE=golang:$SHA
            sed -i '/^ENV GOLANG_BUILD_SHA/ s/GOLANG_BUILD_SHA .*/GOLANG_BUILD_SHA '$SHA'/' Dockerfile
            docker build . -t $NIGHTLY_IMAGE
            EOF
            cat>compile-tlsserver.sh<<'EOF'
            #!/bin/bash -x
            SHA=$1
            sudo docker run --rm -v $(pwd):/go/src/tlstls -v $(pwd):/go/bin/linux_amd64 -e GOOS -e GOARCH \
                golang:$SHA \
                go build -o /go/bin/linux_amd64/tlsserver_$SHA tlstls/...
            EOF
            #
            # THIS IS THE TLS SERVER FILE
            #
            cat>tlsserver.go<<'EOF'
            package main

            import (
            	"crypto/tls"
            	"fmt"
            	"log"
            	"net/http"
            )

            func main() {
            	tlsconfig := &tls.Config{
            		MinVersion: tls.VersionTLS12,
            		NextProtos: []string{"h2"},
            	}
            	http.HandleFunc("/", serve)
            	server := &http.Server{
            		Addr: "0.0.0.0:443",
            		//Handler: serve,
            		MaxHeaderBytes: 1 << 20,
            		TLSConfig:      tlsconfig,
            	}
            	err := server.ListenAndServeTLS("server.crt", "server.key")
            	if err != nil {
            		log.Fatal(err)
            	}
            }

            func serve(w http.ResponseWriter, req *http.Request) {
            	w.Write([]byte(`Hello, world!`))
            	fmt.Println("Sent Hello, world!")
            }
            EOF
            #
            # Build golang image and compile tls server binaries for both SHAs
            #
            STACK_NAME=tls3
            GOOD_SHA=f6c0241999bffe0fe52e8b7f5bbcc8f9e02edbdf
            BAD_SHA=2a8c81ffaadc69add6ff85b241691adb7f9f24ff
            chmod +x build-nightly-image.sh
            chmod +x compile-tlsserver.sh
            ./build-nightly-image.sh $GOOD_SHA
            ./compile-tlsserver.sh $GOOD_SHA
            ./build-nightly-image.sh $BAD_SHA
            ./compile-tlsserver.sh $BAD_SHA
            #
            # Create Heat stack that includes the LB and node to run the server
            #
            heat stack-create -f tlsserver_stack.yaml $STACK_NAME
            echo "EVERYTHING FINISHED"
          params:
            "%mysql_pass%": { get_resource: mysql_pass }
            "%admin_pass%": { get_resource: admin_pass }
            "%rabbit_pass%": { get_resource: rabbit_pass }
            "%service_pass%": { get_resource: service_pass }
            "%devstack_url%": { get_param: devstack_url }
            "%devstack_branch%": { get_param: devstack_branch }
outputs:
  horizon_url:
    value:
      str_replace:
        template: "http://%server-ip%"
        params:
          "%server-ip%": { get_attr: [ devstack_server, accessIPv4 ] }
    description: The Horizon web control panel URL of your devstack server
  server_ip:
    value: { get_attr: [ devstack_server, accessIPv4 ] }
    description: server ip
  ssh_key_public:
    value: { get_attr: [ssh_key, public_key] }
    description: SSH public key
  ssh_key_private:
    value: { get_attr: [ssh_key, private_key] }
    description: SSH private key
@bradfitz It's ready.
@dragorosson, that's a repro I guess, but it's far from a minimal repro. I don't imagine @agl or I with our limited time will have an hour+ just to do that install. Is there a lighter weight repro? It seems like reproducing it is a good first step, but can you narrow it down now? Or at least get it down to a Dockerfile we can pull and run, without using OpenStack or Rackspace public cloud?
Timed out in state WaitingForInfo. Closing.
(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
For more context, see the original Kubernetes bug report: https://github.com/kubernetes/kubernetes/issues/40850
What version of Go are you using (go version)?
https://github.com/golang/go/commit/2a8c81ffaadc69add6ff85b241691adb7f9f24ff (1.7 minus approximately 250 commits)
What operating system and processor architecture are you using (go env)?
The kubernetes apiserver is running in a container. uname -a in the container reports:
Linux {hostname} 4.9.7-101.fc24.x86_64 #1 SMP Thu Feb 2 23:32:31 UTC 2017 x86_64 GNU/Linux
What did you do? / What did you expect to see? / What did you see instead?
The setup requires a bit of explanation. I'm running a Kubernetes cluster with TLS enabled with an HAProxy LB in front of it on a separate node with a floating IP attached to it.
When trying to contact the kube-apiserver using curl -vv -k https://{floating IP}:{k8s api port}, it hangs at the client hello:
Only this specific combination exhibits the problem. That is:
Things that, interestingly, do work (all done separately):
openssl s_server allows full two-way communication
I am hoping that by submitting this report here, I can gain some additional insight into what the problem could be, especially because I have pinned https://github.com/golang/go/commit/2a8c81ffaadc69add6ff85b241691adb7f9f24ff as the breaking commit.
P.S. If anyone has or could whip up a simple Go program that only listens on a port and establishes a TLS connection (two versions; at https://github.com/golang/go/commit/2a8c81ffaadc69add6ff85b241691adb7f9f24ff and the commit before it), I could fully eliminate kubernetes from the equation. I'll try to work on it myself, but I haven't written a line of Go in my life. It would be super helpful!
HAProxy config:
(HA-Proxy version 1.5.14 2015/07/02)
Some HAProxy logs (more at https://gist.github.com/dragorosson/9843b863e77b316ea4128f0ee1661c73):