flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

Unstable access to cloud metadata from the "magic" IP #1541

Open lmq1999 opened 2 weeks ago

lmq1999 commented 2 weeks ago

## Description

Fetching metadata with `curl http://169.254.169.254/openstack` is unreliable: it sometimes works and sometimes times out.

## Impact

Without the cloud metadata, some services and the Kubernetes CSI driver cannot run.

## Environment and steps to reproduce

  1. Set-up: Flatcar image: flatcar_production_openstack_image.img
    pool-g4dzrku5-sj3dtqihuu6cjof6-node-hshs3fbx ~ # cat /etc/os-release
    NAME="Flatcar Container Linux by Kinvolk"
    ID=flatcar
    ID_LIKE=coreos
    VERSION=3975.2.0
    VERSION_ID=3975.2.0
    BUILD_ID=2024-08-05-2103
    SYSEXT_LEVEL=1.0
    PRETTY_NAME="Flatcar Container Linux by Kinvolk 3975.2.0 (Oklo)"
    ANSI_COLOR="38;5;75"
    HOME_URL="https://flatcar.org/"
    BUG_REPORT_URL="https://issues.flatcar.org"
    FLATCAR_BOARD="amd64-usr"
    CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3975.2.0:*:*:*:*:*:*:*"
  2. Task: Set up the network configuration for the VPN interface and the loopback interface:
    
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # cat /etc/systemd/network
    network/       networkd.conf  
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # cat /etc/systemd/network/
    .keep_sys-apps_systemd-0  kengine.network           lo.network                
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # cat /etc/systemd/network/kengine.network 
    [Match]
    Name=kengine
    
    [Link]
    Unmanaged=yes
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # cat /etc/systemd/network/lo.network
    [Match]
    Name=lo
    
    [Network]
    Address=127.0.0.1/8
    Address=10.93.0.1/32
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core #


3. **Action(s)**: Restart systemd-networkd and try to fetch the metadata:

    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # systemctl restart systemd-networkd
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    2012-08-10 2013-04-04 2013-10-17 2015-10-15 2016-06-30 2016-10-06 2017-02-22 2018-08-27 latest
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    2012-08-10 2013-04-04 2013-10-17 2015-10-15 2016-06-30 2016-10-06 2017-02-22 2018-08-27 latest
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    curl: (28) Failed to connect to 169.254.169.254 port 80 after 135318 ms: Couldn't connect to server
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    curl: (28) Failed to connect to 169.254.169.254 port 80 after 134717 ms: Couldn't connect to server


After restarting systemd-networkd again:

    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    curl: (28) Failed to connect to 169.254.169.254 port 80 after 135318 ms: Couldn't connect to server
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    curl: (28) Failed to connect to 169.254.169.254 port 80 after 134717 ms: Couldn't connect to server
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    curl: (28) Failed to connect to 169.254.169.254 port 80 after 134778 ms: Couldn't connect to server
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # systemctl restart systemd-networkd
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    2012-08-10 2013-04-04 2013-10-17 2015-10-15 2016-06-30 2016-10-06 2017-02-22 2018-08-27 latest
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    2012-08-10 2013-04-04 2013-10-17 2015-10-15 2016-06-30 2016-10-06 2017-02-22 2018-08-27 latest
    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # curl http://169.254.169.254/openstack
    2012-08-10 2013-04-04 2013-10-17 2015-10-15 2016-06-30 2016-10-06 2017-02-22 2018-08-27


4. **Error**:

    curl: (28) Failed to connect to 169.254.169.254 port 80 after 134778 ms: Couldn't connect to server
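
Each failed attempt above blocks for roughly 135 seconds before curl gives up. A minimal probe loop like the one below could poll the same URL with a short timeout and record which attempts succeed, which makes it quicker to see when access starts failing after a `systemd-networkd` restart. It is a sketch that assumes Python 3 is available; Flatcar's base image does not ship Python, so it would have to run in a container that shares the host network namespace (for example `docker run --network host`). The names `fetch_once`, `probe` and `summarize` and the 5-second timeout are illustrative choices, not part of any existing tool.

```python
from __future__ import annotations

import time
import urllib.request
from typing import Callable

METADATA_URL = "http://169.254.169.254/openstack"


def fetch_once(url: str = METADATA_URL, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def probe(fetch: Callable[[], bool], attempts: int, interval: float = 1.0) -> list[bool]:
    """Call `fetch` `attempts` times, `interval` seconds apart, and keep every result."""
    results = []
    for i in range(attempts):
        results.append(fetch())
        if i < attempts - 1:
            time.sleep(interval)
    return results


def summarize(results: list[bool]) -> str:
    """Report the success count and the index of the first failed attempt."""
    ok = sum(results)
    first_fail = next((i for i, r in enumerate(results) if not r), None)
    return f"{ok}/{len(results)} succeeded; first failure at attempt {first_fail}"
```

For example, `print(summarize(probe(fetch_once, attempts=30)))` would take about 30 readings right after a restart, with each failure costing at most the 5-second timeout instead of curl's default connect timeout.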

## Expected behavior

The metadata endpoint should respond every time, like this:

    curl http://169.254.169.254/openstack
    2012-08-10 2013-04-04 2013-10-17 2015-10-15 2016-06-30 2016-10-06 2017-02-22 2018-08-27 latest


## Additional information

Routing table (`ip r`):

    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # ip r
    default via 103.107.182.1 dev eth0 proto dhcp src 103.107.182.231 metric 1024
    10.20.4.0/24 dev eth1 proto kernel scope link src 10.20.4.231 metric 1024
    10.20.4.3 dev eth1 proto dhcp scope link src 10.20.4.231 metric 1024
    10.200.9.0/24 via 10.20.4.129 dev eth1 proto kernel
    10.200.16.0/24 via 10.20.4.137 dev eth1 proto kernel
    10.200.71.0/24 via 10.200.71.46 dev cilium_host proto kernel src 10.200.71.46
    10.200.71.46 dev cilium_host proto kernel scope link
    10.200.75.0/24 via 10.20.4.77 dev eth1 proto kernel
    10.200.79.0/24 via 10.20.4.136 dev eth1 proto kernel
    10.200.85.0/24 via 10.20.4.186 dev eth1 proto kernel
    10.200.86.0/24 via 10.20.4.8 dev eth1 proto kernel
    10.200.88.0/24 via 10.20.4.124 dev eth1 proto kernel
    10.200.90.0/24 via 10.20.4.81 dev eth1 proto kernel
    103.107.182.0/24 dev eth0 proto kernel scope link src 103.107.182.231 metric 1024
    103.107.182.1 dev eth0 proto dhcp scope link src 103.107.182.231 metric 1024
    103.107.182.7 dev eth0 proto dhcp scope link src 103.107.182.231 metric 1024
    169.254.169.254 via 103.107.182.7 dev eth0 proto dhcp src 103.107.182.231 metric 1024
    169.254.169.254 via 10.20.4.3 dev eth1 proto dhcp src 10.20.4.231 metric 1024
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
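
The table above holds two /32 routes to 169.254.169.254 with the same metric (1024): one via eth0 (103.107.182.7) and one via eth1 (10.20.4.3). A minimal sketch of plain longest-prefix-then-lowest-metric selection, run on a trimmed copy of that table, shows that the two routes tie, so the table alone does not say which gateway is used. This is a simplified model: it ignores policy routing rules (`ip rule`), additional routing tables that CNI plugins such as Cilium can install, and kernel tie-breaking. On the node itself, `ip route get 169.254.169.254` reports the route the kernel actually picks. The helper names are hypothetical.

```python
from __future__ import annotations

import ipaddress

# Trimmed copy of the `ip r` output above: the default route, the two
# connected subnets and both DHCP host routes to the metadata service.
ROUTES = """\
default via 103.107.182.1 dev eth0 proto dhcp src 103.107.182.231 metric 1024
103.107.182.0/24 dev eth0 proto kernel scope link src 103.107.182.231 metric 1024
10.20.4.0/24 dev eth1 proto kernel scope link src 10.20.4.231 metric 1024
169.254.169.254 via 103.107.182.7 dev eth0 proto dhcp src 103.107.182.231 metric 1024
169.254.169.254 via 10.20.4.3 dev eth1 proto dhcp src 10.20.4.231 metric 1024
"""


def parse_route(line: str) -> tuple[ipaddress.IPv4Network, int, str]:
    """Return (destination, metric, original line) for one `ip route` line."""
    fields = line.split()
    dst = "0.0.0.0/0" if fields[0] == "default" else fields[0]
    # A bare address such as 169.254.169.254 parses as a /32 host route.
    network = ipaddress.IPv4Network(dst, strict=False)
    # IPv4 routes printed without "metric" have metric 0.
    metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
    return network, metric, line.strip()


def best_routes(table: str, target: str) -> list[str]:
    """Longest prefix first, then lowest metric; every tied route is returned."""
    addr = ipaddress.IPv4Address(target)
    routes = [parse_route(line) for line in table.splitlines() if line.strip()]
    matches = [r for r in routes if addr in r[0]]
    if not matches:
        return []
    longest = max(net.prefixlen for net, _, _ in matches)
    matches = [r for r in matches if r[0].prefixlen == longest]
    lowest = min(metric for _, metric, _ in matches)
    return [line for _, metric, line in matches if metric == lowest]


for route in best_routes(ROUTES, "169.254.169.254"):
    print(route)  # both 169.254.169.254 routes are printed: they tie
```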


Link status (`networkctl list`):

    pool-x45g4eed-x9ipxe2qz7c3offo-node-oes7cdpq /home/core # networkctl list
    IDX LINK        TYPE     OPERATIONAL SETUP
      1 lo          loopback routable    configured
      2 eth0        ether    routable    configured
      3 eth1        ether    routable    configured
      4 docker0     bridge   no-carrier  unmanaged
      7 cilium_net  ether    degraded    unmanaged
      8 cilium_host ether    routable    unmanaged
     10 lxc_health  ether    degraded    unmanaged
    
    7 links listed.



Bonus: I also use the Docker OpenVPN configuration from this issue: https://github.com/flatcar/Flatcar/issues/1515
tormath1 commented 2 weeks ago

Hello, and thanks for the report. We will need some additional information (could you use a gist for the requested logs, to avoid overloading this issue?):

Can you confirm that you can ping the metadata server through both interfaces (eth0 and eth1), and that no security group is blocking access to port 80?
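
Ping only exercises ICMP, while the failing curl calls end in a TCP connect timeout on port 80. A hypothetical helper like the one below attempts a plain TCP connect to the metadata address, optionally pinned to one interface with the Linux `SO_BINDTODEVICE` socket option, so that eth0 and eth1 can each be checked at the TCP level. It is a sketch assuming Python 3 in a container on the host network (Flatcar's base image does not include Python) and sufficient privileges for the interface binding; `tcp_reachable` is an illustrative name, not an existing tool.

```python
from __future__ import annotations

import socket

# SO_BINDTODEVICE is Linux-specific; fall back to its Linux value (25) if this
# Python build does not expose the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def tcp_reachable(host: str, port: int = 80, iface: str | None = None,
                  timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    If `iface` is given, the socket is bound to that interface first; this may
    require root privileges, and a failure to bind raises instead of returning.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        if iface is not None:
            # Pin egress to one interface, similar in spirit to `curl --interface`.
            sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE,
                            iface.encode() + b"\0")
        try:
            sock.connect((host, port))
        except OSError:
            # Timeouts, refusals and unreachable networks all land here.
            return False
        return True
```

On the node, `tcp_reachable("169.254.169.254", iface="eth0")` and the same call with `iface="eth1"` would test each path separately.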

lmq1999 commented 2 weeks ago

- journalctl: https://gist.github.com/lmq1999/6b36af7053f026988fbf59e08e3c2510
- dmesg: https://gist.github.com/lmq1999/7dbaa7b0827e59143d868b4e8fd0ddee
- curl:

pool-x45g4eed-x9ipxe2qz7c3offo-node-msmsjtx5 /home/core # curl -vvvvv http://169.254.169.254/openstack
*   Trying 169.254.169.254:80...
* connect to 169.254.169.254 port 80 from 103.148.57.65 port 38364 failed: Connection timed out
* Failed to connect to 169.254.169.254 port 80 after 134043 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to 169.254.169.254 port 80 after 134043 ms: Couldn't connect to server

- Do you see the same effect with other IPs / URLs, or only with the OpenStack one? => Only with the OpenStack metadata IP.
- What is your OpenStack deployment? => Managed by a cloud provider (Bizflycloud). I already contacted the admins, but they found no clue.
- If you have another OS available, can you try to reproduce? => I don't have this problem with Ubuntu 20.

Yes, I can confirm that I can reach the metadata server through both interfaces.

I think the problem is somewhere in the Docker OpenVPN setup I mentioned above. Here is the client configuration:


client
dev kengine
dev-type tap
reneg-sec 0
proto tcp-client
remote 123.31.11.151 10018
resolv-retry infinite
nobind
<ca>
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----

</ca>
<key>
-----BEGIN PRIVATE KEY-----
-----END PRIVATE KEY-----
</key>
<cert>
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----

</cert>
<tls-auth>
#
# 2048 bit OpenVPN static key
#
-----BEGIN OpenVPN Static key V1-----
-----END OpenVPN Static key V1-----

</tls-auth>
remote-cert-tls server
key-direction 1
script-security 3
keepalive 10 60
persist-key
persist-tun
comp-lzo
verb 3

route-nopull
pull-filter ignore "route-gateway"

If I don't add these two lines:

route-nopull
pull-filter ignore "route-gateway"

the VPN client adds a wrong gateway route and the node cannot access the internet (although it can still reach the metadata). If I add these two lines, I can access the internet, but not the metadata while the container is running.

Docker VPN command:

    openvpn --cd /vpn --config /vpn/kengine.conf --script-security 2 --redirect-gateway def1
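
For context on the command above: the OpenVPN manual describes `--redirect-gateway def1` as overriding the default gateway with two routes, 0.0.0.0/1 and 128.0.0.0/1, instead of replacing 0.0.0.0/0. A simplified longest-prefix-match sketch shows what that means for the metadata address: 169.254.169.254 falls inside 128.0.0.0/1, so it is sent into the tunnel unless a more specific route, such as the DHCP-provided /32 host route in the `ip r` output, is present. This model ignores metrics and policy routing and is only an illustration of the mechanism, not a diagnosis of this issue; `most_specific` is a hypothetical helper.

```python
from __future__ import annotations

import ipaddress

# Routes relevant to the metadata address once `--redirect-gateway def1` is
# active: the two VPN halves, the original default route and the DHCP host
# route to 169.254.169.254 (eth0 variant from the `ip r` output).
ROUTES = {
    "0.0.0.0/1 (VPN, def1)": ipaddress.IPv4Network("0.0.0.0/1"),
    "128.0.0.0/1 (VPN, def1)": ipaddress.IPv4Network("128.0.0.0/1"),
    "0.0.0.0/0 (original default, eth0)": ipaddress.IPv4Network("0.0.0.0/0"),
    "169.254.169.254/32 (DHCP, eth0)": ipaddress.IPv4Network("169.254.169.254/32"),
}


def most_specific(target: str, routes: dict[str, ipaddress.IPv4Network]) -> str:
    """Return the label of the longest-prefix route that contains `target`."""
    addr = ipaddress.IPv4Address(target)
    candidates = {label: net for label, net in routes.items() if addr in net}
    return max(candidates, key=lambda label: candidates[label].prefixlen)


# With the /32 host route present, metadata traffic bypasses the VPN halves.
print(most_specific("169.254.169.254", ROUTES))  # 169.254.169.254/32 (DHCP, eth0)
# Other destinations are captured by def1 instead of the original default route.
print(most_specific("8.8.8.8", ROUTES))  # 0.0.0.0/1 (VPN, def1)

# Without the host route, 169.254.169.254 falls into 128.0.0.0/1 (the VPN).
no_host_route = {k: v for k, v in ROUTES.items() if not k.startswith("169.254")}
print(most_specific("169.254.169.254", no_host_route))  # 128.0.0.0/1 (VPN, def1)
```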