chaos-mesh / chaos-mesh

A Chaos Engineering Platform for Kubernetes.
https://chaos-mesh.org
Apache License 2.0
6.76k stars, 835 forks

[HttpChaos] is not working properly #4313

Open see-quick opened 10 months ago

see-quick commented 10 months ago

Bug Report

While experimenting with Chaos Mesh, I found a well-hidden problem in HTTPChaos. I applied the following chaos experiment:

apiVersion: chaos-mesh.org/v1alpha1
kind: HTTPChaos
metadata:
  name: test-http-chaos-3
spec:
  mode: all
  selector:
    labelSelectors:
      strimzi.io/kind: KafkaBridge
  target: Response
  port: 8080
  method: POST
  path: /topics/*
  abort: true
  duration: 5m

I then saw the following errors in the log of the chaos-daemon instance:

2024-01-14T20:16:16.878662Z DEBUG handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/5e0e231a-3802-4eee-bd94-a1bce69fb85c.sock", verbose: 2 }, net_env: NetEnv { netns: "6ca59abc-0092ns", device: "eth0", ip: "10.129.2.106/23", bridge1: "6ca59abc-0092b1", bridge2: "6ca59abc-0092b2", veth1: "6ca59abc-0092v1", veth2: "veth0", veth3: "veth1", veth4: "6ca59abc-0092v4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 4, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([224, 0, 0, 0]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 16, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([172, 30, 0, 0]), Gateway([10, 129, 2, 1]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 23, source_prefix_length: 0, tos: 0, table: 254, protocol: 2, scope: 253, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([10, 129, 2, 0]), PrefSource([10, 129, 2, 106]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 14, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([10, 128, 0, 0]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([10, 129, 2, 1]), Oif(3)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x55fd617bb990, tail: UnsafeCell { .. 
} }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: false, is_tx_task_set: false } }) }), rx: Some(Receiver { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: false, is_tx_task_set: false } }) }), task: None } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "132"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::bridge: stderr : The kernel doesn't support the ebtables 'nat' table.

2024-01-14T20:16:16.878932Z ERROR handle{proxy=Proxy { opt: ProxyOpt { ipc_path: "/tmp/5e0e231a-3802-4eee-bd94-a1bce69fb85c.sock", verbose: 2 }, net_env: NetEnv { netns: "6ca59abc-0092ns", device: "eth0", ip: "10.129.2.106/23", bridge1: "6ca59abc-0092b1", bridge2: "6ca59abc-0092b2", veth1: "6ca59abc-0092v1", veth2: "veth0", veth3: "veth1", veth4: "6ca59abc-0092v4", save_routes: [RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 4, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([224, 0, 0, 0]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 16, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([172, 30, 0, 0]), Gateway([10, 129, 2, 1]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 23, source_prefix_length: 0, tos: 0, table: 254, protocol: 2, scope: 253, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([10, 129, 2, 0]), PrefSource([10, 129, 2, 106]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 14, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([10, 128, 0, 0]), Oif(3)] }, RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 0, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Gateway([10, 129, 2, 1]), Oif(3)] }] }, rtnl_handle: Handle(ConnectionHandle { requests_tx: UnboundedSender(Some(UnboundedSenderInner { inner: UnboundedInner { state: 9223372036854775808, message_queue: Queue { head: 0x55fd617bb990, tail: UnsafeCell { .. 
} }, num_senders: 1, recv_task: AtomicWaker } })) }), sender: Some(Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: false, is_tx_task_set: false } }) }), rx: Some(Receiver { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: false, is_tx_task_set: false } }) }), task: None } request=Request { method: PUT, uri: /, version: HTTP/1.1, headers: {"host": "", "user-agent": "Go-http-client/1.1", "content-length": "132"}, body: Body(Streaming) }}: chaos_tproxy::proxy::net::routes: can not recover ROUTE MSG: RouteMessage { header: RouteHeader { address_family: 2, destination_prefix_length: 16, source_prefix_length: 0, tos: 0, table: 254, protocol: 3, scope: 0, kind: 1, flags: (empty) }, nlas: [Table(254), Destination([172, 30, 0, 0]), Gateway([10, 129, 2, 1]), Oif(3)] }, error: Received a netlink error message Network is unreachable (os error 101)
2024-01-14T20:16:16.879420Z ERROR chaos_tproxy::cmd::interactive::handler: error from user's Service: stderr : The kernel doesn't support the ebtables 'broute' table.

2024-01-14T20:18:14.256Z    INFO    chaos-daemon.daemon-server  chaosdaemon/server.go:187   container GetPid    {"request": "action:{action:GETPID} container_id:\"cri-o://7e432355903da652ed2c27ab7bad72968e172f5948a1924017c2a14e84ae06be\""}
2024-01-14T20:18:14.258Z    INFO    chaos-daemon.daemon-server  chaosdaemon/server.go:187   container GetPid    {"request": "action:{action:GETPID} container_id:\"cri-o://7e432355903da652ed2c27ab7bad72968e172f5948a1924017c2a14e84ae06be\""}
2024-01-14T20:18:14.347Z    INFO    chaos-daemon.daemon-server  chaosdaemon/server.go:187   container GetPid    {"request": "action:{action:GETPID} container_id:\"cri-o://7e432355903da652ed2c27ab7bad72968e172f5948a1924017c2a14e84ae06be\""}

The important part is:

chaos_tproxy::proxy::net::bridge: stderr : The kernel doesn't support the ebtables 'nat' table.

Solution

So I opened a remote shell (rsh) into the daemon and ran:

# ebtables -L
The kernel doesn't support the ebtables 'filter' table.

Afterwards I ran:

modprobe ebtables

which resolved the problem, i.e.:

# ebtables -L
Bridge table: filter

Bridge chain: INPUT, entries: 0, policy: ACCEPT

Bridge chain: FORWARD, entries: 0, policy: ACCEPT

Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

So I think the correct behaviour would be for this command (i.e., modprobe ebtables) to be executed inside the Daemon instances during installation :)
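The proposed behaviour could look roughly like the sketch below. It is not code from Chaos Mesh: the function name ensure_ebtables is hypothetical, and it assumes a privileged shell in which both the ebtables and modprobe binaries are available (as they were in the daemon above). It treats a failing ebtables -L as a sign that the module is missing, although the command can also fail for other reasons.

```shell
#!/bin/sh
# Sketch only: make sure the ebtables kernel module is loaded.
# ensure_ebtables is a hypothetical helper, not part of Chaos Mesh.

ensure_ebtables() {
  # "ebtables -L" fails when the kernel does not support the 'filter' table,
  # which is the symptom shown in this report.
  if ebtables -L >/dev/null 2>&1; then
    echo "ebtables already available"
    return 0
  fi

  # Loading a kernel module requires sufficient privileges.
  if modprobe ebtables; then
    echo "ebtables module loaded"
    return 0
  fi

  echo "could not load ebtables module" >&2
  return 1
}
```

Calling ensure_ebtables once before the chaos is injected would cover both the case where the module is already loaded and the case where it is not.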

Output of chaosctl

chaosctl debug httpchaos test-http-chaos-3 -n myproject
[Chaos]: test-http-chaos-3

[Pod]: my-bridge-bridge-74f48855d9-994f5

    1. [iptables list]

    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination

    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination

    2. [file descriptors of PID: 139, COMMAND: tproxy]

    0 -> pipe:[54660192]
    1 -> pipe:[54660193]
    10 -> socket:[54662151]
    11 -> socket:[54662153]
    12 -> socket:[54659356]
    13 -> socket:[54652788]
    2 -> pipe:[40173403]
    3 -> net:[4026533138]
    4 -> pid:[4026533199]
    5 -> anon_inode:[eventpoll]
    6 -> anon_inode:[eventfd]
    7 -> anon_inode:[eventpoll]
    8 -> socket:[54662151]
    9 -> socket:[54662152]

    3. [podhttpchaos]

    {
      "rules": [
        {
          "target": "Request",
          "selector": {
            "port": 8080,
            "path": "/api",
            "method": "GET"
          },
          "actions": {
            "abort": true
          },
          "source": "myproject/test-http-chaos",
          "port": 8080
        },
        {
          "target": "Request",
          "selector": {
            "port": 8080,
            "path": "/topics/my-topic",
            "method": "POST"
          },
          "actions": {
            "abort": true
          },
          "source": "myproject/test-http-chaos-2",
          "port": 8080
        },
        {
          "target": "Response",
          "selector": {
            "port": 8080,
            "path": "/topics/*",
            "method": "POST"
          },
          "actions": {
            "abort": true
          },
          "source": "myproject/test-http-chaos-3",
          "port": 8080
        }
      ]
    }
STRRL commented 9 months ago

We assumed the ebtables kernel module would be loaded automatically. Could you give us more information about your OS, such as which Linux distro you are using?

On the other hand, we could execute modprobe ebtables in chaos-daemon before injecting HTTPChaos. Would you like to help us fix it?

see-quick commented 9 months ago

We assumed the ebtables kernel module would be loaded automatically. Could you give us more information about your OS, such as which Linux distro you are using?

❯ oc rsh -n chaos-mesh chaos-daemon-r7w95 
# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

On the other hand, we could execute modprobe ebtables in chaos-daemon before injecting HTTPChaos. Would you like to help us fix it?

It would be great to do so, because right now my auxiliary script has to iterate over each Daemon instance and run modprobe ebtables there to fix it :)
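For anyone who needs the same workaround in the meantime, such an auxiliary script could be sketched as follows. This is not the script used above. The function name load_ebtables_on_daemons is hypothetical, the chaos-mesh namespace matches the oc rsh command earlier in this thread, and the label selector app.kubernetes.io/component=chaos-daemon is an assumption that should be checked against the actual installation.

```shell
#!/bin/sh
# Sketch only: run "modprobe ebtables" in every chaos-daemon pod.
# Assumptions: the namespace and label selector defaults below match the
# installation, and the daemon pods are allowed to load kernel modules.

# Override with KUBECTL=oc on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"

load_ebtables_on_daemons() {
  namespace="${1:-chaos-mesh}"
  selector="${2:-app.kubernetes.io/component=chaos-daemon}"
  failed=0

  # "-o name" prints one "pod/<name>" entry per line.
  for pod in $("$KUBECTL" get pods -n "$namespace" -l "$selector" -o name); do
    if "$KUBECTL" exec -n "$namespace" "$pod" -- modprobe ebtables; then
      echo "loaded ebtables via $pod"
    else
      echo "failed to load ebtables via $pod" >&2
      failed=1
    fi
  done

  # Non-zero if any pod failed, so callers can detect partial success.
  return "$failed"
}
```

After sourcing the file, running load_ebtables_on_daemons uses the defaults, while KUBECTL=oc load_ebtables_on_daemons my-namespace my-label=value overrides them. The module is not loaded persistently, so the script would need to be rerun after a node reboot.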