aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

Adding start and stop black hole port fault implementation #4355

Closed mye956 closed 5 days ago

mye956 commented 1 week ago

Summary

This PR introduces start and stop handlers for network black hole port fault injection in the ecs-agent directory. The handlers apply the fault by running iptables commands via os/exec.

Implementation details

We will be adding two new functions, startNetworkBlackholePort() and stopNetworkBlackHolePort(), into the ecs-agent/tmds/handlers/fault/v1/handlers/handlers.go file.

Similar to CheckNetworkBlackHolePort(), the StartNetworkBlackholePort() and StopNetworkBlackHolePort() handler functions perform the following checks before responding to the request:

Testing

Manual testing: hooked up the fault injection handlers to register on TMDS server startup, then ran an AWSVPC task that calls all three BHP endpoints (start -> check status -> stop BHP fault).

level=debug time=2024-09-20T00:56:51Z msg="Handling http request" method="PUT" from="169.254.172.2:42200"
level=info time=2024-09-20T00:56:51Z msg="Received new request for request type: start network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="start network-blackhole-port" tmdsEndpointContainerID="f4645575-7c7f-49b9-b605-38854d1f1775"
level=info time=2024-09-20T00:56:51Z msg="[INFO] Black hole port fault is not running" netns="/host/proc/25803/ns/net" command="nsenter --net=/host/proc/25803/ns/net iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output="iptables: Bad rule (does a matching rule exist in that chain?).\n" exitCode=1
level=info time=2024-09-20T00:56:51Z msg="[INFO] Attempting to start network black hole port fault" netns="/host/proc/25803/ns/net" chain="egress-tcp-1234"
level=info time=2024-09-20T00:56:51Z msg="Successfully started fault" requestType="start network-blackhole-port" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}"
level=debug time=2024-09-20T00:57:00Z msg="Storage stats not reported for container" module=utils_unix.go
level=debug time=2024-09-20T00:57:01Z msg="Handling http request" method="GET" from="169.254.172.2:59142"
level=info time=2024-09-20T00:57:01Z msg="Received new request for request type: check status network-blackhole-port" requestType="check status network-blackhole-port" tmdsEndpointContainerID="f4645575-7c7f-49b9-b605-38854d1f1775" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}"
level=debug time=2024-09-20T00:57:01Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T00:57:01Z msg="[INFO] Black hole port fault has been found running" netns="/host/proc/25803/ns/net" command="nsenter --net=/host/proc/25803/ns/net iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""
level=info time=2024-09-20T00:57:01Z msg="[INFO] Successfully checked status for fault" requestType="check status network-blackhole-port" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}"
level=debug time=2024-09-20T00:57:05Z msg="Received message of type: HeartbeatMessage"
level=debug time=2024-09-20T00:57:05Z msg="ACS activity occurred"
level=debug time=2024-09-20T00:57:05Z msg="Sending response to ACS" Name="heartbeat message responder" Response={
  MessageId: "fd8a0b80-f7e0-41a9-82bd-8d20450c03fa"
}
level=debug time=2024-09-20T00:58:01Z msg="Handling http request" method="DELETE" from="169.254.172.2:52668"
level=info time=2024-09-20T00:58:01Z msg="Received new request for request type: stop network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="stop network-blackhole-port" tmdsEndpointContainerID="f4645575-7c7f-49b9-b605-38854d1f1775"
level=debug time=2024-09-20T00:58:01Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T00:58:01Z msg="[INFO] Black hole port fault has been found running" netns="/host/proc/25803/ns/net" command="nsenter --net=/host/proc/25803/ns/net iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""
level=info time=2024-09-20T00:58:01Z msg="[INFO] Attempting to stop network black hole port fault" netns="/host/proc/25803/ns/net" chain="egress-tcp-1234"
level=info time=2024-09-20T00:58:01Z msg="Successfully stopped fault" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"stopped\"}" requestType="stop network-blackhole-port"
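The logs above show the check command exiting with exitCode=1 when no matching rule exists and succeeding once the fault is in place. A minimal sketch of how such an exit status could be interpreted with os/exec follows. The function name isRunning is hypothetical, and treating exit code 1 as "fault not running" is an assumption drawn only from the log output above; the commands true and false stand in for iptables so the sketch runs without root.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// isRunning runs a check command and maps its exit status to a boolean.
// Exit 0 is read as "rule present"; exit 1 as "rule absent" (assumption).
// Any other failure, such as a missing binary, is returned as an error.
func isRunning(name string, args ...string) (bool, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return true, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
		return false, nil
	}
	return false, fmt.Errorf("check command failed: %w", err)
}

func main() {
	// "true" exits 0 and "false" exits 1 on typical Unix systems.
	for _, cmd := range []string{"true", "false"} {
		running, err := isRunning(cmd)
		fmt.Println(cmd, running, err)
	}
}
```

Separating "rule absent" from other failures keeps a missing iptables binary or a permissions problem from being reported as a stopped fault.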

Corresponding iptables output in task ENI/network namespace

[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo nsenter --net=/proc/25803/ns/net iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
egress-tcp-1234  all  --  0.0.0.0/0            0.0.0.0/0           

Chain egress-tcp-1234 (1 references)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:1234
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo nsenter --net=/proc/25803/ns/net iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Same test, but using a host mode task

level=debug time=2024-09-20T01:01:33Z msg="Handling http request" method="PUT" from="172.31.25.237:38452"
level=info time=2024-09-20T01:01:33Z msg="Received new request for request type: start network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="start network-blackhole-port" tmdsEndpointContainerID="64ca83af-f51b-4e66-acff-d0e3c29c1afc"
level=info time=2024-09-20T01:01:33Z msg="[INFO] Black hole port fault is not running" netns="host" command="iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output="iptables: Bad rule (does a matching rule exist in that chain?).\n" exitCode=1
level=info time=2024-09-20T01:01:33Z msg="[INFO] Attempting to start network black hole port fault" netns="host" chain="egress-tcp-1234"
level=info time=2024-09-20T01:01:33Z msg="Successfully started fault" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}" requestType="start network-blackhole-port"
level=debug time=2024-09-20T01:01:38Z msg="Handling http request" method="HEAD" from="127.0.0.1:59468"
level=debug time=2024-09-20T01:01:43Z msg="Handling http request" method="GET" from="172.31.25.237:54894"
level=info time=2024-09-20T01:01:43Z msg="Received new request for request type: check status network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="check status network-blackhole-port" tmdsEndpointContainerID="64ca83af-f51b-4e66-acff-d0e3c29c1afc"
level=debug time=2024-09-20T01:01:43Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T01:01:43Z msg="[INFO] Black hole port fault has been found running" netns="host" command="iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""
level=info time=2024-09-20T01:01:43Z msg="[INFO] Successfully checked status for fault" requestType="check status network-blackhole-port" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}"
level=debug time=2024-09-20T01:01:53Z msg="Handling http request" method="DELETE" from="172.31.25.237:52062"
level=info time=2024-09-20T01:01:53Z msg="Received new request for request type: stop network-blackhole-port" tmdsEndpointContainerID="64ca83af-f51b-4e66-acff-d0e3c29c1afc" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="stop network-blackhole-port"
level=debug time=2024-09-20T01:01:53Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T01:01:53Z msg="[INFO] Black hole port fault has been found running" netns="host" command="iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""

Corresponding iptables output in the host network namespace

[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
egress-tcp-1234  all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain egress-tcp-1234 (1 references)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:1234
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

New tests cover the changes: yes

Description for the changelog

Feature: Adding start and stop network black hole port fault implementation

Additional Information

Does this PR include breaking model changes? If so, have you added transformation functions?

Does this PR include the addition of new environment variables in the README?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.