aws / aws-app-mesh-examples

AWS App Mesh is a service mesh that you can use with your microservices to manage service to service communication.
MIT No Attribution

[BUG] Internal SMTP connection fails with App Mesh #572

Closed. dhilgarth closed this issue 1 year ago

dhilgarth commented 1 year ago

Describe the bug

Service A tries to send emails via an SMTP server running in Service B. The connection fails with "Unexpected socket close". Removing envoy and the proxy configuration from the SMTP server task fixes the issue.

Platform

ECS Fargate

To Reproduce

Expected behavior

The connection should succeed even with the envoy sidecar.
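The failure described above can be probed from the app task with a small client. This is only a minimal sketch, not the reporter's code: the helper name `read_smtp_banner` and the 20 second default timeout are assumptions chosen for illustration, and the host and port are expected to come from the `MAIL_HOST` and `MAIL_PORT` values in the app task definition. It relies on the fact that an SMTP server sends its greeting first, so the client only connects and waits.

```python
import socket


def read_smtp_banner(host: str, port: int, timeout: float = 20.0) -> str:
    """Connect and return the greeting the server sends without being asked.

    Hypothetical helper for illustration; raises ConnectionError if the
    peer closes the socket before a complete greeting line arrives.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = b""
        # Read until the first complete line (SMTP lines end with CRLF).
        while not data.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                raise ConnectionError("socket closed before SMTP greeting was received")
            data += chunk
        return data.decode("ascii", errors="replace").strip()
```

Calling `read_smtp_banner("mailhog.dhtesting-local.svc.cluster.local", 1025)` once with and once without the envoy sidecar on the mailhog task would show whether the greeting ever reaches the client.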

Config files, and API responses

Virtual Node App

```
meshName: dhtesting-local_mesh
virtualNodeName: app-vn
spec:
  backends:
    - virtualService:
        virtualServiceName: mailhog.dhtesting-local.svc.cluster.local
  listeners:
    - portMapping:
        port: 8000
        protocol: http
  logging:
    accessLog:
      file:
        path: /dev/stdout
  serviceDiscovery:
    awsCloudMap:
      namespaceName: dhtesting-local.svc.cluster.local
      serviceName: app
```

Virtual Node Mailhog

```
meshName: dhtesting-local_mesh
virtualNodeName: mailhog-vn
spec:
  backends: []
  listeners:
    - portMapping:
        port: 8025
        protocol: http
    - portMapping:
        port: 1025
        protocol: tcp
  logging:
    accessLog:
      file:
        path: /dev/stdout
  serviceDiscovery:
    awsCloudMap:
      namespaceName: dhtesting-local.svc.cluster.local
      serviceName: mailhog
```

Virtual Router Mailhog

```
meshName: dhtesting-local_mesh
virtualRouterName: mailhog-vr
spec:
  listeners:
    - portMapping:
        port: 8025
        protocol: http
    - portMapping:
        port: 1025
        protocol: tcp
```

Route Mailhog 1025

```
meshName: dhtesting-local_mesh
routeName: mailhog_route_1025
virtualRouterName: mailhog-vr
spec:
  tcpRoute:
    action:
      weightedTargets:
        - port: 1025
          virtualNode: mailhog-vn
          weight: 1
    match:
      port: 1025
```

Route Mailhog 8025

```
meshName: dhtesting-local_mesh
routeName: mailhog_route_8025
virtualRouterName: mailhog-vr
spec:
  httpRoute:
    action:
      weightedTargets:
        - port: 8025
          virtualNode: mailhog-vn
          weight: 1
    match:
      port: 8025
      prefix: /
```

Virtual Service Mailhog

```
meshName: dhtesting-local_mesh
virtualServiceName: mailhog.dhtesting-local.svc.cluster.local
spec:
  provider:
    virtualRouter:
      virtualRouterName: mailhog-vr
```

Task definition App

```
{
  "family": "dhtesting-local_app",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "...",
      "cpu": 0,
      "portMappings": [
        { "name": "app-8000-tcp", "containerPort": 8000, "hostPort": 8000, "protocol": "tcp" }
      ],
      "essential": true,
      "environment": [
        { "name": "MAIL_HOST", "value": "mailhog.dhtesting-local.svc.cluster.local" },
        { "name": "MAIL_PORT", "value": "1025" }
      ],
      "dependsOn": [
        { "containerName": "envoy", "condition": "HEALTHY" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/dhtesting/local/services/app",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "_"
        }
      },
      "healthCheck": {
        "command": [ "CMD-SHELL", "curl -f http://localhost:8000/ || exit 1" ],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    },
    {
      "name": "envoy",
      "image": "public.ecr.aws/appmesh/aws-appmesh-envoy:v1.25.4.0-prod",
      "cpu": 0,
      "portMappings": [],
      "essential": true,
      "environment": [
        { "name": "APPMESH_RESOURCE_ARN", "value": "arn:aws:appmesh:eu-central-1:XXX:mesh/dhtesting-local_mesh/virtualNode/app-vn" },
        { "name": "ENVOY_LOG_LEVEL", "value": "debug" }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "user": "1337",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/dhtesting/local/services/app",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "_"
        }
      },
      "healthCheck": {
        "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
        "interval": 5,
        "timeout": 2,
        "retries": 3,
        "startPeriod": 10
      }
    }
  ],
  "taskRoleArn": "arn:aws:iam::XXX:role/dhtesting-local_ecs-task-role",
  "executionRoleArn": "arn:aws:iam::XXX:role/dhtesting-local_ecs_execution_role",
  "networkMode": "awsvpc",
  "requiresCompatibilities": [ "FARGATE" ],
  "cpu": "2048",
  "memory": "4096",
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      { "name": "ProxyIngressPort", "value": "15000" },
      { "name": "AppPorts", "value": "8000" },
      { "name": "EgressIgnoredIPs", "value": "169.254.170.2,169.254.169.254" },
      { "name": "IgnoredUID", "value": "1337" },
      { "name": "ProxyEgressPort", "value": "15001" }
    ]
  }
}
```

Task definition Mailhog

```
{
  "family": "dhtesting-local_mailhog",
  "containerDefinitions": [
    {
      "name": "mailhog",
      "image": "mailhog/mailhog:latest",
      "cpu": 0,
      "portMappings": [
        { "containerPort": 8025, "hostPort": 8025, "protocol": "tcp" },
        { "containerPort": 1025, "hostPort": 1025, "protocol": "tcp" }
      ],
      "essential": true,
      "environment": [],
      "mountPoints": [],
      "volumesFrom": [],
      "dependsOn": [
        { "containerName": "envoy", "condition": "HEALTHY" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/dhtesting/local/services/mailhog",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "_"
        }
      },
      "healthCheck": {
        "command": [ "CMD-SHELL", "wget -qO- http://localhost:8025/ > /dev/null 2>&1 || exit 1" ],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    },
    {
      "name": "envoy",
      "image": "public.ecr.aws/appmesh/aws-appmesh-envoy:v1.25.4.0-prod",
      "cpu": 0,
      "portMappings": [],
      "essential": true,
      "environment": [
        { "name": "APPMESH_RESOURCE_ARN", "value": "arn:aws:appmesh:eu-central-1:XXX:mesh/dhtesting-local_mesh/virtualNode/mailhog-vn" },
        { "name": "ENVOY_LOG_LEVEL", "value": "debug" }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "user": "1337",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/dhtesting/local/services/mailhog",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "_"
        }
      },
      "healthCheck": {
        "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
        "interval": 5,
        "timeout": 2,
        "retries": 3,
        "startPeriod": 10
      }
    }
  ],
  "taskRoleArn": "arn:aws:iam::XXX:role/dhtesting-local_ecs-task-role",
  "executionRoleArn": "arn:aws:iam::XXX:role/dhtesting-local_ecs_execution_role",
  "networkMode": "awsvpc",
  "requiresCompatibilities": [ "FARGATE" ],
  "cpu": "256",
  "memory": "512",
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      { "name": "ProxyIngressPort", "value": "15000" },
      { "name": "AppPorts", "value": "8025,1025" },
      { "name": "EgressIgnoredIPs", "value": "169.254.170.2,169.254.169.254" },
      { "name": "IgnoredUID", "value": "1337" },
      { "name": "ProxyEgressPort", "value": "15001" }
    ]
  }
}
```

Log from app-envoy

```
[2023-07-01 09:35:58.512][55][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:211] [C197] new tcp proxy session
[2023-07-01 09:35:58.512][55][trace][connection] [source/common/network/connection_impl.cc:362] [C197] readDisable: disable=true disable_count=0 state=0 buffer_length=0
[2023-07-01 09:35:58.512][55][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:383] [C197] Creating connection to cluster cds_egress_dhtesting-local_mesh_mailhog-vn_tcp_1025
[2023-07-01 09:35:58.512][55][trace][connection] [source/common/network/connection_impl.cc:423] [C197] raising connection event 2
[2023-07-01 09:35:58.512][55][trace][filter] [source/common/tcp_proxy/tcp_proxy.cc:670] [C197] on downstream event 2, has upstream = false
[2023-07-01 09:35:58.512][55][debug][conn_handler] [source/extensions/listener_managers/listener_manager/active_tcp_listener.cc:147] [C197] new connection from 10.0.1.55:46644
[2023-07-01 09:35:58.512][55][trace][connection] [source/common/network/connection_impl.cc:568] [C197] socket event: 2
[2023-07-01 09:35:58.512][55][trace][connection] [source/common/network/connection_impl.cc:679] [C197] write ready
[2023-07-01 09:35:58.513][55][trace][connection] [source/common/network/connection_impl.cc:362] [C197] readDisable: disable=false disable_count=1 state=0 buffer_length=0
[2023-07-01 09:35:58.513][55][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:748] [C197] TCP:onUpstreamEvent(), requestedServerName:
[2023-07-01 09:35:58.513][55][trace][connection] [source/common/network/connection_impl.cc:568] [C197] socket event: 2
[2023-07-01 09:35:58.513][55][trace][connection] [source/common/network/connection_impl.cc:679] [C197] write ready
[2023-07-01 09:36:13.518][55][trace][filter] [source/common/tcp_proxy/tcp_proxy.cc:697] [C197] upstream connection received 0 bytes, end_stream=true
[2023-07-01 09:36:13.518][55][trace][connection] [source/common/network/connection_impl.cc:483] [C197] writing 0 bytes, end_stream true
[2023-07-01 09:36:13.518][55][trace][connection] [source/common/network/connection_impl.cc:568] [C197] socket event: 2
[2023-07-01 09:36:13.518][55][trace][connection] [source/common/network/connection_impl.cc:679] [C197] write ready
[2023-07-01 09:36:13.518][55][trace][connection] [source/common/network/connection_impl.cc:568] [C197] socket event: 2
[2023-07-01 09:36:13.518][55][trace][connection] [source/common/network/connection_impl.cc:679] [C197] write ready
[2023-07-01 09:36:13.519][55][trace][connection] [source/common/network/connection_impl.cc:568] [C197] socket event: 3
[2023-07-01 09:36:13.519][55][trace][connection] [source/common/network/connection_impl.cc:679] [C197] write ready
[2023-07-01 09:36:13.519][55][trace][connection] [source/common/network/connection_impl.cc:608] [C197] read ready. dispatch_buffered_data=0
[2023-07-01 09:36:13.519][55][trace][connection] [source/common/network/raw_buffer_socket.cc:24] [C197] read returns: 0
[2023-07-01 09:36:13.519][55][trace][filter] [source/common/tcp_proxy/tcp_proxy.cc:623] [C197] downstream connection received 0 bytes, end_stream=true
[2023-07-01 09:36:13.519][55][debug][connection] [source/common/network/connection_impl.cc:656] [C197] remote close
[2023-07-01 09:36:13.519][55][debug][connection] [source/common/network/connection_impl.cc:250] [C197] closing socket: 0
[2023-07-01 09:36:13.519][55][trace][connection] [source/common/network/connection_impl.cc:423] [C197] raising connection event 0
[2023-07-01 09:36:13.519][55][trace][filter] [source/common/tcp_proxy/tcp_proxy.cc:670] [C197] on downstream event 0, has upstream = true
[2023-07-01 09:36:13.519][55][trace][conn_handler] [source/extensions/listener_managers/listener_manager/active_stream_listener_base.cc:111] [C197] connection on event 0
[2023-07-01 09:36:13.519][55][debug][conn_handler] [source/extensions/listener_managers/listener_manager/active_stream_listener_base.cc:120] [C197] adding to cleanup list
```

10.0.1.55 is the IP address of the App task.
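In the log above, connection C197 is opened at 09:35:58.512 and the upstream side ends the stream at 09:36:13.518 with 0 bytes received. The gap can be computed from the two entries, as in this minimal sketch. The helper `entry_time` and the regular expression are assumptions written for this log format, not part of Envoy or App Mesh; the two log lines are copied from the log.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS.mmm]" field of an Envoy log entry.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def entry_time(line: str) -> datetime:
    """Hypothetical helper: parse the timestamp at the start of a log entry."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no Envoy timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


opened = (
    "[2023-07-01 09:35:58.512][55][debug][filter] "
    "[source/common/tcp_proxy/tcp_proxy.cc:211] [C197] new tcp proxy session"
)
closed = (
    "[2023-07-01 09:36:13.518][55][trace][filter] "
    "[source/common/tcp_proxy/tcp_proxy.cc:697] [C197] "
    "upstream connection received 0 bytes, end_stream=true"
)

idle_seconds = (entry_time(closed) - entry_time(opened)).total_seconds()
print(f"{idle_seconds:.3f}")  # prints 15.006
```

The session therefore sat idle for about 15 seconds before the upstream closed it without any data having been exchanged.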

dhilgarth commented 1 year ago

Recreated in the proper repo: https://github.com/aws/aws-app-mesh-roadmap/issues/468