aws / amazon-ecs-init

Amazon Elastic Container Service RPM
http://aws.amazon.com/ecs
Apache License 2.0

ECS cluster registers a new task every time a large file is uploaded through the load balancer, returning a 503 gateway error #373

Closed: dh-shreeram closed this issue 3 years ago

dh-shreeram commented 3 years ago

I have an ECS cluster running Strapi. Uploading files up to 50 MB works fine, but when I try to upload a file larger than that, the container in the cluster restarts every time.

I also tested creating a snapshot of the EC2 instance and running the container manually, and there is no issue when uploading via the public IP of the EC2 instance.

There is also no error in the container logs for this issue.

Is this an issue with the Application Load Balancer? How can I fix it?

sharanyad commented 3 years ago

@dh-shreeram To debug this issue better, could you send the ECS agent logs from the instance where you're facing this issue? The ID of the task that stops after starting up would also be helpful. Please send them to ecs-agent-external (at) amazon.com
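
In case it helps, on the ECS-optimized AMI the agent logs live under /var/log/ecs/ on the container instance, so something like the following (a sketch; exact file names vary by agent version) will gather them:

# On the container instance: bundle the ECS agent and ecs-init logs
ls -l /var/log/ecs/
sudo tar czf ecs-logs.tar.gz /var/log/ecs/ecs-agent.log* /var/log/ecs/ecs-init.log*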

Thanks, Sharanya

dh-shreeram commented 3 years ago

I am not able to SSH into the ECS instances either. I have a MongoDB instance in the same network and can access it through SSH, but I cannot SSH into the ECS instances in the same private subnet through the VPN, even though the ports are open in the security group.

dh-shreeram commented 3 years ago

@sharanyad SSH login is not working on the ECS container instances, even though the port is open in the security group. The container port is reachable with the nc command, but SSH is not. What might be the issue?
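
For reference, the checks described above look roughly like this (a sketch; the IP and ports are placeholders, not values from this issue):

# From a host on the VPN / in the same VPC
nc -zv 10.0.1.10 1337      # container port: connects
nc -zv 10.0.1.10 22        # SSH port: times out or is refused
ssh -v ec2-user@10.0.1.10  # verbose output shows where the SSH attempt stops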

dh-shreeram commented 3 years ago

@sharanyad I have sent the ECS agent logs from the instance where I was facing the issue, along with the old and new task IDs, to ecs-agent-external (at) amazon.com

dh-shreeram commented 3 years ago

@sharanyad Any update?

angelcar commented 3 years ago

Hi, apologies for the delay on this issue.

After checking the logs, the container appears to have been stopped due to lack of memory. Everything in the logs looks normal until, at 2020-11-23T10:09:55, Docker reports that the container was killed with an OutOfMemoryError.

Strapi recommends a minimum of 2 GB of RAM, with 4 GB preferred. Your container seems to have only 256 MB (i.e. 262144 kB, as per the logs below). This appears to be the issue.

From the ECS agent logs (edited for clarity):

level=info time=2020-11-23T10:09:55Z ... Container [gwell-editorial-content]: ... Reason OutOfMemoryError: Container killed due to memory usage ...

And from the kernel (OOM killer) logs on the instance:

Nov 23 10:09:55 ip-172-18-1-122 kernel: node invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),  order=0, oom_score_adj=0
Nov 23 10:09:55 ip-172-18-1-122 kernel: node cpuset=4730a1fdbf673c946399e6be6448b351835dedde3855982fd83994ca9a5c1c2c mems_allowed=0
...
Nov 23 10:09:55 ip-172-18-1-122 kernel: Task in /ecs/050665e72fc24fc78369945095351fd8/4730a1fdbf673c946399e6be6448b351835dedde3855982fd83994ca9a5c1c2c killed as a result of limit of /ecs/050665e72fc24fc78369945095351fd8/4730a1fdbf673c946399e6be6448b351835dedde3855982fd83994ca9a5c1c2c
Nov 23 10:09:55 ip-172-18-1-122 kernel: memory: usage 262144kB, limit 262144kB, failcnt 129788
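
One way to address this (a sketch; the task definition, cluster, and service names below are placeholders, not taken from this issue) is to raise the container's memory hard limit in the task definition to at least 2048 MiB and roll the service onto the new revision:

# Register a new task definition revision with "memory": 2048 (MiB) for the Strapi container
aws ecs register-task-definition --cli-input-json file://strapi-task-def.json
# Point the service at the new revision and force a new deployment
aws ecs update-service --cluster my-cluster --service strapi-service \
    --task-definition strapi-task --force-new-deployment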

angelcar commented 3 years ago

Please feel free to re-open if needed.