Closed astewart-twist closed 8 years ago
Hi astewart-twist,
thanks for your comment!
Yes, you can eliminate the SQS component by running the ECS task and supplying the parameters for it through setting environment variables. Check out the "overrides" parameter of the runTask method here: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/ECS.html#runTask-property
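To illustrate, here is a minimal sketch of what such a `runTask` call's parameters could look like. The cluster, task definition, container, and variable names are all hypothetical, not taken from this project:

```javascript
// Sketch: passing event data to an ECS task via the runTask "overrides"
// parameter (cluster, task definition, container and variable names are
// hypothetical placeholders).
function buildRunTaskParams(event) {
  return {
    cluster: 'my-cluster',            // hypothetical cluster name
    taskDefinition: 'worker-task:1',  // hypothetical task definition
    overrides: {
      containerOverrides: [{
        name: 'worker',               // container name from the task definition
        environment: [
          { name: 'EVENT_ID', value: event.id },
          { name: 'EVENT_BODY', value: JSON.stringify(event.body) }
        ]
      }]
    }
  };
}
```

With the AWS SDK for JavaScript, the returned object would be passed to `ecs.runTask(params, callback)`.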
The reason I put the SQS queue here was to have a generic way of forwarding any type of event to the ECS task in an easy way.
So if your particular application uses event data that can be easily passed to an ECS task through setting environment variables, you should be good to go.
Hope this helps, Constantin
Thanks for the reply @glez-aws. Actually, the more I think about it, the more I'm coming around to the idea of maintaining a dedicated SQS queue for ECS tasks. Still, I think ideally this should be a function of the ECS cluster scheduler. If all resources across my ECS cluster are occupied, any additional task requests are simply rejected.
In my current setup I'm already passing inputs to my ECS tasks via overrides, and it works pretty well. I have a Lambda function receiving events and attempting to run ECS tasks. Again, though, if the cluster is at capacity, the task request is declined. I've added a retry loop, but obviously the Lambda function itself will time out after 5 minutes. Using SQS as a persistent task queue would certainly prevent tasks from dropping completely, but a pure ECS solution I think would be really nice. I imagine either the ECS cluster itself maintains a (cluster-level) queue, or maybe ECS instance agents could be enabled to accept task requests even when over capacity, with those tasks remaining in a pending state until resources on that ECS instance are free.
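The retry loop mentioned above could be sketched as a generic bounded retry with exponential backoff. This is an illustration, not the actual code from this setup; `startTask` stands in for any function that returns a Promise and rejects while the cluster lacks capacity:

```javascript
// Sketch of a bounded retry loop for runTask when the cluster is at
// capacity. startTask is any Promise-returning function that rejects
// while resources are unavailable; attempt counts and delays are
// illustrative.
function runTaskWithRetry(startTask, attempts, delayMs) {
  return startTask().catch(function (err) {
    if (attempts <= 1) throw err;       // out of attempts: give up
    return new Promise(function (resolve) {
      setTimeout(resolve, delayMs);     // back off before retrying
    }).then(function () {
      return runTaskWithRetry(startTask, attempts - 1, delayMs * 2);
    });
  });
}
```

Note the limitation described above still applies: however many retries are configured, the surrounding Lambda invocation is bounded by its timeout.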
From what I understand something like this currently isn't possible, correct?
Hi,
I think there are two components to keep in mind here:

1. Job management: persisting incoming work (the event data) somewhere durable until it can be processed.
2. Worker management: making sure workers are actually running to process that work.

In the scenario above, Lambda essentially does both: it writes event data into an SQS queue (to help with 1) and kicks off an ECS task to drain the queue (to address 2).
Since this is just an example, both approaches to job management and worker management can be improved. Traditionally, whole software companies have been dedicated to one or the other of these aspects.
Today, we can simply plug cloud services together to accomplish what we need. Since SQS is easy to use and easy to integrate, we would recommend using it together with ECS rather than adding yet another task-scheduling mechanism to ECS: that's what the building-block philosophy is about.
The missing piece in your setup is a way to automatically scale the number of running ECS tasks and a means to make sure there’s always at least one ECS task running that can drain your SQS queue.
ECS supports the notion of an ECS "service" that makes sure that a user-specified number of ECS tasks are running at all times. Have you considered implementing your ECS task as an ECS service?
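As a rough sketch of what that could look like with the AWS SDK for JavaScript, here are hypothetical parameters for `ecs.createService` (cluster, service, and task definition names are made up for illustration):

```javascript
// Sketch: parameters for ecs.createService so that ECS keeps a fixed
// number of queue-draining tasks running at all times (all names are
// hypothetical).
function buildCreateServiceParams() {
  return {
    cluster: 'my-cluster',
    serviceName: 'sqs-drainer',
    taskDefinition: 'worker-task:1',
    desiredCount: 1   // ECS replaces the task if it stops, so at least
                      // one drainer is always running
  };
}
```

The service construct is what guarantees the "always at least one task" property; scaling beyond one task is handled separately.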
You could then add automatic scaling by configuring a second Lambda function, triggered by a CloudWatch alarm that watches the number of visible (and thus unprocessed) SQS messages. It would increase the number of message-processing tasks during periods of high load and decrease it back down to 1 when the queue is seeing fewer messages.
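The scaling decision that second Lambda function would make can be sketched as a pure function mapping queue depth to a desired task count, never dropping below one drainer. The thresholds here are illustrative assumptions, not recommendations:

```javascript
// Sketch of the scaling decision: map the number of visible SQS
// messages to a desired ECS task count, keeping at least one drainer
// running and capping at maxTasks. The messagesPerTask ratio is an
// illustrative tuning knob.
function desiredTaskCount(visibleMessages, messagesPerTask, maxTasks) {
  var needed = Math.ceil(visibleMessages / messagesPerTask);
  return Math.max(1, Math.min(maxTasks, needed));
}
```

The resulting number would then be applied, for example, via the service's desired count (`ecs.updateService` in the AWS SDK for JavaScript).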
Hope this helps, Constantin
Hi,
here’s a link to the ECS documentation on Services: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html
Cheers, Constantin
Closing this issue since it's more of a discussion topic and I believe some answers were offered.
@glez-aws Thanks for the great info! Especially w.r.t. SQS + Lambda + autoscaling. I've replaced my earlier SNS ping-pong approach to job-request persistence with a proper SQS queue.
I really dislike the SQS component here. Queueing up pending tasks that can't yet be assigned an ECS instance should be a natural function of the ECS scheduler.
Any thoughts on how that might be done?