brefphp / bref

Serverless PHP on AWS Lambda
https://bref.sh
MIT License

Concurrency when running a Command + scheduling command #393

Closed tristanbes closed 5 years ago

tristanbes commented 5 years ago

Hi,

When running a Symfony Command, how can I control the maximum parallelism of the command?

The command consumes messages from an IronMQ queue (maybe SQS soon), and I want to tell it to use X lambdas in parallel.

Also, what's the best practice here?

A/ 1 Symfony command = pulls 1 message from the queue, does 1 HTTP call to the API, and then stops;
B/ 1 Symfony command = pulls X messages from the queue, does X HTTP calls to the API, and then stops (within the 15-minute maximum Lambda execution time).

Finally, is there a simple way to run the Command every X minutes without calling the bref cli? (I can use Ansible Tower to schedule it, but I wanted to know if there's something more official.) I found this issue but it didn't help me much.

Thanks :)

victormacko commented 5 years ago

The way I consume messages from SNS (which, from a PHP point of view, I think is roughly the same as consuming from SQS) roughly follows this guide: http://developer.happyr.com/symfony-messenger-on-aws-lambda

Essentially you need to set up a console-like command which pulls in data from Lambda and feeds it to your services in Symfony. I've used the 'messenger' approach to 'potentially' handle different types of SNS functions, but in reality I suppose you could also just get away with sending your request straight to your service class.

Via SNS the lambda function is invoked once for each message which comes in. Keep in mind that the 15-minute limit applies to each invocation: once a lambda is done with one request, the warm instance is re-used for other requests. So unless you have upwards of 10k messages getting fed into Lambda in a single second (as an example), you'll be OK. I'm pretty sure SNS and/or SQS have retry functionality, so you could get them to wait a few minutes before re-queueing the message to go to Lambda. Via SQS I couldn't say with 100% confidence whether it's only invoked once for each message, or just when messages are on the queue, or whether Lambda sits there 'invoked' while there are no messages. The AWS docs weren't all that clear about it when I read them a few months back. (I'd also love to know if anyone's done it.)

RE scheduling: the 'simple' way I've found is just to set it up as a CloudWatch event (the config I have in serverless.yaml is below; I'm using Symfony, thus the 'bin/console'):

console:
    name: ${self:service}-console
    handler: bin/console
    description: 'console function'
    timeout: 30 # in seconds (API Gateway has a timeout of 30 seconds)
    layers:

mnapoli commented 5 years ago

FYI https://github.com/brefphp/bref/issues/345 might help.

I would really recommend letting the queue invoke the lambda. Lambda isn't really meant to poll anything.

This is why SNS or SQS integrates well with Lambda: as soon as there's a new message, the lambda will be invoked to process it. That has the advantage of running only when necessary, not dealing with scaling/parallelism, etc.

Here is documentation on how to trigger a lambda via SQS: https://serverless.com/framework/docs/providers/aws/events/sqs/

If you want your worker to process only 1 message at a time, set batchSize to 1. That makes failures simpler to handle (if your batchSize is 10 and the 5th event fails, then all events will be replayed).
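As a rough sketch of the SQS trigger with `batchSize: 1`, the function definition in serverless.yml could look like the following. The function name `worker`, the handler file `worker.php`, and the queue ARN are placeholders made up for illustration; the layer variable is the one used elsewhere in this thread.

```yaml
functions:
    worker:
        # Hypothetical handler file that receives the SQS event
        handler: worker.php
        layers:
            - ${bref:layer.php-73}
        events:
            - sqs:
                # Placeholder ARN: replace with the ARN of your own queue
                arn: arn:aws:sqs:us-east-1:123456789012:my-queue
                # Deliver one message per invocation
                batchSize: 1
```

With this setup there is no polling code in the application: Lambda reads the queue and invokes the function whenever messages are available.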


Regarding the cron, you can do it with an event of type schedule:

service: app
provider:
    name: aws
    runtime: provided
plugins:
    - ./vendor/bref/bref
functions:
    website:
        handler: handler.php
        layers:
            - ${bref:layer.php-73}
        events:
            - schedule:
                rate: rate(5 minutes)

Be aware that this isn't a CLI command that will be executed but a "lambda" PHP function. Have a look at this example if it helps: https://github.com/mnapoli/externals/blob/master/serverless.yml#L42-L50

tristanbes commented 5 years ago

> I would really recommend letting the queue invoke the lambda. Lambda isn't really meant to poll anything.

Good to know, I'll look into this.

> That has the advantage of running only when necessary, not dealing with scaling/parallelism, etc.

Let's say we have 25,000 messages in SQS on the 1st of each month to compute the SEO rank for a particular keyword (or SNS; I'm not sure what to use here yet, it's still obscure, especially when @victormacko says that SNS calls 1 lambda per message).

Then SQS will call the lambda to consume the queue, and each lambda will only process 1 message (to avoid the replay problem you mentioned). The concurrency can then be capped with the reservedConcurrency setting, to avoid spamming our API with too many concurrent requests.
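A minimal sketch of that idea in serverless.yml, assuming a hypothetical `worker` function and a placeholder queue ARN; the cap of 5 is an arbitrary example value, not a recommendation:

```yaml
functions:
    worker:
        # Hypothetical handler file, for illustration only
        handler: worker.php
        # At most 5 instances of this function run at the same time
        reservedConcurrency: 5
        events:
            - sqs:
                # Placeholder ARN: replace with the ARN of your own queue
                arn: arn:aws:sqs:us-east-1:123456789012:my-queue
                batchSize: 1
```

With `batchSize: 1`, the cap on concurrent function instances is also roughly the cap on concurrent API calls, as long as each invocation makes one call at a time.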

Thanks for the links provided. I really lacked examples of Lambda in a Symfony context (with env variables, services, autowiring, etc.) and of how to structure the code inside the lambda.

mnapoli commented 5 years ago

Yes, that sounds like a good option to explore. I'm not 100% sure whether SNS or SQS is best; I don't have much experience with SNS.

I know however that you can configure a "Dead Letter Queue" with SQS to collect messages that failed to process.

Also be aware that Lambda can re-invoke your lambda with the same message in case it fails (the number of retries can be configured). That can be useful (or not).
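For SQS, the retry count and the dead letter queue are configured together on the queue itself. Here is a minimal sketch for the `resources` section of serverless.yml, written as plain CloudFormation. The logical names `JobsQueue` and `JobsDeadLetterQueue` and the retry count of 3 are assumptions for illustration:

```yaml
resources:
    Resources:
        # Hypothetical main queue that triggers the lambda
        JobsQueue:
            Type: AWS::SQS::Queue
            Properties:
                RedrivePolicy:
                    deadLetterTargetArn:
                        Fn::GetAtt: [JobsDeadLetterQueue, Arn]
                    # After 3 failed receives, the message moves to the dead letter queue
                    maxReceiveCount: 3
        # Hypothetical queue that collects messages that failed to process
        JobsDeadLetterQueue:
            Type: AWS::SQS::Queue
            Properties:
                # Keep failed messages for 14 days (the SQS maximum), in seconds
                MessageRetentionPeriod: 1209600
```

Messages that end up in the dead letter queue can then be inspected or replayed manually instead of being retried forever.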

victormacko commented 5 years ago

Just a little more info on SQS if you're interested; https://dev.to/frosnerd/understanding-the-aws-lambda-sqs-integration-1981

mnapoli commented 5 years ago

Closing the issue but feel free to continue the discussion (I'm keeping the issue tracker up to date).