dwyl / learn-aws-lambda

✨ Learn how to use AWS Lambda to easily create infinitely scalable web services

Should each Lambda function be created in a new repo? #42

Open · nikhilaravi opened this issue 8 years ago

nikhilaravi commented 8 years ago

Any thoughts on how we should organise our lambda functions?

nelsonic commented 8 years ago

@nikhilaravi I would prefer to keep all Lambda functions in the same repo so we only need to set up one CI service. However, it appears that the client prefers to split them out into many repos...

nikhilaravi commented 8 years ago

@nelsonic okay! I also think it would be nicer to have them in separate repos as they are self-contained pieces of code.

nelsonic commented 8 years ago

Agreed that the Lambda functions are self-contained; however, there are certain advantages to a "MonoRepo": https://github.com/babel/babel/blob/master/doc/design/monorepo.md (I don't agree with all of them, but some are valid ... e.g. a single ESLint config)

Given that this decision has been made for us, we can work with it. :wink:

alex-nishikawa commented 8 years ago

What about when Lambdas need to share code? Wouldn't the "mono repo" make sense then? Then again, if they are sharing code, maybe they should be rolled into one mega Lambda that decides the control flow at the beginning of its execution.

nelsonic commented 8 years ago

@alex-nishikawa for a recent Lambda-based project we worked on, all shared code was kept in a node module: https://github.com/numo-labs/aws-lambda-helper which meant that the Lambda functions could live in separate repos... It's a question of whether you think micro-services should be a single project or several separate ones ... ❓
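For illustration, a minimal sketch of that pattern: each Lambda stays in its own repo and pins a shared npm package in its package.json. (The package name `@my-org/lambda-shared` and its `validate` / `saveBooking` helpers below are hypothetical, not the actual aws-lambda-helper API.)

```js
// handler.js - one Lambda function living in its own repo,
// pulling shared logic from a versioned npm package instead of copy-pasting it.
const shared = require('@my-org/lambda-shared'); // hypothetical shared-helpers package

exports.handler = async (event) => {
  const input = shared.validate(event);           // hypothetical helper: validate the input event
  const result = await shared.saveBooking(input); // hypothetical helper: shared persistence logic
  return { statusCode: 200, body: JSON.stringify(result) };
};
```

Updating shared code then becomes an explicit dependency bump in each function's repo, which keeps the functions independent but adds a version-management chore.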

jyenes commented 7 years ago

@nelsonic have you thought about using git submodules? That way you can keep the Lambda functions isolated in their own repos, with a parent repo that contains all of them. Does that make sense?

nelsonic commented 7 years ago

@jyenes yeah, that would be a good way of managing things. I see the benefit of having separate repos but I think the "consensus" in "Serverless" is to have a single repo with many services...

jleach commented 7 years ago

I think multiple repos and submodules organize them well and track changes specific to each function; however, it seems like more yak shaving than a single repo. I'm going to go with a single repo with a simple folder structure and see how that works. I'll just spend a few extra characters in commit messages so that the function that was changed is noted in the commit message.

I think many repositories make sense on a project basis, but not at the Lambda level. Could you imagine building an API where you kept routes in one repository and business logic in another? Or an MVC app where UI, Controllers, and Models are all in separate repositories? It's just a thought experiment, but it illustrates the point of keeping a single "project" in the same repository. I think at most you might put web, API, and Lambda in three repos, but not one repo per function.

drac94 commented 6 years ago

Let's say we have 10 Lambdas and you only changed one. How would you handle CI/CD? Would you update all the Lambdas, or how would you know which Lambda was changed when merging to the repo so you can isolate the deployment of only that Lambda?

jleach commented 6 years ago

I use CloudFormation for deployments. I believe it only deploys a change set. Perhaps you can parse the JSON output from it.
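For anyone wanting to automate that, a rough sketch using the AWS SDK for JavaScript (v2) to list which Lambda functions a change set would modify (the stack and change-set names are hypothetical):

```js
// changed-lambdas.js - list the Lambda functions a CloudFormation change set will modify.
const AWS = require('aws-sdk');
const cfn = new AWS.CloudFormation({ region: 'eu-west-1' });

async function changedLambdas(stackName, changeSetName) {
  const { Changes } = await cfn.describeChangeSet({
    StackName: stackName,
    ChangeSetName: changeSetName,
  }).promise();

  return Changes
    .filter((c) => c.ResourceChange
      && c.ResourceChange.ResourceType === 'AWS::Lambda::Function'
      && c.ResourceChange.Action === 'Modify')
    .map((c) => c.ResourceChange.LogicalResourceId);
}

// Hypothetical stack / change-set names:
changedLambdas('my-api-stack', 'my-change-set')
  .then((ids) => console.log('Functions this change set will modify:', ids))
  .catch(console.error);
```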

alex-nishikawa commented 6 years ago

@drac94 I think you have to deploy all, because how are you going to handle rollbacks? Do you want the added complexity of trying to figure out how to rollback individual lambdas, possibly from a few merges back?

drac94 commented 6 years ago

@alex-nishikawa That brings us to the main topic of this thread: if we have each Lambda in its own repo, you can deploy, roll back and version (tag) the Lambda in an isolated way; on the other hand, that approach could lead to ending up with a lot of very basic repos and repeating the Jenkins (or whatever tool you use) configuration for each repo.

alex-nishikawa commented 6 years ago

@drac94 For a while I had many Lambdas, then decided to roll them into one: a single Lambda that acts as a micro-service. Now that API Gateway allows us to point all endpoints to a single Lambda (easily, using proxy+), this works nicely for me. Also, since most of my Lambdas required shared code, I now only have to include a single library in a single Lambda.
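A minimal sketch of what that mono-Lambda router can look like behind an API Gateway `{proxy+}` integration (the routes and handler bodies below are hypothetical placeholders):

```js
// index.js - one Lambda behind an API Gateway {proxy+} integration,
// routing requests internally instead of deploying one function per endpoint.

async function listRestaurants() {
  return { statusCode: 200, body: JSON.stringify([{ id: 1, name: 'Example Bistro' }]) };
}

async function createBooking(event) {
  const booking = JSON.parse(event.body || '{}');
  return { statusCode: 201, body: JSON.stringify({ received: booking }) };
}

const routes = {
  'GET /restaurants': listRestaurants,
  'POST /bookings': createBooking,
};

exports.handler = async (event) => {
  // With the REST-API proxy integration, method and path arrive on the event itself.
  const key = `${event.httpMethod} ${event.path}`;
  const route = routes[key];
  if (!route) {
    return { statusCode: 404, body: JSON.stringify({ error: `No route for ${key}` }) };
  }
  return route(event);
};
```

Shared code (DB connections, validation, logging) then lives once in this single deployment package.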

prashah7 commented 6 years ago

The problem with using a monorepo is that you are tightly bound to a single language. Let's say you have a feature to reserve a dining table, broken down into multiple Lambda functions, e.g.:

  1. RestaurantsLambda - get list of restaurants etc -- Java
  2. BookingLambda - takes payment, confirms booking etc -- Java
  3. EmailLambda - sending confirmation emails -- Python

What if I want to write 3 (EmailLambda) in Python?

rmullins-convene commented 6 years ago

@drac94 I agree; I feel like each Lambda should have its own repo. That way each Lambda is isolated in its versioning, deployment, etc. For me, it doesn't make sense that if only 1 Lambda was changed, CI/CD would build every Lambda, bump the version for every Lambda, and (if applicable) apply the deployment strategy to every Lambda.

I realize I can roll back a single Lambda if the deployment/rollback criteria are defined in SAM (CloudWatch alarms, etc.).

But what if my deploy is successful, yet down the road I need to roll back just one Lambda function for some reason that wasn't caught during the initial deploy? Would I roll back the entire API stack?

I'm really lost on the best way to go about this (my CI/CD is built on CodePipeline, btw). My gut instinct is to have a unique repo per Lambda function.

But how do I tie these Lambda functions to a common shared API Gateway during the build process? I don't want to have to manage an identical copy of an API Gateway SAM template in each Lambda repo.

nelsonic commented 6 years ago

@rmullins-convene the approach you describe (deploying all lambdas as one) is what we ended up going with in our client project because it was much easier to know which version(s) of each Lambda Function were running. When we attempted to deploy each Lambda function individually it was a chore to debug and only had minimal benefit in terms of "separation" or "isolation" of functions.

Sadly, deploying all Lambdas in one "batch" was a chore to "orchestrate" from a DevOps perspective when they were in separate repositories. It seemed simple at the start when there were <5 Lambdas ... but as the number grew, the complexity accelerated.

Having many Lambda Functions in separate repos resulted in considerably more "boilerplate" code than business logic. Which was a headache to maintain and tedious for new people joining the team to understand.

Ultimately, after having our app in Production for a few months, and seeing slow (and more importantly, inconsistent) response times from API Gateway, we decided against continuing to build our application using many Lambda Functions because it was becoming way more complex and slower than if we just built, tested and deployed a Single App using "MVC". We rewrote our app (which had 41 functions) in Phoenix in 2 weeks and saw an improvement in all measures.

The "use case" for AWS Lambda is still very strong for a single function that performs a specific task e.g: sending personalised emails, optimising uploaded images or log analysis scripts. Building an entire web/mobile application, where each function call incurs up to 400ms of latency, was giving the users of our app a poor (inconsistent) UX.

Serverless is a great idea in theory, it's just painful in practice when building a larger App.

For people interested in Application Architecture, I'd highly recommend reading "Goodbye Microservices" on the Segment.com blog: https://segment.com/blog/goodbye-microservices and "Serverless" by "Mike Roberts": https://martinfowler.com/articles/serverless.html

Obviously, if your application is relatively "small" and does not need "low latency" / consistent response times, then building it with several separate Lambda Functions in distinct repositories will be "fine".

rmullins-convene commented 6 years ago

@nelsonic Thanks for the response.

Do you use SAM templates for CI/CD?

The latency issue you mentioned can be totally avoided by using a postHook that 'warms up' the new Lambda service (while traffic gets shifted via linear or blue/green deployment methods).
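For readers unfamiliar with "warming": a minimal sketch of a handler that short-circuits on a warm-up ping. It assumes a scheduled CloudWatch Events / EventBridge rule invokes the function every few minutes with a custom payload such as `{ "warmup": true }`; the payload shape is just a convention, not an AWS feature.

```js
exports.handler = async (event) => {
  // Assumed warm-up convention: a scheduled rule invokes us with { "warmup": true }.
  if (event && event.warmup) {
    // Do no real work; the point is only to keep a container initialised.
    return { statusCode: 200, body: 'warm' };
  }

  // ... normal request handling would go here ...
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```

(As discussed below, this only mitigates cold starts; it does not remove the latency of API Gateway or Lambda-to-Lambda hops.)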

I guess my issue with having a single repo (associated with the API) that includes all the relevant Lambdas is that, in regards to CI/CD, let's say I updated just 1 Lambda and committed those changes: the build would trigger, but it would re-build every single one of my Lambdas, increment their versions, and perform the deployment strategy on each Lambda (shifting traffic, etc.), when really this should only be needed for the 1 Lambda that I updated.

Does my confusion here make sense? But please let me know if I'm missing something.

Thank you!

nelsonic commented 6 years ago

@rmullins-convene while having a postHook is a good suggestion, sadly, the latency in our App could not be avoided by keeping Lambdas "warm". 😞 Most of the latency is due to API Gateway and then "hopping" from one Lambda to another. Every request we had went through a minimum of 3 Lambdas, and the round-trip response time was never less than 500ms in "ideal" ("warmed") conditions. 😢 🐌

We did not use SAM templates at the time; we ended up using Serverless. Would you recommend SAM?

For anyone reading this later, "SAM" is: https://github.com/awslabs/serverless-application-model

I agree that you only need to update 1 Lambda (the Lambda whose code was changed). You aren't "missing" anything. However, the question you could ask is: how do I track (in CloudWatch logs) which version of the Lambda executed? e.g. imagine the following scenario:

The issues are not specific to "batch" vs. single-Lambda deployment. But having a single version across all Lambdas can make debugging easier when several Lambdas have been updated over the course of a "sprint" and you need to work out what is running where.
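One way to make this visible is simply to log the executing version at the start of each handler; a minimal sketch, assuming a hypothetical `RELEASE_TAG` environment variable is set on every function during the "batch" deploy:

```js
exports.handler = async (event, context) => {
  // One structured log line per invocation, searchable in CloudWatch Logs.
  console.log(JSON.stringify({
    release: process.env.RELEASE_TAG,         // hypothetical: git tag of the whole batch deploy
    functionVersion: context.functionVersion, // "$LATEST" or the published version number
    requestId: context.awsRequestId,
  }));

  // ... business logic ...
  return { statusCode: 200, body: 'ok' };
};
```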

geovanisouza92 commented 6 years ago

My 2 cents on your question, @nelsonic: it can be summed up as Observability, and that's really challenging.

1. About tracing: on AWS there's X-Ray already available, but you can consider sending your metrics / traces to another service (e.g. Prometheus + Jaeger).
2. About versioning: if you are running only published versions (as opposed to $LATEST), CloudWatch already logs that version for each line, alongside the request ID and function instance ID. So you can trace each request and each function instance (and detect how many requests each instance is handling during a traffic burst), along with the version. Besides that, you can format your logs to include more information and tags, like database entity IDs, internal metrics, git tags, env vars and input events, directly in the logs (CloudWatch), then process them and send them to another service, like Elasticsearch. That way you can set CloudWatch to expire logs shortly, reducing costs.

You can even be clever and set up automated actions based on metrics, like rolling back function versions when errors spike.
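As a sketch of that last idea: if traffic goes through a published alias (a hypothetical `live` alias here), rolling back amounts to re-pointing the alias at an earlier published version, which a small script (or an alarm-triggered Lambda) can do via the AWS SDK:

```js
// rollback.js - point the "live" alias back at an earlier published version.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'eu-west-1' });

async function rollback(functionName, previousVersion) {
  const alias = await lambda.getAlias({ FunctionName: functionName, Name: 'live' }).promise();
  console.log(`"live" currently points at version ${alias.FunctionVersion}`);

  await lambda.updateAlias({
    FunctionName: functionName,
    Name: 'live',
    FunctionVersion: previousVersion, // must be an already-published version number
  }).promise();
  console.log(`"live" now points at version ${previousVersion}`);
}

// Hypothetical usage:
rollback('BookingLambda', '12').catch(console.error);
```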

frankfarrell commented 5 years ago

I came across this thread randomly, but this is an issue I faced. I wrote a gradle plugin for only deploying modules that have changed in a single repo, may be useful for you (or at least the ideas in it): https://github.com/frankfarrell/blast-radius

Catastropha commented 5 years ago

What about keeping all Lambdas in one repo, with a separate branch for each function and a separate pipeline attached to each branch, so a pipeline is triggered whenever its branch (a Lambda function) is updated? Keep master only to merge changes for all functions, without triggering deployments.

andrejromanov commented 4 years ago

Another approach is to configure your deployment scripts to check whether the hash of each Lambda function's code in the single repo has changed. As a result, only the changed functions will be deployed.
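A minimal sketch of that idea in Node, assuming each function lives in its own folder under `functions/` and the previous run's hashes are kept in a `.lambda-hashes.json` build artifact (recursive `readdirSync` needs Node 20+):

```js
// detect-changes.js - decide which Lambdas to deploy by hashing each function folder.
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

function hashDir(dir) {
  const hash = crypto.createHash('sha256');
  for (const file of fs.readdirSync(dir, { recursive: true }).sort()) {
    const full = path.join(dir, file);
    if (fs.statSync(full).isFile()) {
      hash.update(file);                  // include the relative path in the hash
      hash.update(fs.readFileSync(full)); // and the file contents
    }
  }
  return hash.digest('hex');
}

const previous = fs.existsSync('.lambda-hashes.json')
  ? JSON.parse(fs.readFileSync('.lambda-hashes.json', 'utf8'))
  : {};

const current = {};
for (const fn of fs.readdirSync('functions')) {
  current[fn] = hashDir(path.join('functions', fn));
}

const changed = Object.keys(current).filter((fn) => current[fn] !== previous[fn]);
console.log('Functions to deploy:', changed); // feed this list into the deploy step

fs.writeFileSync('.lambda-hashes.json', JSON.stringify(current, null, 2));
```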

sjortiz commented 4 years ago

I arrived here looking for how to approach a project structure, but I'll go with multiple repos, or a combination of merging a couple of functions into one file in a repo and the rest into another repo.

Why?

Simply because you can't use different triggers if all the functions are part of a single file.

sachin-source commented 4 years ago

I have one Git repository with 10 Lambdas, and I want a CI/CD pipeline for my (private) repository. Can anyone suggest a service for doing this? (I have searched many websites, but they all use one repository per Lambda.)

neilnmartin commented 2 years ago

It seems there are some assumptions in this thread about how deployment pipelines have to work for one repo vs. multiple.

If you have all of your Lambdas in a single repo, you can still have separate folders and deployment configurations for each one (e.g. using something like Terraform). Depending on your CI/CD tool, you could configure your deployment pipeline to accept an argument, for example a specific function name, and have the pipeline deploy only that function. This depends on your CI/CD tool of choice, but I have seen a seamless and easy implementation of this using GitLab before.
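For example, a small deploy script that takes the function name as a pipeline argument and updates only that function's code via the AWS SDK (it assumes the CI job has already zipped the function folder into `<name>.zip`):

```js
// deploy-one.js - deploy a single named function, e.g. `node deploy-one.js BookingLambda`.
const AWS = require('aws-sdk');
const fs = require('fs');

const lambda = new AWS.Lambda({ region: process.env.AWS_REGION || 'eu-west-1' });

async function deployOne(functionName) {
  const zip = fs.readFileSync(`${functionName}.zip`); // built earlier in the pipeline
  const result = await lambda.updateFunctionCode({
    FunctionName: functionName, // must match the deployed function's name
    ZipFile: zip,
    Publish: true,              // publish a new immutable version
  }).promise();
  console.log(`Deployed ${functionName} as version ${result.Version}`);
}

deployOne(process.argv[2]).catch((err) => { console.error(err); process.exit(1); });
```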

madyanmalfi commented 1 year ago

@Catastropha

What about keeping all Lambdas in one repo, with a separate branch for each function and a separate pipeline attached to each branch, so a pipeline is triggered whenever its branch (a Lambda function) is updated? Keep master only to merge changes for all functions, without triggering deployments.

This is exactly what I ended up doing :) Creating multiple repos is a mess, and keeping them in folders just means we'd need to deploy all Lambdas every time we update any of them.