Moving away from Firebase in favor of a more traditional API

WelcometoMyGarden / welcometomygarden

Web app of Welcome To My Garden, a not-for-profit network of citizens offering free camping spots in their gardens to slow travellers.

https://welcometomygarden.org

GNU Affero General Public License v3.0


Moving away from Firebase in favor of a more traditional API #106

Closed archived-m closed 3 years ago

archived-m commented 3 years ago

Hey everyone!

As a result of our recent community call, recent frustration with the deployment pipeline and limitations in both development and welcoming open source contributions, we've often discussed moving away from Firebase for this project. I'd like to have the remainder of this discussion in the open - both to welcome outside suggestions and so we can link back to this should we need to in the future.

I've listed some of the pros and cons to using Firebase below, compiled from my experience using it on this project and a handful of others. I also added a list of repercussions for us should we move, and things we should pay attention to if we were to create a more traditional API. Finally, I've included my suggestions for a new stack and the rationale behind it - but I welcome your suggestions as well.

Originally we decided to use Firebase because we were on a clock and it lent itself well to the issues we were trying to solve. We had no idea what kind of attention the project would get, so we needed something that would scale, in the absence of the skills and time required to set up a proper infrastructure. Many of these circumstances have since changed, which sparked the discussion.

Note that I am very much in favour of this change, and while I tried to be objective, this list is coloured through my glasses of dread :). Google is a data hog and I don't trust them with our users' data, let alone mine. If I can avoid spending another dime of user-donated money on helping them continue their business, I will.

Pros to Firebase

Relatively low cost when you account for the way Firebase bills (e.g. optimizing reads)
Authentication is easy to get right and hard to do wrong
Generous free tier (but negligible for our usage)
Real-time first (stuff like instant messaging is comparatively easy)
Diagnostics and cross-device performance analytics are very easy
Barely any infrastructure/DevOps work or know-how required
SDKs for all platforms (web, iOS, Android)
Easy dynamic links, great thing to have for PWA or Native apps
Scales automatically. We don't need to estimate usage or provision more compute power, ... It "just works"
Serverless is hip

Cons to Firebase

Testing and test environments are hard, a hassle to set up, and not every edge case can be tested. A lot of it is mocked.
Docs take getting used to and valuable information is scattered
Querying is limited. Complex queries (such as geo queries) result in very inefficient and frequent HTTP requests. Anything relational is also pretty yikes. Even simple queries such as running a count on a collection end up being pretty inefficient or hard to do.
An atypical way of doing development, which is daunting to potential contributors of all walks of life and different backgrounds, and harder to find "best practices" for
It's really discouraging as an open-source contributor, and a turn-off for me personally. A credit card is required before you can get started with your own instance
Huge vendor lock-in, at the mercy of Google for pricing changes. It is made purposefully hard to export or migrate your data (which also means automated backups are hard)
No event log, hard to do logging
Creating REST/GraphQL endpoints is tricky at best. This makes integrating with anyone else tricky too, at least if you don't build some secondary service to account for it
Data validation is too limited for nested or complex objects (such as chats that have messages)
Confusing billing
You structure data and document collections around their data validation rule capabilities and ecosystem, not best practices or performance
Limited to a single storage bucket for files (such as garden images or profile picture uploads in the future)
No real control over which server some of the data is on (iffy latency-wise)
Custom email support (especially for authentication) is lacking. We don't let users change their email because there currently is no way for us to not use their email change template. See the docs and #76
Cold start times on the cloud functions can take up to 4 seconds, you may notice this when creating a garden
The way reads are quoted and billed makes it very easy to run into billing issues. You pay per read, but reading 1200 gardens = 1200 reads, not a single read as you'd have in a traditional client/server paradigm. You can get around this by creating aggregate cloud functions that will "reaggregate" a collection every time there's a write to that collection. In our case that would mean garden creation would take longer and would be subject to more frequent failure, but garden reads would now count as one (with more cloud function invocations as a tradeoff). I shouldn't have to think about any of this as a dev, in my opinion. See #54
It's generally more expensive once you scale past a certain point (and we haven't even started, we're going to the moon). We've received some concerns about this already, see #89
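
To make the read-billing point above concrete, here is a rough sketch of the arithmetic. The function names and the figure of 100 map loads are assumptions for illustration, not measurements from WTMG or numbers from Firebase's pricing pages.

```typescript
// Illustrative model only: function names and numbers are assumptions,
// not taken from the WTMG codebase or from Firebase's pricing pages.

// Without aggregation, every garden document fetched is one billed read.
function billedReadsPerDocument(gardenCount: number, mapLoads: number): number {
  return gardenCount * mapLoads;
}

// With an aggregate document, each map load is a single billed read.
function billedReadsAggregated(mapLoads: number): number {
  return mapLoads;
}

// The tradeoff: every garden write triggers one extra function invocation
// to rebuild the aggregate.
function extraInvocations(gardenWrites: number): number {
  return gardenWrites;
}

const perDocument = billedReadsPerDocument(1200, 100); // 1200 * 100 = 120000
const aggregated = billedReadsAggregated(100); // 100
const invocations = extraInvocations(5); // 5
```

With 1200 gardens, the per-document model grows with both the number of gardens and the number of visitors, while the aggregated model grows only with visitors, at the price of extra work on every write.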

What moving looks like for us

These are purposefully ambiguous: there are pros and cons to both sticking with and moving away from Firebase.

Our chat is relatively low-code and low-effort, due to Firestore making it so easy. Developing this the traditional way has a big impact on our frontend code, will take quite a bit of dev time and rigorous testing (which we need more of anyway though).
Infrastructure would be very much DIY - deploys, scaling, load balancing, monitoring, and backups would need a considerable amount of special attention. Conversely, our monthly bill will be approximately a third of what it was.
Authentication is tricky to get right, and I'd very much like to get a lot of eyes on this should we implement our own, before we go to production. We need to redo quite a bit of the auth logic in the frontend though (also see #98)
A new backend will need its own suite of tests (unit, integration & e2e)
A "work package" on its own will be to write a set of scripts that can export all of our current data from firebase, run transformations on it to account for our new data model, and import it into a new database. We'll have to test these using staging data (1:1 with production) and take the platform down for 10-30 minutes when we make the switch, so no new data is added or mutated. This includes user accounts and using the same hash, and downloading + uploading all of the versions of every picture of every garden. We could also keep the bucket in place for now and make this change more gradually, as a phased approach.
The coupling between Firebase and our frontend is relatively tight, it's at store-level. While it could be worse, a move would incur substantial decoupling & rewrites of our request logic.
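
As a rough idea of what the "transform" step of the migration scripts described above could look like, here is a minimal sketch. The document and row shapes (`FirestoreGardenDoc`, `GardenRow`) are hypothetical; the real Firestore documents and the new data model may look quite different.

```typescript
// Hypothetical shapes: the actual Firestore documents and the new
// relational schema are not defined in this discussion.
interface FirestoreGardenDoc {
  id: string;
  description: string;
  location: { latitude: number; longitude: number };
  facilities: Record<string, boolean>;
}

interface GardenRow {
  id: string;
  description: string;
  latitude: number;
  longitude: number;
  facilities: string[]; // names of the facilities that were set to true
}

// Flatten one exported document into a row for the new database.
function transformGarden(doc: FirestoreGardenDoc): GardenRow {
  return {
    id: doc.id,
    description: doc.description.trim(),
    latitude: doc.location.latitude,
    longitude: doc.location.longitude,
    facilities: Object.keys(doc.facilities)
      .filter((key) => doc.facilities[key] === true)
      .sort(),
  };
}

// Transform a full export. Invalid records are collected rather than
// aborting the run, so a dry run on staging data reports every problem.
function transformAll(docs: FirestoreGardenDoc[]): { rows: GardenRow[]; errors: string[] } {
  const rows: GardenRow[] = [];
  const errors: string[] = [];
  for (const doc of docs) {
    const { latitude, longitude } = doc.location;
    if (Math.abs(latitude) > 90 || Math.abs(longitude) > 180) {
      errors.push(`garden ${doc.id}: coordinates out of range`);
      continue;
    }
    rows.push(transformGarden(doc));
  }
  return { rows, errors };
}
```

Keeping the transformation a pure function makes it easy to unit test against a staging export before the actual switch.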

The later we choose to make this switch, the harder and more time-intensive it becomes. The facts won't change, and as it stands, neither will the number of contributors. As such, I'd prefer to do it ASAP or not at all.

The tech we choose may warrant a discussion on its own, and I know many have strong feelings and preferences on the direction to go in. Note that which technology either of us prefer or want has no bearing on whether or not we should switch. That's a separate issue. I will leave my suggestion and rationale below!

Thanks for reading, I look forward to seeing your proposals/thoughts :)

archived-m commented 3 years ago

As for my suggestion on what to replace it with:

I'd love to say something like Elixir + Phoenix or Absinthe, but the whole point of this discussion is to be more welcoming to new contributors :)

Using StackOverflow's annual developer survey of this year and past years, we see Typescript is the second most loved language (after Rust) and JavaScript is the most popular technology by a landslide. It also scores consistently well in the StateOfJS survey. Given that our frontend is JavaScript + soon to be TypeScript, and most of our core contributor team is familiar with it, it seems like an obvious choice as a successor.

More specifically, I'd like to use

Docker to make setting up our repository a one-command ordeal, as well as to simplify deployments (potentially using Kubernetes)
Nest.js to provide structure and excellent docs, so that there's always a dedicated place for a given piece of code. Alternatively, we give everything a place ourselves, but we'll have to discuss practices/structure down the line
Postgres as a database (which will even handle some of our legacy NoSQL)
Prisma 2 or TypeORM as an ORM. This mostly has an impact on things like data modelling, pagination, migrations, relations and transactions
GraphQL as the main paradigm, over REST (but we can do either or both using Nest), but this is very much up for discussion and unfortunately a point of heated debate among many. Using TypeScript on both sides of the stack enables a lot of coolness in combination with GraphQL that I'm very excited about
Redis, as a pub/sub mechanism (in combo with GraphQL subscriptions) for our chat and a way for us to cache Postgres queries on the server

Edit: All of these technologies are open source

Be sure to let me know your thoughts and/or objections before we make a final decision on this. Thanks!

chaixdev commented 3 years ago

hello, I'm a very new face here, take whatever you want from these comments below ;)

I have no experience with Firebase, but I read a lot of pain in between the lines of your post. I believe developer experience is worth investing in for any project, doubly so for community-driven projects.
Your comments regarding ease of use with automatic scaling and low DevOps requirements with Firebase should not be dismissed. Any technology can be replaced with an alternative, but there's a cost, whether time or money or both. Are the right skills and enough (wo)manpower available?
A vote in favour of Kubernetes, if the decision is taken to move away from Firebase.
Authentication: There is no need to reinvent the wheel here. As you say, authentication is tricky to get right and absolutely critical for both functional and security reasons. I have wondered why WTMG doesn't have a 'Login with X/Y/Z' option. The OAuth 2.0 standard in combination with OpenID Connect offers a great way of integrating third-party authentication while keeping backend services stateless and thus scalable (for the interested: short summary, more details). There are other options for authentication servers that take care of all subtleties if there are concerns about forcing users to connect with Google or Facebook. I would advocate for a self-hosted open-source solution: Keycloak.

archived-m commented 3 years ago

@chaixdev thanks for weighing in!

I agree on all your points. As a note on authentication, we initially rolled our own email/password auth because we didn't want to force anyone into having a third party account, as you mentioned. I don't think we're at all opposed to having them available as alternative methods of authentication, it's just that when we launched, we never took the time to do both well. While we can roll out third-party auth, we still need to facilitate email/password auth for existing users.

If we're going with the proposed stack, I was planning to use Passport which is a go-to for a lot of Node APIs, is the recommended way to do it using Nest, and has most auth strategies available for you, baked in as separately installable libraries. The tricky bit is that you want your auth server separated from your resource server so you can use and secure refresh tokens, and it becomes pretty microservicey (likely not a problem if we go the docker approach though). If you have experience with Keycloak, I'd love to have a chat.

mariha commented 3 years ago

Hey all,

Very well thought out, @MichielLeyman! Thanks for the in-depth explanation.

I fully support the idea to move away from Firebase.

First of all, I should be able to dedicate a few hours per week for WTMG, probably not more than 10h though. Below I share some of my comments.

If I were doing it by myself, I'd take the Trustroots codebase (fork the repo) and adapt the TR backend to the WTMG frontend (keep the graphical design for sure, not sure about the frontend), then migrate the data and evolve from there.

Some of the arguments for it are:

TR has authentication, messaging, map search and profiles implemented and well tested over 5-6 years of usage. If I am not wrong, the rest of what we have right now is static pages. TR code has tests and is well modularized, so removing features that we don't need/want wouldn't be that hard (at least this is my impression after quickly looking into the codebase).
We could spend more time on differentiating features dedicated to slow travellers, or help the TR team migrate from Angular.js to React.js or whatever their needs are.
This would also give us a possibility to cooperate with the TR team and ideally build together the first-ever federation of hospex platforms (using the ActivityPub protocol) and solve issues the WS and CS communities recently faced. More on this [here - TODO]().

Otherwise, some of thoughts on where to go:

Hosting/deployment

If it made the move easier, Google has Cloud Functions, which could be used as a step between Firebase and a containerized app. Otherwise there are a few options for where to deploy Docker images and how much orchestration/control over infrastructure we'd like to have. I wouldn't go so deep into infrastructure as to use Kubernetes until we need it.

Interesting options (for me) would be:

DigitalOcean: rather App Platform over Droplets, for ease of management
GCP: App Engine or Cloud Run (comparison of Google Serverless offerings), no opinion on which one would be better

Both have K8s should we need it at some point.

Architecture

I'd like to advocate for microservices.

Pros: The main advantage is that each microservice runs in a separate process, so each can use its own tech stack, allowing greater diversity of technologies. It would:

open doors to a wider range of contributors with other skillsets and interests, like me and @Chai with experience in the JVM ecosystem.
allow us to use the best tool for the task: messaging and gardens/profiles could be backed by different database engines without increasing the complexity of the features
Kotlin is a native language for Android, so being open to devs skilled in that area (JVM) may be useful when we decide to have native mobile apps
in the very long run, scalability

Cons are that the app is not complicated enough to require it and it would mean more DevOps tasks for us.

Tech stack

I have no experience with JS but was planning to learn it, especially frontend. TypeScript is great for me. Backend in JS would be out of my focus for now, as I can usually find my way around the JVM ecosystem, but what you propose seems very interesting.

For database(s) I'd advocate for NoSQL for scalability and ease of use (no need for ORM).

MongoDB has some geospatial indexing; I haven't checked what exactly it offers.
PostgreSQL implements R-Trees for spatial queries, if we choose a relational database.

What can I help with?

containerization (Docker)
building CI/CD pipeline
databases: I have much more experience with relational databases, but should be able to help with both
endpoints/API design: GraphQL would involve more learning from me, was planning to do it at some point anyways
frontend: no experience with web apps but this is what I would like to learn so would be happy to take some easy tasks
backend: writing services for jvm (Java or Kotlin)

archived-m commented 3 years ago

Appreciate the feedback! Fully in favour of Docker + K8s (even in the short term imo). I've had fantastic experiences with DigitalOcean and we are using both Google Cloud Run and Cloud functions to host WTMG in its current state already, they're all options.

Same for microservices, fully on board. As it stands, I think we'd only need to separate 3-4 services but I'd rather start with that than do it later. As long as we see the back-end/API as being decoupled from our front-ends, a possible native app in Kotlin in the future should be no problem regardless of which architecture we go with, afaik.

As you mentioned, with the microservice approach there's no problem using whichever tool is best for a given feature (e.g. Mongo for geo and Postgres for auth). I will say I'd love for most services to be built using the same paradigms, again to allow for ease of contributions. Not saying they can't diverge, but I'd rather have 3 in TypeScript using NestJS (or 3 using JVM) than one in TypeScript, one in Java and one in Elixir.

Huge fan of the idea to federate and I would love for us to get in touch with Trustroots. I do think this is a separate discussion from using their code as a starting point. As they state on their development page:

Trustroots was built upon MEAN.js boilerplate (from Mongo-Express-Angular-NodeJS). MEAN isn’t active anymore and we’ve modified the codebase extensively for our own purposes, so it’s better not to rely too much on their documentation. While boilerplate was a great way to get started with rather large application, we inherited a lot of cruft and kinda complicated setup from it. As time has passed, several aspects of the application are not that modern anymore and we have lots to do to bring it up to date.

I love reuse, but I don't want to inherit the second-hand cruft they inherited from MEAN. Looks like they have moved most of their codebase to React in the meantime though.

wardbeyens commented 3 years ago

Hello,

Wow, well, I am also in favour of getting rid of firebase and in creating our own backend with new yet approachable technologies.

The technologies listed by Michiel seem very interesting to me. Especially Nest.js and GraphQL I am curious about.

WTMG also serves to learn, in that regard, I am looking forward to trying out (these) newer technologies.

I am especially looking forward to being able to write code with confidence so that nothing else breaks, both in the frontend and backend.

ludov04 commented 3 years ago

Hey all!

Thanks @MichielLeyman for the very clear write-up. I also support the idea of moving away from Firebase and agree with most of your points. Firebase is great to get started with, but it very quickly locks you in. For me the biggest issues are vendor lock-in and local development being harder.

Tech Stack

As for how to move forward, I'd say Node.js + TypeScript is awesome if welcoming contributors is important to us. I'm a huge fan of statically typed languages in that regard, as:

It is usually easier to read the code, as you know exactly which types each function returns
It is easier to navigate through the code and make changes with confidence, as you know the transpiler has your back
Types force you to think through your abstractions, and you generally end up with code that is more testable.

Regarding the backend framework itself I don't have a strong opinion.

Infrastructure / Hosting

I'd also advocate containerizing things as much as we can rather than going for another FaaS solution like Lambdas or Cloud Functions. When we have containerized apps that respect the 12-factor principles, it's very easy to switch between providers or infrastructure layers.

I wouldn't go for a fully fledged K8s cluster at first. Even the managed solutions like EKS, GKE and AKS can be quite some work to manage properly.
Rather, I would go with solutions like EKS Fargate or Google Cloud Run for Anthos, which is built on Knative. IMHO these solutions provide a good way to get started quickly without spending a lot of time on configuration, setting up things like monitoring, right-sizing the resources, ... and it's easy to then move from this to a proper K8s cluster when we grow and feel like we need it.

Microservices

I'm in favour, but we have to be careful and think about the why. Microservices come with a lot of promises and are great in principle, but they can come with their own set of challenges and are not always worth it. As an example, here is an article about how Istio decided to move away from microservices.

If you think about why companies do microservices, it is often to:

Increase velocity, supposedly as there are less dependencies between teams
Better resource management as you can scale one part of the system without scaling everything
Resiliency to failure
Less complexity (this is only true at a local level: each microservice in itself might be simple, but the whole system can end up being more complex)

However, it comes with some drawbacks as well:

More operational complexity: now you have to think about logging, authentication, boilerplate code, deploy pipeline, test suite for each of your services
More difficult to run the whole system locally
You have to think about how your services talk to each other (over the internet or private network? How do they authenticate to each other? etc.)

I reckon there are probably good candidates for microservices in our system. Things that come to mind, even though I'm not familiar with all the features/codebase:

Authentication
Chat
Search
...

On board with microservices, we just have to be mindful that with every service comes some operational overhead and think things through when we create a new one or decide to split things up. 3-4 seems reasonable for our size tho.

Database

Slight preference for Postgres over MongoDB. I think having a data model enforced by the relational database engine makes it easier to avoid data integrity issues down the road. Relational databases can handle huge amounts of traffic without problems these days, and there are a lot of scaling strategies we can use.

archived-m commented 3 years ago

Thanks everyone! Most people with the time and energy to contribute in the near future have offered their opinions, and it looks like we can consider the decision to move away from Firebase finished. Wonderful :)

As for what we're moving to, still undecided. From what I've gathered on here and through other channels:

Typescript: :thumbsup:, no immediate preferences in terms of libraries/frameworks. Let me try boilerplate for a few top contenders and report back
Microservices: :thumbsup:, but not overdone. For now I see auth, users, chat and geo/campsites as four main layers of separation.
REST/GraphQL: No big opinions here so far - we can use them in combination if ever necessary (such as to expose endpoints publicly). Given limited interactions with third parties and a large part of request logic already being based in our front-end, I'd go for GraphQL here. Inherent benefits being
- development speed
- performance - both in speed and request/response size
- type checking
- it lending itself well to microservices
- largely self-documenting
- code generation (developer tooling in general really)
Cons to GraphQL to keep in mind for our use case:
- its comparative learning curve, partly because it has smaller representation in helpful resources
- the fact that we need to pay more attention to caching in the front-end (no HTTP caching, everything is POST)
- we'll have to keep an eye on n+1 queries
Database: depends on the service, but Postgres seems like the only one needed for now (chat is to be further investigated)
Infrastructure & Hosting:
- Containerization: :thumbsup:, but no full-fledged k8s yet. @ludov04 's suggestion of Fargate looks promising, but you all seem to be more DevOps savvy than me, so let's agree to Dockerize everything and we'll see where to host/how to host later down the road.
- We can also agree CI/CD is a must, given that I am the deploy bottleneck now and we really need some automated tests.
- Side note - I'd also like to move management of our OpenMapTiles server to this automation. It takes practically no management, but it is a black box on DigitalOcean for now. We can consider it a separate issue though, not scoped to this.
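
On the n+1 point above: the usual remedy is to batch lookups, which is what DataLoader-style helpers do in GraphQL servers. A minimal sketch of the idea, using an in-memory stand-in for a database table that counts queries. All names here (`UserTable`, `authorsNaive`, `authorsBatched`) are hypothetical and not part of any chosen library.

```typescript
interface User {
  id: string;
  name: string;
}

interface Message {
  id: string;
  authorId: string;
  body: string;
}

// In-memory stand-in for a users table; it counts how many queries it serves.
class UserTable {
  queryCount = 0;
  private readonly users: Map<string, User>;

  constructor(users: User[]) {
    this.users = new Map<string, User>();
    for (const user of users) {
      this.users.set(user.id, user);
    }
  }

  // One query per call: fine once, costly inside a loop.
  findById(id: string): User | undefined {
    this.queryCount += 1;
    return this.users.get(id);
  }

  // One query for many ids (think `WHERE id IN (...)`).
  findByIds(ids: string[]): Map<string, User> {
    this.queryCount += 1;
    const found = new Map<string, User>();
    for (const id of ids) {
      const user = this.users.get(id);
      if (user !== undefined) found.set(id, user);
    }
    return found;
  }
}

// Naive resolution: N messages cause N user queries (the "n" in n+1).
function authorsNaive(messages: Message[], table: UserTable): (User | undefined)[] {
  return messages.map((message) => table.findById(message.authorId));
}

// Batched resolution: deduplicate the ids, then issue a single query.
function authorsBatched(messages: Message[], table: UserTable): (User | undefined)[] {
  const uniqueIds = Array.from(new Set(messages.map((message) => message.authorId)));
  const usersById = table.findByIds(uniqueIds);
  return messages.map((message) => usersById.get(message.authorId));
}
```

For a chat with many messages from a few users, the batched version keeps the query count constant regardless of the number of messages.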

Fun! I think these are the base building blocks, and if we can agree on these, we'll figure out more specific tooling and such down the road, they are also more interchangeable in case we find we want to do something else.

I'll create an architecture diagram and set up some very basic different boilerplates to see which are easiest/best/most fun, and we'll go from there?

Go team!

mariha commented 3 years ago

So now I'm gonna take a step back... 😎

I will say I'd love for most services to be built using the same paradigms, again to allow for ease of contributions. Not saying they can't diverge, but I'd rather have 3 in TypeScript using NestJS (or 3 using JVM) than one in TypeScript, one in Java and one in Elixir.

Not sure if I agree... I think there is more reason, in our case, to use separate services because of the flexibility of tech stack it allows - which comes with opportunities to contribute for people from a broader range of backgrounds - than because of the scalability it gives. We just don't need it yet and, as others said, it comes with added complexity. So if not for that flexibility, I'd rather start with a simple, single web service. We will try to keep it well modularized and keep in mind that at some point we may want to divide it into separate services and distribute it over many execution instances. I wouldn't make the move until we are mature enough though... And once we have all the safety net tools and practices in place (good tests, CI/CD, monitoring, ...) we could go distributed. And then, there is no reason why we would have to stick to a single tech stack for all services 😉 Is there?

Hope it makes sense. Do you agree?

The good thing is that NestJS seems to make it really easy to move from a single web app to a few microservices. And reading some code snippets, it doesn't look that different from what I'm used to, so it may be easier for me to contribute than I initially thought.

Go team! 😉

chaixdev commented 3 years ago

Great to see all the contributions.

containers

Containers, yup, cloud functions not so much. One of the reasons stated to roll our own backend is to avoid vendor lock-in (and deservedly so). With containers, you can pick up and move relatively easily; it seems to me that with cloud functions you're once again stuck with APIs specific to the cloud functions platform provider. I had not yet heard of Knative, looks interesting.

dev <-> ops?

In my view, microservices make developing easier, and operations harder. From the perspective of engaging more contributors, this could actually be a big advantage: keeping developing (and code reviews for merge/pull requests) lighter, while keeping operations in the hands of key contributors.

microservices

I'd like to point out that WTMG is already partly 'micro serviced' in that the tile server is separately hosted. Likewise, I think authentication is a prime target. Moving the user authentication and session stuff out of the core logic would already allow scaling the backend service horizontally, even before separating other parts that @MichielLeyman identified. Then, as the need arises, and we find that certain parts would benefit from being scaled separately, we can work on extracting them.

tech stack

I agree with the preference for a Typed language. I've used Typescript before (in Angular, back when I still pretended to be a full-stack dev :grin: ) and look back fondly (no .js for me brrrr :cold_face:)
I do disagree with @mariha on separate tech stacks though. in my opinion, this would create barriers for contributors that are already active in one area to also contribute in another. I guess what we're weighing is: potentially attracting diverse contributors vs facilitating contributors that are already active and committed. Maybe it comes down to who would be the maintainer of the specific microservice, or perhaps we need a discussion every time we think we want a new microservice for a certain goal.

Certainly, the right tool for the right job applies. If there's already an open-source product that covers our needs, I am much in favour (like Prometheus for monitoring or authentication with Keycloak).

mariha commented 3 years ago

Don't want to seem very picky, but I tend to have opinions and usually express them. I hope you'll get used to it ;)

In my view, microservices make developing easier, and operations harder.

true. Here is a list of things to keep in mind: "You need to be this tall to use [micro] services".

dev <-> ops?

Please, no. Let's keep development as light as possible while keeping the feedback loop closed, and have everyone involved in the feature they develop from the beginning to the end and over again. It's beneficial for everyone: devs, who can learn from their mistakes and are motivated to prevent them; ops, who would otherwise be downstream of devs, dealing with issues someone else made; and users, who are going to see fewer issues, hopefully 🤞.

moving the user authentication and session stuff out of the core logic would already allow scaling the backend service horizontally

Another option is to keep the session on the client side and use JWT for authentication.

Then, as the need arises, and we find that certain parts would benefit from being scaled separately, we can work on extracting them.

agree, as the need arises ;) I'd be excited to move it to the next level then.

I do disagree with @mariha on separate tech stacks though. in my opinion, this would create barriers for contributors that are already active in one area to also contribute in another. I guess what we're weighing is: potentially attracting diverse contributors vs facilitating contributors that are already active and committed.

I guess I spent too much time at huuuuge codebases where everything was meant to be uniform and as a result making any change was a pain - one would have to do it in whole codebase (600K LOC) so no one did it at all, for years. Let's write the code so that everyone can easily understand it. So I'd optimize for understandability and expressiveness, whatever technology/library it takes. I don't think we have to agree on it (technical uniformity over diversity) right now though, until we get to the point where we want to separate something.

Let's continue discussing in Slack, if you'd like, it's easier than here.

On everything else we are on the same page 😉

auloin commented 3 years ago

Hi! The discussion is well advanced and you've already covered all major points. Here are my thoughts:

The move

I fully support the move. It hasn't always been so, simply because building a backend takes time (glad to see that all participants are ready to invest some time to make it happen 😄).

To me, Firebase feels just too well engineered to charge users as much as possible.

To have a minimum of security or privacy, with Firestore, the same document needs to be split up.
They built a nice console which charges read/write operations without a warning...

On the security side, it's to be noted that there is no rate limiting feature, so a single user can ruin it for everyone.
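
For contrast, a self-built API could enforce per-user limits itself. Below is a minimal token-bucket sketch of that idea. The class names, the capacity and the refill rate are illustrative assumptions, and a real deployment with several server instances would need shared state (for example in Redis) rather than an in-memory map.

```typescript
// A bucket holds up to `capacity` tokens and refills at a steady rate.
// Each request consumes one token; an empty bucket means "rejected".
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number): boolean {
    // Add the tokens earned since the last call, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Keeps one bucket per user so a single user cannot exhaust everyone's quota.
class RateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // The current time is passed in explicitly to keep the sketch testable.
  allow(userId: string, nowMs: number): boolean {
    let bucket = this.buckets.get(userId);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.refillPerSecond, nowMs);
      this.buckets.set(userId, bucket);
    }
    return bucket.tryConsume(nowMs);
  }
}
```

In a request handler this would be called as `limiter.allow(userId, Date.now())`, rejecting the request when it returns `false`.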

Because we can't compromise on security, those trying to optimize reads just end up building their API... I think at some point we'll be facing the same issues.

I'd love to see in parallel, the start of a developer guideline. We have the opportunity to document and properly test the whole thing.

The stack

I'd like to see what Elixir is about, or even Kotlin. But for reasons already mentioned, the stack Michiel's proposing will do a better job, imo.

If I have to change something, it would be Typescript for vanilla JS 😸.

Joke aside, I'm fine with anything that can make the job easier.

suancarloj commented 3 years ago

Hey, I thought I would join in the conversation as suggested by @MichielLeyman on slack.

The first thing that I would like to propose is to use something like Terraform from the start. We recently put that in place at my current job and it's really great! It really helps with documenting the infrastructure, and adding new things is relatively easy. You can even write it in Typescript 😄

When it comes to k8s, I would advise against using it until we cannot do without, unless there are at least 3-4 ops people who can take good care of it. After using it for the past 4 years, I feel that it adds too much complexity. K8s can take a lot of time to debug if something goes wrong with the clusters and you don't know enough about it.

For Docker builds, I would not mind exploring something like kaniko or buildah, as I tend to find docker build slow (at least on gitlab).

When it comes to micro-services, I have to agree with @mariha: start simple with a single web service, especially with nestjs, as you can build good modules that can later be moved into a new microservice. In the past year, I have read mostly about teams going back to fat services. Micro-services with bad domain boundaries will do more harm than good, and we could find ourselves with a distributed monolith 😄. If the choice is to go the micro-service way, it would be good to pick a solid way to enforce the API contracts. Where I work we currently use gRPC with protobufs; this makes sure that we are well aware of all breaking changes on our APIs, and it also allows us to generate types for the frontends. That said, protobuf has a lot of drawbacks and can bring a lot of frustration.
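As a rough illustration of the "good modules first, separate later" idea, the sketch below keeps a module behind a small contract so that its implementation could later be swapped for a remote call. It deliberately uses plain TypeScript rather than NestJS decorators, and every name in it (`CampsiteDirectory`, `InProcessCampsiteDirectory`, `HostProfileService`) is hypothetical.

```typescript
// Hypothetical sketch of a module boundary; none of these names come from the project.
type Campsite = { id: string; ownerId: string; name: string };

// The contract other modules depend on. It is async on purpose: an implementation
// that later calls a separate service over the network keeps the same signature.
interface CampsiteDirectory {
  findByOwner(ownerId: string): Promise<Campsite[]>;
}

// Today: the module lives in the same process as its consumers.
class InProcessCampsiteDirectory implements CampsiteDirectory {
  private readonly campsites: Campsite[];

  constructor(campsites: Campsite[]) {
    this.campsites = campsites;
  }

  async findByOwner(ownerId: string): Promise<Campsite[]> {
    return this.campsites.filter((c) => c.ownerId === ownerId);
  }
}

// A consumer only knows the contract, not the implementation behind it.
class HostProfileService {
  private readonly directory: CampsiteDirectory;

  constructor(directory: CampsiteDirectory) {
    this.directory = directory;
  }

  async summarize(ownerId: string): Promise<string> {
    const sites = await this.directory.findByOwner(ownerId);
    return `${ownerId} hosts ${sites.length} campsite(s)`;
  }
}
```

If the campsite module were ever split out, only a new `CampsiteDirectory` implementation would need to be written; `HostProfileService` would stay untouched.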

Regarding Nestjs, I think it's a good choice, as it's quite flexible. I would just suggest keeping the default setup with the expressjs platform rather than trying fastify, which is technically faster but has little documentation. Nest supports the OpenAPI spec, which is nice for keeping the APIs well documented.

Concerning Postgres vs mongodb, I have a preference for postgres as you can have constraints in the db, and it's much easier to do analytics with SQL than with mongodb query pipelines. If mongodb is preferred, then I will just say do not use mongoose or typegoose as ODM for mongo, they are really bad for performance, and the semver is not well respected by mongoose.

In the case of Graphql vs REST, I have no preference, but I find REST to be much more accessible, and there is so much content available that we are unlikely to encounter any surprises.

Looking forward to starting work with all of you :)

archived-m commented 3 years ago

Thanks so much for the input everyone!

Here are the key decisions (I think) we've made, which I think we can consider final, allowing us to get started:

We've reached consensus on starting out with TypeScript. Let's use NestJS here
Postgres as a database where there is no clear advantage for something else in future features
No objections against GraphQL, but given the limited knowledge of it among the people who have volunteered to help, let's begin with a simple REST API. Svelte-kit (#111) will integrate better with it for the time being as well.
Talking to some of you individually, going by this thread, and by outside discussions, we can agree to take a cautious approach to microservices. I propose we start with an auth service (custom NestJS service or Keycloak) and have a resource server "monolith" next to it that covers all remaining functionality. We measure and monitor, and separate/adjust when necessary.

Migrating

The following is a list of what I interpret as hard requirements before we can export production data and make the actual move. It serves as a progress checklist towards being "done" with this. We're aiming for migration without regression, and anything that adds functionality is considered out of scope for this issue.

I'll leave this list as-is for another 2 days to welcome final input, changes or concerns, and then I will separate it into individual issues, where I will ask for help and on which you can express interest/commitment should you wish to contribute :heart:. I will also expand on each point, and we can discuss their intricacies there, separately. I've created a separate milestone (called v2 - Community) that we can use to track progress, and I will close this issue when the individual issues have been created.

[ ] Keycloak or NestJS auth service
[ ] Front-end refactor of auth-related code, removing tight coupling with Firebase
[ ] Chat using sockets - research, testing, and cognisance of future federation possibilities. Keep in mind requested improvements and allow for them to exist in the future (#41 and linked issues). Also, making headroom for platform federation in the future is very much to be kept in mind (#108)
[ ] Add campsite endpoints (should be simple and CRUD-y)
[ ] Add user modification endpoints (should be simple and CRUD-y)
[ ] Add better type support on the front-end, especially given the new request/response formats. It'd be amazing if #110 and #111 could be taken up in one fell swoop (I volunteer as tribute)
[ ] Replace all other interaction with Firebase (less tightly coupled) with interaction with our new API, paying special attention to caching
[ ] i18n on the back-end, for errors and emails. Related to #77 and #109
[ ] Unit tests
[ ] Functional API tests (request/response tests). Making a point to expand on this here, since this is the biggest "risk" of moving away for now. If we want longevity and security here, tests are non-negotiable. We should test our API for:
- Happy paths (status code, payload, idempotence, headers, response times/performance)
- Positive tests using modifiers (skip, limit, pages, sorts, ...), that match payload formats to the happy path
- Negative tests: all atypical requests and all erroneous status codes (mostly 400, 401, 403, 500 - everything except 2xx). Valid errors are sent and correctly formatted, are localized (i18n), and are reported within reasonable time (imagine a user getting very little UI feedback). Some examples of negative tests:
  - Response payload violates schema
  - Resource exists (e.g. uniqueness of email), on both create and update
  - Authorization testing (e.g. deleting an account that doesn't belong to you)
  - Invalid values in HTTP headers
  - Unsupported HTTP methods per endpoint
  - Missing request parameters
  - Invalid user IDs being supplied
  - Missing auth credentials/token
- Destructive tests:
  - Payload overflow (sending huge JSON bodies), violating database and schema validation constraints (e.g. name > 30 characters)
  - GET requests that take a long time to process or purposefully supplying an id for a resource that is thousands of characters long
  - Empty requests
  - Illegal characters and diacritics in request
  - Concurrency tests (PATCH/PUT and DELETE requests on the same resource at the same time)
  - Many other exploratory tests. Protect against the OWASP top 10 for starters
[ ] Rate limiting & throttling
[ ] CI/CD: I propose we merge features into develop, which is deployed to staging, and make commits to master deploy to production. Other deploys, for specific manual acceptance tests, can be done "on demand". Both branches will require PR approval, and both run automated tests and style checks. There seems to be consensus on it being too early to start with k8s, so let's start by making everything docker-composable, and agree to measure first and improve later. Decisions to use stuff like Terraform, Dokku, Kaniko/Buildah and even our eventual hosting provider are way outside of my current expertise, so I welcome this chance to learn from you, and invite you to discuss this in a separate issue
[ ] Monitoring and health checks with alerts to something like a Slack channel (Prometheus?)
[ ] Logging
[ ] Data map of current production data models to our new data models/schema, and a script that can take that current data and import it into our new prod database.
[ ] Documentation (#57):
- API docs (preferably mostly automated)
- Contributor docs & guidelines
[ ] I propose we tag future releases and introduce a changelog of what has changed, what was fixed, and what was added, including who contributed these changes
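The functional API test categories in the checklist above (happy path, negative, destructive) could be sketched as a table of cases run against a handler. This is only an illustration under assumptions: `createUser`, the request/response shapes, the error codes and the 409 status for a duplicate email are all hypothetical, and a real suite would send HTTP requests to the running API instead of calling a function directly.

```typescript
// Hypothetical sketch: handler, types and error codes are illustrative only.
type ApiRequest = {
  headers: Record<string, string | undefined>;
  body: { email?: string; name?: string };
};
type ApiResponse = { status: number; body: { id?: string; error?: string } };

const MAX_NAME_LENGTH = 30; // mirrors the "name > 30 characters" example in the checklist

function createUser(req: ApiRequest, existingEmails: Set<string>): ApiResponse {
  if (!req.headers["authorization"]) {
    return { status: 401, body: { error: "missing_credentials" } };
  }
  const { email, name } = req.body;
  if (!email || !name) {
    return { status: 400, body: { error: "missing_parameter" } };
  }
  if (name.length > MAX_NAME_LENGTH) {
    return { status: 400, body: { error: "name_too_long" } };
  }
  const normalizedEmail = email.toLowerCase();
  if (existingEmails.has(normalizedEmail)) {
    return { status: 409, body: { error: "email_taken" } }; // assumed status code
  }
  existingEmails.add(normalizedEmail);
  return { status: 201, body: { id: `user-${existingEmails.size}` } };
}

const auth = { authorization: "Bearer test-token" };

// One row per scenario: a label, the request to send, and the status we expect back.
const cases: Array<{ label: string; req: ApiRequest; expected: number }> = [
  { label: "happy path", req: { headers: auth, body: { email: "new@example.org", name: "Ada" } }, expected: 201 },
  { label: "missing token", req: { headers: {}, body: { email: "new@example.org", name: "Ada" } }, expected: 401 },
  { label: "missing parameter", req: { headers: auth, body: { name: "Ada" } }, expected: 400 },
  { label: "name too long", req: { headers: auth, body: { email: "new@example.org", name: "x".repeat(31) } }, expected: 400 },
  { label: "duplicate email", req: { headers: auth, body: { email: "Taken@example.org", name: "Ada" } }, expected: 409 },
];

// Runs every case against a fresh data set and returns the labels of the failing ones.
function runCases(): string[] {
  return cases
    .filter((c) => createUser(c.req, new Set(["taken@example.org"])).status !== c.expected)
    .map((c) => c.label);
}
```

Keeping the scenarios in a table makes it cheap to add rows for the other items in the checklist (unsupported methods, invalid headers, oversized payloads) as the endpoints take shape.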

Deployment of the tile server is a separate, non-blocking issue and may not be an issue if we can partner with Mapbox.

I will take full responsibility for front-end related tasks together with @wardbeyens and @auloin, given its open PRs and general lack of contributor guidelines at the moment. If you wish to help, please respond to the individual issues for these tasks (see end of post).

Nice to have

Here are some open issues to keep in mind while developing replacement endpoints, that could be a quick fix, but are not required before migrating to this "new stack"

[x] #55 - Map Centering: allowing us to center the map based on the user's location so we can show them the map that is not naively centered on Belgium. Not using a third party here would be :ok_hand:
[x] #77 & #109 - Improving the translation process and making this a community-sourced effort: We will need i18n errors in this new API regardless
[ ] #54 Optimizing garden fetches: Initially this was to optimize performance and reduce billing using Firebase, but this is still really important in our own API. Geoquerying and efficient requests to show garden clusters based on map zoom level etc, are all things worth thinking about right now
[ ] #100 - Allowing multiple campsites per user: to be kept in mind while data modelling
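For #54, one simple way to think about "garden clusters based on map zoom level" is grid bucketing: divide the map into cells whose size shrinks as the zoom grows, and return one marker per non-empty cell. The sketch below does this in memory with hypothetical names (`Garden`, `Cluster`, `clusterGardens`) and a plain latitude/longitude grid, which is a simplification of real map tiling. With Postgres, the same idea would more likely be pushed into the database, for example with the PostGIS extension.

```typescript
// Hypothetical sketch of zoom-dependent grid clustering; not the project's implementation.
type Garden = { id: string; lat: number; lon: number };
type Cluster = { lat: number; lon: number; count: number };

function clusterGardens(gardens: Garden[], zoom: number): Cluster[] {
  // Cell size in degrees halves with every zoom level (360 degrees at zoom 0).
  const cellSize = 360 / Math.pow(2, zoom);
  const cells = new Map<string, { latSum: number; lonSum: number; count: number }>();

  for (const garden of gardens) {
    // Gardens falling into the same grid cell share a key and end up in one cluster.
    const key = `${Math.floor(garden.lat / cellSize)}:${Math.floor(garden.lon / cellSize)}`;
    const cell = cells.get(key) ?? { latSum: 0, lonSum: 0, count: 0 };
    cell.latSum += garden.lat;
    cell.lonSum += garden.lon;
    cell.count += 1;
    cells.set(key, cell);
  }

  // Each cluster is placed at the average position of the gardens it contains.
  return Array.from(cells.values(), (cell) => ({
    lat: cell.latSum / cell.count,
    lon: cell.lonSum / cell.count,
    count: cell.count,
  }));
}
```

Zoomed out, the client receives a handful of counted markers instead of every garden; zoomed in, cells become small enough that most clusters contain a single garden.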

I will kick us off with a starter skeleton and directory restructure in a draft pull request to develop. Go team!

ludov04 commented 3 years ago

Thanks @MichielLeyman for this awesome write up.

For CI/CD: I would argue that we should aim for using github flow rather than gitflow (https://lucamezzalira.com/2014/03/10/git-flow-vs-github-flow/). The difference is that there is no develop branch: feature branches get merged into master and go directly to production. However, this kind of workflow requires a strong test suite, including e2e tests, which we don't have at the moment. So let's stick with gitflow for now, have the develop branch deploy to the staging environment, and work with releases once we feel confident we can ship a set of features.
For hosting/platform, I suggest we start with Knative (https://cloud.google.com/knative). I think this is a good trade-off between ease of operation and flexibility. We can start with Knative on Cloud Run and move to k8s after that. Happy to help on that side with terraform

Knative makes it easy to start with Cloud Run and later move to Cloud Run for Anthos or start in your own Kubernetes cluster and migrate to Cloud Run in the future. By using Knative as the underlying platform, you can move your workloads freely across platforms, while significantly reducing the switching costs.

archived-m commented 3 years ago

The milestone (track progress here), v2 branch (all v2 related code and changes go here), and individual issues are ready.

For any additional input, please comment on the corresponding issues.

Thanks for responding everyone, it's been super helpful!