data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0
230 stars 81 forks source link

Migrate to AWS AppSync #958

Closed anmolsgandhi closed 4 weeks ago

anmolsgandhi commented 8 months ago

Description:

The proposal entails migrating data.all's GraphQL endpoint from the current implementation via API Gateway to AWS AppSync, a managed service specifically designed for serverless GraphQL. The current reliance on API Gateway, intended primarily for REST protocols, necessitates the implementation of custom logic deviating from standard GraphQL definitions. This departure poses challenges for GraphQL developers working within the data.all environment, as it diverges from conventional GraphQL practices. The migration to AppSync is to align with standard GraphQL conventions, providing a more seamless experience for developers and addressing troubleshooting difficulties associated with a custom wrapper. This change aims to enhance the compatibility and user-friendliness of data.all for GraphQL developers.

Details:

Benefits:

@dlpzx @petrkalos

petrkalos commented 6 months ago

For the migration to AppSync we are proposing a 2-phase approach to reduce risks and complexity of the changes.

Phase 1: Lift and Shift + Code First (Planned)

Phase 1 Description

Migrate to AppSync, leaving backend logic as-is. This phase represents a "lift and shift" strategy, transitioning from data.all's current API Gateway REST endpoints to AppSync GraphQL endpoints without significant changes to the backend logic. It involves limited refactoring of the backend/frontend code to make it compatible with AppSync endpoints and removing custom wrapper pieces.

Scope of Refactoring

Code Sample Example

There is a PoC branch that you can see some of the changes required

Phase 2 AppSync Optimization (Not Planned)

Phase 2 Description

Optimize Data.all Backend Code using AppSync Native Components. This phase aims at "rearchitecting and optimization" requiring heavy refactoring to leverage AppSync's native components for backend logic improvement, readability, risk minimization, and performance enhancement.

Scope of Optimization

Example

Benefits

Call Outs (Risks)

zsaltys commented 6 months ago

@noah-paige @dlpzx @petrkalos I'm not familiar with AppSync and something I need to learn more about but I'll take you for your word that this is a cleaner solution to implement graphql endpoints on AWS

My only concerns for this design are:

1) There's no mention of pricing in the design. How is AppSync priced and how can we calculate how much operation costs would increase after switching to AppSync?

2) A very important bit for us is that we do not use Cognito and integrate directly to OKTA without using Cognito as a proxy. The way this works today is with react-oidc and a custom authorizer on the API gateway. Design should mention how this pathway of authentication will continue to be supported.

3) You already called this out but local development support is critical. I would like to ask for more details in the design how the local development experience will change with AppSync integration.

4) Re-auth flow is very important. At the moment design only mentions "rework the ReAuth logic". We should make sure that we have a good idea if we will be able to make that work.

petrkalos commented 6 months ago

Hi @zsaltys and thanks for your comments

pricing

ApiGateway is 3.5$ per request AppSync is 4$ per request It makes AppSync ~15% more expensive, we will be able to close this gap (see points below) or even make it cheaper.

custom authorizer

AppSync supports OpenID custom authorizers out of the box, so I think there won't be any issue with OKTA. There is already a blogpost talking about this integration. I'd appreciate if you could take a look and let me know if it seems reasonable.

local dev support

I will dive deeper into it but on my first few iterations it seems that we won't be able to have a local backend like we currently do with Docker. In my testing I had to redirect my local frontend to the AWS backend and to minimize the time it takes to test changes I was just making updates to specific stacks or just the lambda.

ReAuth workflow

The reason we need to rework it is because the way it currently works is by returning HTTP specifics from the Lambda (302 etc). If we move to AppSync the Lambda won't know anything about HTTP, it returns only a dict with the requested data. I didn't write any implementation details wrt ReAuth because there are a few ways we can achieve it and we need to take a close look to find which one is the best.

dlpzx commented 2 months ago

@petrkalos @anmolsgandhi Should we close this issue for the time being? We can always re-open and come back to all the exploration that we did

petrkalos commented 4 weeks ago

For now we decided not proceed with AppSync for the following reasons...