department-of-veterans-affairs / va.gov-team


Research how revproxy is used in RIs, dev and staging #55164

Closed JoeTice closed 1 year ago

JoeTice commented 1 year ago

Description

As we prepare to use revproxy in support of preview environments, there are a number of questions that need to be answered in order to ensure we approach the solution in the best way...

Kerry previously experimented with a Docker image of an nginx revproxy here.

Tasks

Acceptance Criteria

pjhill commented 1 year ago

Prior work with revproxy from Kerry is here -- https://vfs.atlassian.net/wiki/spaces/TT4/pages/2563407880/Revproxy+in+EKS

JoeTice commented 1 year ago

Sprint 23 update: In progress. The task was started this sprint but was too large to finish; it will be completed in Sprint 24.

JoeTice commented 1 year ago

Adding @gia-lexa @holdenhinkle to this task

gia-lexa commented 1 year ago

Adding my questions and notes raised from going through some of the documentation so far. These aren't in a particular order and more will be added.

What is RevProxy? How is it used currently in Review Instances?

According to https://depo-platform-documentation.scrollhelp.site/developer-docs/review-instances#Reviewinstances-Authentication, revproxy is used for authentication in Review Instances. A revproxy is a specifically configured nginx server. It intercepts the required SAML callback URL (which comes from the SAML configuration, ID.me) and routes it to a specific Review Instance. If there are other uses for revproxy as they relate to Review Instances, I haven't come across them yet…
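To make the routing idea above concrete for myself, here is a minimal sketch in Python rather than nginx configuration. It is not how the real revproxy works: the docs linked above don't say how the target Review Instance is identified, so the `RelayState` JSON payload, its `instance` key, the callback path, and the upstream hostname pattern are all hypothetical. Real SAML callbacks are typically POSTs, so reading the value from the query string is also a simplification.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical callback path; the real one comes from the SAML configuration.
CALLBACK_PATH = "/auth/saml/callback"


def route_saml_callback(url: str) -> Optional[str]:
    """Return the upstream URL to forward a SAML callback to, or None."""
    parts = urlsplit(url)
    if parts.path != CALLBACK_PATH:
        return None  # not a callback, so this sketch does not route it

    # Assumption: the instance name travels in a JSON-encoded RelayState value.
    relay_state = parse_qs(parts.query).get("RelayState", [""])[0]
    try:
        instance = json.loads(relay_state)["instance"]
    except (ValueError, KeyError, TypeError):
        return None  # missing or malformed RelayState

    # Hypothetical internal hostname pattern for a Review Instance.
    return f"http://{instance}.review.example.internal{parts.path}"
```

If the real mechanism turns out to be something else (a cookie, a subdomain, a header), only the lookup step would change; the shape of "intercept the callback, pick an instance, forward" stays the same.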

From the text above, under Description:

As we prepare to use revproxy in support of preview environments, there are a number of questions that need to be answered in order to ensure we approach the solution in the best way…

Is this because with the introduction of EKS, we need to change the way Preview Environments interact with Docker images and therefore revproxy may also be affected?

What would I need to do to locally test my assumptions about revproxy and Docker? At least configure my local environment as explained here: https://depo-platform-documentation.scrollhelp.site/developer-docs/using-review-instances-to-preview-changes

Likely other steps involved beyond what’s listed in the link above, but that’s to be determined.

From https://vfs.atlassian.net/wiki/spaces/TT4/pages/2563407880/Revproxy+in+EKS, under "Things We Need to Discover": "How do vets-website and content-build work together in non-local deployments (dev/staging/prod)?"

Why are these two repos and the discovery of how they work together critical for understanding revproxies?

TODOs based on Tasks listed in this ticket:

Is it possible to operate a single revproxy in support of preview environments? TODO: Investigate current Docker configurations to determine this.

"Can we use one image and configure it in the container after setup?" TODO: Determine what steps are currently taken. If using one image and configuring it in the container after setup is a change from the current configuration, the implications of the new setup need to be documented.

"Determine if we need to create a custom revproxy container for each preview environment, and if so, how should we do that?" TODO: Sketch out understanding of Docker and how revproxy works in it currently

"Can we borrow from the existing revproxy templates in devops?" TODO: Investigate existing revproxy templates in devops

gia-lexa commented 1 year ago

Some more questions that have come up as a result of digging into more documentation:

From RevProxy in EKS: Under Things We Need to Discover:

How do vets-website and content-build work together in non-local deployments (dev/staging/prod)?

Why? What specific details do we need to understand about this relationship and how will those details inform our workflows for revproxy?

Do we need a revproxy to route traffic from the public internet to our PEs? Would we have one revproxy per pod / pe?

Who would know this? Or, how could I prove this either way?

Under Things We’ve Tried:

Create revproxy locally and try to get it routing traffic to our local frontend

Why? What would this prove? What would the following steps test? What would confirm the test passes?

Under Things Left to Try:

Continue to troubleshoot routing with the above config

How can I accomplish this? Would I do this from within a local instance of vets-website? How can I test routing as it applies to revproxy? What steps would I take to test routing in revproxy? Or for revproxy?

holdenhinkle commented 1 year ago

Does this Nginx server exist yet? If so, where is it?

Will all requests coming to this server be specifically for preview instances? Will requests ever come in that are not for review instances?


My mental model for how this works: A new version of vets-website (based on branch and last commit) is built and deployed => The preview instance or PI going forward

A URL is created for this PI.

What is the anatomy of this URL? Can we get an example of one?

The user is directed to the URL.

A request is made to the PI.

The RevProxy, RP going forward, intercepts the request, and routes it to the correct PI.

How does the RP know about newly created PIs? How is it updated?
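The mental model above can be sketched as a small routing table keyed by hostname. This is only an illustration of the open question (something has to call `register` whenever a PI is created); the class name, hostnames, and upstream addresses are hypothetical, and the real URL anatomy is still unknown.

```python
from typing import Dict, Optional


class PreviewRouter:
    """Maps a request's Host header to the upstream address of a preview instance."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def register(self, host: str, upstream: str) -> None:
        """Called when a new PI is deployed (by whatever does the deploy)."""
        self._routes[host.lower()] = upstream

    def unregister(self, host: str) -> None:
        """Called when a PI is torn down; unknown hosts are ignored."""
        self._routes.pop(host.lower(), None)

    def resolve(self, host_header: str) -> Optional[str]:
        """Return the upstream for a request, or None if no PI matches."""
        host = host_header.split(":", 1)[0].lower()  # drop any ":port" suffix
        return self._routes.get(host)


router = PreviewRouter()
router.register("my-branch.preview.example.gov", "10.0.0.12:3001")
print(router.resolve("my-branch.preview.example.gov:443"))
```

Whether the table lives in an nginx config file, a Kubernetes resource, or a proxy's in-memory state is exactly the "how is it updated?" question.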


From the little previous experience I had with RPs, Nginx can't discover new containers to route traffic to on its own. The service would have to be stopped, the new container's port/IP address/whatever would have to be added to the configuration, and then the service would have to be restarted.

We used Traefik, an RP for Docker that can be updated dynamically in real time and doesn't need to be stopped and started when new containers that it needs to route requests to are spun up.
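For comparison, a rough sketch of the "regenerate the config" step for plain Nginx. The directives used (`server`, `listen`, `server_name`, `location`, `proxy_pass`, `proxy_set_header`) are standard Nginx, but the hostnames, addresses, and the idea of one server block per instance are assumptions for illustration. Note that Nginx can re-read its configuration with `nginx -s reload` without being stopped, so a regenerate-then-reload flow is one possible approach; something still has to trigger it when a container appears.

```python
from typing import Mapping

# One server block per instance; doubled braces are literal braces for str.format.
SERVER_BLOCK = """\
server {{
    listen 80;
    server_name {host};

    location / {{
        proxy_pass http://{upstream};
        proxy_set_header Host $host;
    }}
}}
"""


def render_nginx_config(routes: Mapping[str, str]) -> str:
    """Render one server block per instance, sorted by hostname for stable output."""
    return "\n".join(
        SERVER_BLOCK.format(host=host, upstream=upstream)
        for host, upstream in sorted(routes.items())
    )


# Hypothetical instances; after writing the file, `nginx -s reload` would apply it.
print(render_nginx_config({"my-branch.preview.example.gov": "10.0.0.12:3001"}))
```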

I asked ChatGPT about this:

Holden:

When running containers or pods in Docker or Kubernetes, is the following statement true regarding Nginx as a rev proxy?

"From the little previous experience I had with RPs, Nginx can't discover new containers to route traffic to on its own. The service would have to be stopped, the new containers port/ip address/whatever would have to be added to the configuration, and then the service would have to be restarted."

GPT:

The statement you provided is partially true but does not consider the capabilities of Nginx when used with Docker or Kubernetes. In a typical setup, Nginx alone cannot automatically discover new containers or pods to route traffic to. However, when used in conjunction with Docker or Kubernetes, there are mechanisms that allow for dynamic configuration and automatic discovery.

In a Docker environment, you can use tools like Docker Compose or Docker Swarm. Docker Swarm, for example, provides service discovery and load balancing features, which allow Nginx to route traffic to different containers without having to manually update the configuration.

In a Kubernetes environment, Nginx can be deployed as an Ingress controller, which is a more native solution for managing external access to services running in a Kubernetes cluster. The Ingress controller automatically discovers new services and pods based on Ingress resources and configures Nginx to route traffic to them. This eliminates the need to manually update the Nginx configuration for each new service or pod.

In both cases, you can achieve dynamic configuration and service discovery without having to stop the service, manually update the configuration, and restart the service.

Holden:

Is a rev proxy like Traefik better for routing traffic to dynamically created pods or containers that have dynamically created URLs? Or can Nginx handle this use-case now?

GPT:

Traefik is often considered better suited for routing traffic to dynamically created containers or pods with dynamically created URLs, as it has been designed specifically for this purpose. It comes with built-in service discovery and can dynamically update its configuration without requiring a restart.

That being said, Nginx can also handle this use-case, especially when used in conjunction with other tools or platforms.

In a Docker environment, you can use Nginx with Docker Swarm or third-party tools like Consul Template to enable dynamic configuration updates based on container creation and termination events.

In a Kubernetes environment, as mentioned earlier, you can deploy Nginx as an Ingress controller. The Ingress controller will automatically discover new services and pods based on Ingress resources and update the Nginx configuration accordingly.

While both Traefik and Nginx can handle dynamic routing, Traefik is generally considered more user-friendly and better suited for dynamic environments out-of-the-box. However, Nginx is more mature and battle-tested, and it can be configured to handle dynamic routing when integrated with other tools or platforms. The choice between the two often comes down to personal preference, specific use-cases, and familiarity with either tool.
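To see what the Ingress approach mentioned in the answers above would involve per preview instance, here is a sketch that builds an Ingress manifest as a Python dict. The field layout follows the `networking.k8s.io/v1` Ingress schema; the resource name, hostname, Service name, port, and the `nginx` ingress class name are hypothetical and depend on how the cluster's controller is installed.

```python
import json
from typing import Any, Dict


def ingress_manifest(name: str, host: str, service: str, port: int) -> Dict[str, Any]:
    """Build an Ingress that sends all traffic for `host` to one Service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": "nginx",  # assumption: the controller's class name
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {"name": service, "port": {"number": port}}
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


# kubectl accepts JSON as well as YAML, so this output could be applied as-is.
print(json.dumps(ingress_manifest("pe-my-branch", "my-branch.preview.example.gov", "pe-my-branch-web", 3001), indent=2))
```

Under this model, "how does the RP learn about a new PI" becomes "the deploy creates an Ingress resource", and the controller handles the Nginx configuration itself.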


How did this work with review instances?

What are the differences between review instances and preview instances?

Can the RP config for review instances be repurposed? https://github.com/department-of-veterans-affairs/devops/blob/master/ansible/deployment/config/revproxy-vagov/templates/review_instance_nginx_revproxy.conf.j2

holdenhinkle commented 1 year ago

We're not able to answer these questions at this time.

Gia and I asked Hector the following question (one of the "Tasks" questions):

Is it possible to operate a single revproxy in support of preview environments?

The answer seems to be "probably" but he said there are too many unknowns to answer the question definitively.

I had planned to ask him the rest of the Tasks questions but felt they're too ambiguous and the Preview Instances MVP isn't done yet so we're not sure what that looks like architecturally yet.

He said that lots of discovery would need to be done just to answer the question I asked him. I really think it might be best to collaborate with Hector and his team on this, otherwise we might end up with a less-than-ideal solution that will probably take us much longer than it needs to. That is, if Hector has the availability.

JoeTice commented 1 year ago

The team encountered challenges in completing this task. We feel it would be more helpful to start with an initial task that looks at Review Instances' use and setup of revproxy, to help point us in the right direction.