bcgov / DITP-DevOps

Digital Identity and Trust Program Team's DevOps Documentation Repository
Apache License 2.0

Add an OpenShift Deployment for an EMDT Traceability Traction Tenant Controller for use with the VC Traceability Test Suite #170

Open · swcurran opened this issue 4 months ago

swcurran commented 4 months ago

Tagging folks: @WadeBarnes @esune @PatStLouis @krobinsonca

Patrick has developed a Traceability Traction Tenant Controller (TTTC - github reference to be added) that interacts with the Traceability Test Suite (TTS - link to be added). The intention of the TTS is that each participant stand up an active Web component that can be used to test conformance with any other participant of the TTS. The purpose of this issue is to request cooperation in standing up an OpenShift workspace to host the TTTC such that it can be used by other TTS participants.

My thoughts on what needs to be done are below. I expect that others (especially Wade, Emiliano, and Patrick) will extend/update/correct this list to make it accurate and to enable the work to be done over the next short while. As I understand it, there is not a lot to do here, so the hope is that this can be done with a meeting or two, Discord discussions as needed, and everyone doing a little bit off the side of their desks. If this turns out to be a bigger thing, we'll determine that quickly.

Tasks:

Thoughts on other things to be done?

An interesting question is whether Patrick needs access to the OpenShift instance. Ideally not, but will that just create an undue burden on others?

WadeBarnes commented 4 months ago

In order to determine the most appropriate home and access controls for the new controller, I'll need some more information about the TTTC itself, along with how it interacts with an agent and the TTS. Things like:

For monitoring: are there any health endpoints on the TTTC we could utilize for health/uptime monitoring? What conditions need to be met in order to consider the service operational?

EMDT has their own set of namespaces in OCP. Will their agent be hosted there, or is the plan to have them use a Traction tenant hosted in the main Traction environment?

swcurran commented 4 months ago

Answers:

Let’s have a call about this.

PatStLouis commented 4 months ago

Thanks @swcurran for setting up this issue.

The TTTC is an implementation of the w3c-ccg traceability specification, which leverages Traction/ACA-Py as the backend service.

Here are some answers to @WadeBarnes's questions:

How does the TTTC connect to and interact with an agent?

The TTTC will interact with the agent by requesting a Traction multi-tenant token and sending authenticated requests to the API.

What operations can a TTTC have an agent perform?

The TTTC will use the following endpoints from the Traction API:

How do you register an agent with the TTTC?

When deploying the TTTC, you pass in a client_id/api_key as environment variables. The application will be able to request tokens as long as the API key still exists in the tenant's API keys.
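For illustration, a minimal sketch of how that might look in the container spec of a Kubernetes Deployment; the variable names, image, and Secret name are placeholders, not necessarily what the TTTC actually expects:

```yaml
# Hypothetical sketch -- the variable names, image, and Secret name are
# placeholders, not necessarily what the TTTC actually reads.
containers:
  - name: controller
    image: ghcr.io/example/tttc:latest # placeholder image reference
    env:
      - name: TRACTION_CLIENT_ID       # assumed name for the client_id
        value: "<tenant-client-id>"
      - name: TRACTION_API_KEY         # assumed name for the api_key
        valueFrom:
          secretKeyRef:
            name: tttc-secrets         # placeholder Secret name
            key: api-key
```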

Is there a TTTC instance per tenant, or is it itself a multi-tenant service?

Per tenant

How does the TTS interact with the TTTC?

The TTS is run every 24 hours through GitHub Actions. There is a test client for conformance testing and interoperability testing (see the Interoperability Report and Conformance Testing actions). The results will be published. This is similar in design to how AATH operates, with the exception that it tests live services.

https://w3c-ccg.github.io/traceability-interop/reports/conformance/
https://w3c-ccg.github.io/traceability-interop/reports/interoperability/
https://w3c-ccg.github.io/traceability-interop/reports/archive/
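For context, a GitHub Actions workflow scheduled that way looks roughly like the following; the workflow name and test command are placeholders, not the actual TTS configuration:

```yaml
# Hypothetical sketch of a daily scheduled workflow; the name and the
# test command are placeholders, not the actual TTS configuration.
name: conformance-report
on:
  schedule:
    - cron: "0 0 * * *" # run once every 24 hours
  workflow_dispatch: {} # also allow manual runs
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the test client against the live services
        run: npm run test # placeholder command
```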

How do you register a TTTC instance with the TTS?

Here's the official procedure.

For monitoring: are there any health endpoints on the TTTC we could utilize for health/uptime monitoring? What conditions need to be met in order to consider the service operational?

I've added a /healthz endpoint; we can add endpoints to set up readiness and liveness k8s probes.
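As a sketch, the probes could point at that endpoint like this; the container port and timings are assumptions, not the actual chart values:

```yaml
# Sketch of liveness/readiness probes against /healthz; the container
# port and timings are assumptions, not the actual chart values.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000 # assumed container port
  initialDelaySeconds: 15
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
```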

EMDT has their own set of namespaces in OCP. Will their agent be hosted there, or is the plan to have them use a Traction tenant hosted in the main Traction environment?

Either works, as long as the Traction instance doesn't reset periodically; we do not want to have to resubmit the information to the TTS maintainers.

PatStLouis commented 4 months ago

+1 to having a call, as I would also like to understand the deployment models we will aim for. I can also do a quick demonstration of how the app will interact with Traction.

Here's the project; I have a branch for this specific deployment. From my understanding, the charts will live in a different repo and the deployment will be done mostly through GitHub Actions (GA)?

A few questions that come to mind:

The deployment itself is a simple FastAPI server with a connection to a Postgres database. Until v0.12.0 is available in a Traction instance, I also deploy an agent for verification (this agent does not need to be exposed).

PatStLouis commented 4 months ago

EMDT has their own set of namespaces in OCP. Will their agent be hosted there, or is the plan to have them use a Traction tenant hosted in the main Traction environment?

Reading back on this question, I think it would make the most sense if the TTTC instance is deployed in an EMDT namespace and uses a Traction tenant from the main Traction environment.

swcurran commented 4 months ago

EMDT has their own set of namespaces in OCP. Will their agent be hosted there, or is the plan to have them use a Traction tenant hosted in the main Traction environment?

I don’t think this is an either/or. The plan is for them to use a Traction Tenant in Traction Dev (or Test or Prod as you see fit). The TTTC itself will need an OCP namespace, and it should go wherever is easiest. Who manages the EMDT namespaces? What I don’t think we want is to use an OCP namespace managed by EMLI folks who know nothing about Digital Trust.

WadeBarnes commented 4 months ago

From my understanding, the charts will live in a different repo and the deployment will be done mostly through GitHub Actions (GA)?

Typically the charts live with the application, provided they are somewhat generic (they deploy to any K8s environment), and the values file (the environment-specific settings) would be contained in a separate repo. What we try to avoid is imposing our specific infrastructure (BC Gov OCP platform specifics) on others. Traditionally our OCP templates have been contained in a separate repo because they are tailored to the BC Gov OCP environments. As we move to Helm charts, we're consciously making them more generic.

esune commented 4 months ago

From my understanding, the charts will live in a different repo and the deployment will be done mostly through GitHub Actions (GA)?

What we have done for our projects is to have a charts folder that contains the generic charts; a separate repo is used to store the values.yaml files with deployment-specific settings (URLs, database provider settings, etc.).
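As an illustration, a deployment-specific values.yaml in the separate repo might look something like this; every key and value here is a placeholder, not the actual configuration:

```yaml
# Hypothetical deployment-specific values.yaml kept in the separate
# config repo; every key and value is a placeholder.
controller:
  image:
    tag: "0.1.0"
  ingress:
    host: tttc.example.apps.gov.bc.ca # placeholder URL
database:
  host: postgres.tttc-dev.svc # placeholder internal service endpoint
  existingSecret: tttc-db-credentials # placeholder Secret name
```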

Reading back on this question, I think it would make most sense if the TTTC instance is deployed in a EMDT namespace and use a traction tenant from the main traction environment.

What is the expected lifespan for this project? If it is a prototype/demo and is expected to stay as such, we may want to use our demo namespaces in OCP rather than requesting a new set that will add to the list and be only partially used. I'm unsure about using the EMDT namespaces, as I don't know how much they're involved technically, especially if this is a prototype that may or may not evolve and be there long term.

@PatStLouis is the service stateless, or do we need to provision storage in the form of a database? If the latter, we will want to connect it to a backup instance so we have the data if we ever need to move and restore it elsewhere.

PatStLouis commented 4 months ago

Thanks for the information @WadeBarnes @esune

What you described is what I currently have. I usually template my values.yml file to be overwritten with a GA. So if I understand correctly, you would have a separate repo with a GA to clone, build, push, and deploy the application? Makes sense; do you have an example for me to have a look at? I've deployed GitHub ARC in our infra, and this is how we manage our deployments. Where do you push the Docker images, and how do you connect to your cluster?

Its lifespan will be however long BC Gov wants to be published as a w3c-ccg traceability spec implementer. This instance will only be used for conformance/interoperability testing and demonstration. The demo namespace might be the way to go for the time being.

I have a PostgreSQL service in my architecture. This can be deployed alongside the application, or we can just provide a connection URL to the application if you use an external DB service. We will need permanent storage for DID documents, OAuth client information, and status list credentials.
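To make that choice concrete, the chart could expose a toggle along these lines; a sketch, and the key names are assumptions, not the actual chart schema:

```yaml
# Sketch of a bundled-vs-external database toggle; the key names are
# assumptions, not the actual chart schema.
postgresql:
  enabled: true # set to false to use an external DB service instead
externalDatabase:
  # only consulted when postgresql.enabled is false
  url: postgresql://tttc:<password>@db.example.org:5432/tttc # placeholder
```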

esune commented 4 months ago

@PatStLouis check out the traction and/or vc-authn repos. For those, we publish charts to their gh-pages site so that they can be installed without having to clone the repo; see https://github.com/bcgov/vc-authn-oidc/tree/main/.github/actions/chart_releaser as an example.

WadeBarnes commented 4 months ago

@PatStLouis check out the traction and/or vc-authn repos. For those, we publish charts to their gh-pages site so that they can be installed without having to clone the repo; see https://github.com/bcgov/vc-authn-oidc/tree/main/.github/actions/chart_releaser as an example.

The values files for those can be found in a separate repo, here: https://github.com/bcgov/trust-over-ip-configurations/tree/main/helm-values

PatStLouis commented 4 months ago

@esune @WadeBarnes Here's the current state of the charts. I based them on the Traction deployment and simplified some components. I also copied the chart release files.

I will need some clarification on how we will manage TLS certificates to complete the ingress configuration. Only the controller service will need an ingress and be exposed publicly. The controller will communicate with the agent/db through their respective internal service endpoints. The agent does not need a connection to a ledger or a DB; it's only used to verify proofs on JSON-LD credentials. Issuance will be done through the Traction instance.

Let me know if network policies/service accounts are needed.

The controller has a Secret resource with all required environment variables, which is injected into the deployment. The values just need to be populated when we are ready to deploy.
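A minimal sketch of that pattern, assuming a Secret injected via envFrom; the name and keys are placeholders, not the chart's actual resources:

```yaml
# Sketch of the Secret-injection pattern; the name and keys are
# placeholders, not the chart's actual resources.
apiVersion: v1
kind: Secret
metadata:
  name: tttc-env # placeholder name
type: Opaque
stringData:
  TRACTION_CLIENT_ID: "" # populated when we are ready to deploy
  TRACTION_API_KEY: ""
# The Deployment's container spec then pulls in the whole Secret:
#   envFrom:
#     - secretRef:
#         name: tttc-env
```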

For the domain, I think a *.interop.vonx.io address would be suitable. Here are some suggestions:

traceability.interop.vonx.io
trace.interop.vonx.io
vc.interop.vonx.io

What are the suggested next steps? I'm available Friday for a session where we could proceed with the repo transfer/deployment.

WadeBarnes commented 4 months ago

With the certs, we typically use BCDevOps/certbot to manage Let's Encrypt certificates on our Routes in OpenShift. Any Route labeled with certbot-managed=true gets a certificate assigned and managed. We need to determine how best to get that to work with this service. I may be wrong, but I don't think our OCP environment supports any certificate issuers for use with Ingress resources; I will check. With Traction, both Routes and Ingresses are provisioned.
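For reference, the label sits on the Route metadata, roughly like this; the names are placeholders, and the host is the one proposed above:

```yaml
# Sketch of a certbot-managed Route; the names are placeholders.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: tttc-controller # placeholder
  labels:
    certbot-managed: "true" # opts the Route into certificate management
spec:
  host: traceability.interop.vonx.io
  to:
    kind: Service
    name: tttc-controller # placeholder Service name
  tls:
    termination: edge
```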

WadeBarnes commented 4 months ago

For network policies, one will need to be defined to allow ingress to the pod exposing the public endpoint, and one each for the inter-pod communications.

Examples:
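A generic sketch of those two kinds of policy; the labels, ports, and names are placeholders, not the actual manifests:

```yaml
# Generic sketch; labels, ports, and names are placeholders.
# 1. Allow ingress traffic to reach the publicly exposed controller pod.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-controller
spec:
  podSelector:
    matchLabels:
      app: tttc-controller # placeholder label
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: ingress
      ports:
        - protocol: TCP
          port: 8000 # assumed container port
---
# 2. Allow the controller to reach the agent's admin endpoint.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-controller-to-agent
spec:
  podSelector:
    matchLabels:
      app: tttc-agent # placeholder label
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: tttc-controller
      ports:
        - protocol: TCP
          port: 8031 # assumed agent admin port
```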

PatStLouis commented 4 months ago

I'll add those. There was a comment claiming that if ingress is enabled, OpenShift configuration isn't required. From your last comment, will we need to deactivate the ingress and use OpenShift Routes instead to enable TLS?

WadeBarnes commented 4 months ago

I believe if you leave the ingressClassName as a blank string, OpenShift will automatically create the associated Routes. That's how the vc-authn-oidc charts behave. From there, you just need to ensure they (the Routes) are labeled correctly. The certbot-managed label should remain set to false until we've finalized the URL.
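If that's right, the relevant chart values would look something like this; a sketch, and the exact key names depend on the chart's values schema:

```yaml
# Sketch; the exact key names depend on the chart's values schema.
ingress:
  enabled: true
  className: "" # left blank so OpenShift generates the Routes
  labels:
    certbot-managed: "false" # flip to "true" once the URL is finalized
```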

@i5okie please correct me if I have the process incorrect.

WadeBarnes commented 4 months ago

For the domain, I think a *.interop.vonx.io address would be suitable. Here are some suggestions:

traceability.interop.vonx.io
trace.interop.vonx.io
vc.interop.vonx.io

My vote is for traceability.interop.vonx.io

PatStLouis commented 4 months ago

@WadeBarnes @esune @i5okie Let's have a session to deploy this project. Is there availability today or tomorrow around 17h EDT (14h PDT)?

PatStLouis commented 3 months ago

@swcurran the controller is available at https://traceability.interop.vonx.io. It's currently pointing to a sandbox tenant, and I will run the test suites again tonight to make sure all is still running smoothly. In the meantime, I'll submit a ticket for a dev/test tenant to use when we are ready to submit the implementation to the w3c-ccg.

swcurran commented 3 months ago

Awesome — nice work!

PatStLouis commented 3 months ago

@esune I managed to pass most of the conformance tests, but I'm getting gateway timeouts when verifying credentials and resolving DIDs. These are the two operations that rely on the deployed agent, so I suspect the controller is unable to talk to the agent's admin endpoint. It might have to do with the network policy. Can we have a look?

WadeBarnes commented 3 months ago

@PatStLouis, it's an issue with the Network Policy. The controller pod is missing the expected app label. You'll need to update the charts. I'll manually update the pod spec for now to add the label.

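The fix amounts to making the pod template's labels match the policy's podSelector, along these lines; the label key and value are placeholders:

```yaml
# Sketch of the fix; the label key/value are placeholders. The pod
# template labels must match the NetworkPolicy's podSelector for
# traffic to be allowed.
template:
  metadata:
    labels:
      app: tttc-controller # the label the NetworkPolicy expects
```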

WadeBarnes commented 3 months ago

The controller can talk to the ACA-Py instance now.

PatStLouis commented 3 months ago

@WadeBarnes I confirm that it's working, thanks!

esune commented 3 months ago

Thanks @WadeBarnes - you got to it before I did.

WadeBarnes commented 3 months ago

WTZ FTW :grin: