Open userbradley opened 2 years ago
My solution was to use fleet Workload Identity
These links gave me enough of a glimpse to create a small PoC with a working Workload Identity setup:
https://cloud.google.com/anthos/fleet-management/docs/use-workload-identity
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/31eb25ddfe20a8d38fd67e44bff9d5f16b6a503b/cloud-pubsub/deployment/pubsub-with-secret.yaml
https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform#config-connector
Thanks @Santhin. The links you've provided (well, at least this one and this one) still use the key.json file.
Can you share any modifications you needed to make to get Airbyte to work with workload ID over a SA key?
I am pretty familiar with Kubernetes Workload Identity on GCP, and we have a few deployments using it, but I'm unsure whether Airbyte will work with it, as it seems to expect the key file.
Thoughts?
Exactly, this was the wall for me too: how to use Workload Identity given the key.json requirement. The solution was to use fleet Workload Identity, which lets you generate an access token from a Kubernetes service account.
First, you need to create the service account:
resource "google_service_account" "sa_airbyte" {
  account_id = "airbyte-admin"
}
resource "google_project_iam_member" "sa_airbyte" {
  project = var.project
  role    = google_project_iam_custom_role.cr_airbyte.name
  member  = "serviceAccount:${google_service_account.sa_airbyte.email}"
}
resource "google_service_account_iam_member" "sa_airbyte" {
  service_account_id = google_service_account.sa_airbyte.id
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project}.svc.id.goog[airbyte/airbyte-admin]"
}
I tested with a different name: the account_id must match the service account used inside the Helm chart, which is airbyte-admin.
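The member value in the last resource follows a fixed pattern: the workload identity pool, then the namespace and Kubernetes service account name in square brackets. As a minimal Python sketch of that pattern (the helper name is hypothetical, and it assumes the pool is named after the project, as in the Terraform above):

```python
def workload_identity_member(pool_project, namespace, ksa_name):
    """Build the IAM member string that lets a Kubernetes service account
    impersonate a Google service account through Workload Identity."""
    return f"serviceAccount:{pool_project}.svc.id.goog[{namespace}/{ksa_name}]"


if __name__ == "__main__":
    # The project ID is taken from the audience used later in this thread;
    # treating it as the value of var.project is an assumption.
    print(workload_identity_member("playground-357914", "airbyte", "airbyte-admin"))
```

If the namespace or the Kubernetes service account name in this string differs from what the Helm chart deploys, the binding will not match the pod's identity, which fits the observation above about the name having to be airbyte-admin.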
Now we need to create a JSON file with the impersonation credentials.
I encourage you to follow these docs: https://cloud.google.com/anthos/fleet-management/docs/use-workload-identity#use_fleet_workload_identity
var.airbyte_gcs_log_creds_payload contains this JSON file:
{
  "type": "external_account",
  "audience": "identitynamespace:WORKLOAD_IDENTITY_POOL:IDENTITY_PROVIDER",
  "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/GSA_NAME@GSA_PROJECT_ID.iam.gserviceaccount.com:generateAccessToken",
  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {
    "file": "/secrets/tokens/gcp-ksa/token"
  }
}
The credential_source file is the location where the token is mounted in this example (see the mounted files further down).
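To avoid hand-editing the placeholders, the same file can be generated. The following is only a sketch in Python: the function name and its parameters are hypothetical, and the field values simply mirror the JSON above.

```python
import json


def build_external_account_config(pool, provider, gsa_email, token_file):
    """Return the external_account credential configuration as a dict.

    pool and provider are the WORKLOAD_IDENTITY_POOL and IDENTITY_PROVIDER
    placeholders; gsa_email is the Google service account to impersonate;
    token_file is where the Kubernetes token is mounted inside the pod.
    """
    return {
        "type": "external_account",
        "audience": f"identitynamespace:{pool}:{provider}",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            f"serviceAccounts/{gsa_email}:generateAccessToken"
        ),
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "credential_source": {"file": token_file},
    }


if __name__ == "__main__":
    # Placeholder values, exactly as in the JSON above; replace before use.
    config = build_external_account_config(
        pool="WORKLOAD_IDENTITY_POOL",
        provider="IDENTITY_PROVIDER",
        gsa_email="GSA_NAME@GSA_PROJECT_ID.iam.gserviceaccount.com",
        token_file="/secrets/tokens/gcp-ksa/token",
    )
    print(json.dumps(config, indent=2))
```

The printed JSON is what would go into var.airbyte_gcs_log_creds_payload. Note that it contains no private key, only pointers to the mounted token and the service account to impersonate.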
With this JSON file we need to create a Kubernetes secret. In my example it looked like this:
resource "kubernetes_manifest" "airbyte_gcs_log_creds" {
  manifest = {
    "apiVersion" = "v1"
    "data" = {
      "gcp.json" = base64encode(var.airbyte_gcs_log_creds_payload)
    }
    "kind" = "Secret"
    "metadata" = {
      "name"      = "airbyte-airbyte-gcs-log-creds"
      "namespace" = "airbyte"
    }
  }
}
Now we create the KSA (Kubernetes service account) and annotate it with our Google service account.
Please check the automountServiceAccountToken flag: we want to mount our access token in a different location, so setting it to false is a must.
resource "kubernetes_manifest" "ksa_airbyte_admin" {
  manifest = {
    "apiVersion"                   = "v1"
    "automountServiceAccountToken" = false
    "kind"                         = "ServiceAccount"
    "metadata" = {
      "annotations" = {
        "iam.gke.io/gcp-service-account" = var.sa_airbyte
      }
      "name"      = "airbyte-admin"
      "namespace" = "airbyte"
    }
  }
}
In my values for the Helm chart:
serviceAccount:
  create: false  # I don't want to create airbyte-admin with Helm but with the Kubernetes manifest
global:
  logs:
    gcs:
      credentials: "/secrets/tokens/gcp-ksa/gcp.json"  # I use a different path, explanation later
    minio:
      enabled: true
server:
  extraVolumeMounts:
    - name: gcp-ksa
      mountPath: /secrets/tokens/gcp-ksa
      readOnly: true
  extraVolumes:
    - name: gcp-ksa
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              path: token
              audience: playground-357914.svc.id.goog
              expirationSeconds: 172800
          - secret:
              name: airbyte-airbyte-gcs-log-creds
worker:
  extraVolumeMounts:
    - name: gcp-ksa
      mountPath: /secrets/tokens/gcp-ksa
      readOnly: true
  extraVolumes:
    - name: gcp-ksa
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              path: token
              audience: playground-357914.svc.id.goog
              expirationSeconds: 172800
          - secret:
              name: airbyte-airbyte-gcs-log-creds
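Several paths in these values have to line up. The projected volume puts the service account token at mountPath/token and the secret's gcp.json key at mountPath/gcp.json, so the credential file's credential_source and the GCS credentials setting must point at exactly those locations. A small Python sketch of that consistency check (function names are hypothetical, and the values are copied from the snippets in this thread):

```python
import posixpath


def projected_file(mount_path, relative_path):
    """Absolute in-container path of a file projected into a mounted volume."""
    return posixpath.join(mount_path, relative_path)


def check_paths(mount_path, token_path, secret_key, credentials_setting, credential_config):
    """Return a list of mismatches between the Helm values and the credential file."""
    problems = []
    expected_token = projected_file(mount_path, token_path)
    if credential_config["credential_source"]["file"] != expected_token:
        problems.append(f"credential_source.file should be {expected_token}")
    expected_creds = projected_file(mount_path, secret_key)
    if credentials_setting != expected_creds:
        problems.append(f"global.logs.gcs.credentials should be {expected_creds}")
    return problems


if __name__ == "__main__":
    problems = check_paths(
        mount_path="/secrets/tokens/gcp-ksa",   # extraVolumeMounts.mountPath
        token_path="token",                     # serviceAccountToken.path
        secret_key="gcp.json",                  # key in the Kubernetes secret
        credentials_setting="/secrets/tokens/gcp-ksa/gcp.json",
        credential_config={"credential_source": {"file": "/secrets/tokens/gcp-ksa/token"}},
    )
    print(problems or "paths are consistent")
```

With the values used in this thread the check reports no mismatches; changing the mountPath without updating the JSON file would be flagged.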
And here is an example of the mounted files. You can see my secret mounted twice:
gcs-log-creds <- this one is created by the Helm chart
token <- this one is my override
@userbradley If you have more questions about this implementation, feel free to ask. I will try to create a simple example in a public repo, because I've seen tons of threads about this.
Additional notes: I didn't test this with a GCP connector such as BigQuery. If we could use the same method there, with an impersonation JSON file rather than a service account private key, it would be huge :D.
@Santhin thanks for the comment, I'll try to make some time to look into it.
Thought I'd just reply so you don't think I've ignored it - the team and I greatly appreciate your input and help!
This solution has some drawbacks, or some additional benefits, depending on how you look at it.
Fleet Workload Identity mounts GOOGLE_APPLICATION_CREDENTIALS into the worker pod. When you then try to create a connection or destination using BigQuery, you encounter a weird error while uploading the credentials JSON. Something like this:
At first I was confused about why the error mentioned type external_account when I was trying to enter normal credentials with type service_account.
Then I connected the dots: the BigQuery connector is trying to use GOOGLE_APPLICATION_CREDENTIALS from the worker. So here is the question: would a small rewrite inside the BigQuery connector give us the possibility to enter impersonation credentials rather than normal ones?
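One way to confirm this from inside a worker or connector pod is to look at the type field of whatever file GOOGLE_APPLICATION_CREDENTIALS points to. The following Python sketch is a diagnostic only, with a hypothetical function name. It does not call any Google library and says nothing about how the connector itself loads credentials.

```python
import json
import os


def adc_credential_type(environ=None):
    """Return the "type" field of the file that GOOGLE_APPLICATION_CREDENTIALS
    points to, or None if the variable is unset or the file is missing."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get("type")


if __name__ == "__main__":
    # "external_account" means the impersonation file from this thread is in
    # effect; "service_account" means a regular key file is being used.
    print(adc_credential_type())
```

If this prints external_account in the pod where the connector runs, the environment variable rather than the uploaded JSON is the likely source of the type mismatch described above.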
Doing a little digging, I found:
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L163
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/sample_secret/credentials.json
@userbradley here is the link to the issue: https://discuss.airbyte.io/t/airbyte-using-fleet-workload-identity-overwrites-google-application-credentials-inside-connector/2277
I need Airbyte to work with Workload Identity, please add this feature.
@Santhin What should IDENTITY_PROVIDER be for a GKE cluster? I couldn't find it in the links.
@yuriolive To retrieve the values you can use gcloud container fleet memberships describe MEMBERSHIP, where MEMBERSHIP is your cluster's unique membership name in the fleet (source).
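Once the membership description is available as JSON, the audience string can be assembled from it. The Python sketch below rests on an assumption: that the description contains an authority object with workloadIdentityPool and identityProvider fields, as the fleet membership resource does. Check the actual output of the describe command before relying on it; the function name and the sample values are hypothetical.

```python
def audience_from_membership(membership):
    """Build the external_account audience from a parsed membership description.

    Assumes the membership dict has an "authority" object carrying
    "workloadIdentityPool" and "identityProvider".
    """
    authority = membership.get("authority") or {}
    pool = authority.get("workloadIdentityPool")
    provider = authority.get("identityProvider")
    if not pool or not provider:
        raise ValueError("membership has no workload identity authority")
    return f"identitynamespace:{pool}:{provider}"


if __name__ == "__main__":
    # Hypothetical sample; real values come from the describe command.
    sample = {
        "authority": {
            "workloadIdentityPool": "EXAMPLE_POOL",
            "identityProvider": "EXAMPLE_PROVIDER",
        }
    }
    print(audience_from_membership(sample))
```

The result fills the audience field of the credential JSON shown earlier in the thread. A membership without an authority section raises an error instead of producing a malformed audience.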
gcloud container fleet memberships list
The command doesn't return any membership. Are you using GKE too? Do you have to enable Anthos? Anthos has some cost involved, so I would avoid it if I could.
Tell us about the problem you're trying to solve
I am trying to set up Airbyte in a secure manner on a GKE cluster running on Google Cloud.
As it stands, you need to create a service account and keys, then base64 encode these values and store them as a secret in the cluster.
Describe the solution you’d like
Ideally I would like to use Workload Identity, where we specify a Kubernetes service account that Airbyte uses on the cluster, which then impersonates a GCP service account for calls leaving the cluster.
Describe the alternative you’ve considered or used
Simply not using the logging, as creating and exporting service account keys goes against our organizational policies.
Additional context
No
Are you willing to submit a PR?
Yes! I'm not 100% sure where I can help, perhaps with the KB writing!
Discourse post
https://discuss.airbyte.io/t/airbyte-using-fleet-workload-identity-overwrites-google-application-credentials-inside-connector/2277/1