airbytehq / airbyte

[secret-manager] Cannot enable AWS Secrets Manager for secret storage with existing connections #36754

adam-bloom opened this issue 3 months ago · Status: Open

adam-bloom commented 3 months ago

Helm Chart Version

0.62.0

What step the error happened?

Other

Relevant information

I added configuration to use AWS Secrets Manager for source/destination credentials to a deployment with existing connections. My assumption was that newly created connections would use Secrets Manager (which did indeed work!), while existing ones would continue to use local storage until they were updated. The latter is where things went south.

  1. I tried to update the credentials so they would be inserted into Secrets Manager, but any update to the source or destination failed with a 400 error from Secrets Manager.
  2. I tried to delete the source/destination so I could recreate it, but that also failed with a 400 error from Secrets Manager, since it couldn't delete a secret it hadn't created.

I'm on Airbyte 0.55.2.

The workaround was to attempt the delete, note the secret ID it tried to delete, manually create that secret (removing the _v1 suffix from the name), and then delete the source/destination.

This is very user-unfriendly and, I believe, an unintended backward-compatibility miss.

Relevant log output

No response

marcosmarxm commented 2 months ago

@adam-bloom I added this to the deployment team backlog for further investigation. Can you share your values file (at least the part where you configure the secrets manager)?

adam-bloom commented 2 months ago

Here's the applicable section (this is from our template, so it doesn't have the rendered values):

global:
  storage:
    type: s3
    bucket:
      log: ${s3_logs_bucket}
      state: ${state_bucket}
    s3:
      authenticationType: instanceProfile
      region: ${aws_region}
  secretsManager:
    type: awsSecretManager
    awsSecretManager:
      authenticationType: instanceProfile
      region: ${aws_region}
      kms: ${kms_key_arn}
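
(For reference, a rendered version of that block would look roughly like the following; the bucket names, region, and KMS key ARN here are placeholders rather than our real values.)

global:
  storage:
    type: s3
    bucket:
      log: my-airbyte-logs
      state: my-airbyte-state
    s3:
      authenticationType: instanceProfile
      region: us-east-1
  secretsManager:
    type: awsSecretManager
    awsSecretManager:
      authenticationType: instanceProfile
      region: us-east-1
      kms: arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555
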
adam-bloom commented 2 months ago

In case others run into this, here's my current workaround (and how to know if you're affected).

The airbyte-worker logs will indicate if the sync failed to start due to a secrets issue (the UI will just display a generic platform error). Here's an example:

2024-04-15 22:23:58 WARN i.a.c.s.p.AwsSecretManagerPersistence(read):48 - Secret airbyte_workspace_6befefaf-fd24-4b7d-849e-028fdcc70b9f_secret_8354812d-1ccb-410e-98a6-261b81d0d319 not found
java.lang.RuntimeException: That secret was not found in the store! Coordinate: airbyte_workspace_6befefaf-fd24-4b7d-849e-028fdcc70b9f_secret_8354812d-1ccb-410e-98a6-261b81d0d319_v3

Fetch the secret value from the airbyte database:

select * from secrets where coordinate = 'airbyte_workspace_6befefaf-fd24-4b7d-849e-028fdcc70b9f_secret_8354812d-1ccb-410e-98a6-261b81d0d319_v3';

Next, create an AWS Secrets Manager secret named airbyte_workspace_6befefaf-fd24-4b7d-849e-028fdcc70b9f_secret_8354812d-1ccb-410e-98a6-261b81d0d319 (note: no _v3 suffix), with the value of the payload column returned by the query above. Use the same KMS key that you configured Airbyte to use for Secrets Manager so that Airbyte can access the new secret.
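
In CLI form, that's roughly the following; payload.json here is a placeholder file holding the payload column value, and the KMS key ARN placeholder should be replaced with whatever you configured in the chart:

# payload.json contains the payload column value fetched from the secrets table
aws secretsmanager create-secret \
  --name "airbyte_workspace_6befefaf-fd24-4b7d-849e-028fdcc70b9f_secret_8354812d-1ccb-410e-98a6-261b81d0d319" \
  --secret-string file://payload.json \
  --kms-key-id "arn:aws:kms:<region>:<account-id>:key/<key-id>"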

Clearly, Airbyte does not attempt to migrate its existing secret store to Secrets Manager when one is configured, nor does it fall back to the internal store. Either of those approaches would be greatly appreciated for backwards compatibility. It would also be rather trivial to provide a migration script if this is something users are expected to do themselves.

NAjustin commented 4 days ago

I do think it really needs to migrate the secrets for you if the user has permissions, or at least offer some type of fallback, as this is a big sticking point (and the lack of a useful error message doesn't help).

To add to @adam-bloom's workaround: if you need to migrate a lot of existing secrets, you can also bulk-create them by generating a shell script of CLI commands from the secrets table.

For AWS, this looks something like this:

SELECT STRING_AGG('aws secretsmanager create-secret --name "'||coordinate||'" --secret-string="'||REPLACE(payload,'"','\"')||'"',E'\n') AS commands FROM secrets

. . . which will produce a list of commands like this:

aws secretsmanager create-secret --name "airbyte_workspace_00000000-0000-0000-0000-000000000000_secret_abc123-abc123-1234-abc123_v1" --secret-string="OPENSESAME"
aws secretsmanager create-secret --name "airbyte_workspace_11111111-1111-1111-1111-111111111111_secret_xyz987-xyz987-0987-xyz987_v1" --secret-string="DrOwSsAp"
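
One caveat, assuming your chart configures a KMS key for the secrets manager (as in the values file earlier in this thread): you may also need to append --kms-key-id to each generated create-secret command so that Airbyte can read the secrets back afterwards, e.g.:

aws secretsmanager create-secret --name "airbyte_workspace_00000000-0000-0000-0000-000000000000_secret_abc123-abc123-1234-abc123_v1" --secret-string="OPENSESAME" --kms-key-id "arn:aws:kms:<region>:<account-id>:key/<key-id>"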

Google Secrets Manager is similar, but you have to use --data-file=- to get it to take input from STDIN:

SELECT STRING_AGG('printf "'||REPLACE(payload,'"','\"')||'" | gcloud secrets create "'||coordinate||'" --data-file=-',E'\n') AS commands FROM secrets

. . . resulting in something like:

printf "OPENSESAME" | gcloud secrets create "airbyte_workspace_00000000-0000-0000-0000-000000000000_secret_abc123-abc123-1234-abc123_v1" --data-file=-
printf "DrOwSsAp" | gcloud secrets create "airbyte_workspace_11111111-1111-1111-1111-111111111111_secret_xyz987-xyz987-0987-xyz987_v1" --data-file=-

Regardless of which you use, I would recommend saving the output of the query as a shell script, assigning it execute permissions, and running it that way (to avoid leaking secrets to your personal command history).
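
As a concrete sketch of that last step, assuming the generating query is saved as generate_commands.sql and the database is simply named airbyte (your psql connection flags will differ):

# write the generated commands to a script, without psql's column headers or alignment
psql -At -d airbyte -f generate_commands.sql > migrate_secrets.sh
chmod +x migrate_secrets.sh
./migrate_secrets.sh
# remove the script afterwards so plaintext secrets don't linger on disk
rm migrate_secrets.sh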