dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.67k stars 1.47k forks source link

AirbyteManagedElementReconciler incorrectly detecting masked fields in the S3 Destination and always thinks it's out of sync #13823

Open kjoconnor opened 1 year ago

kjoconnor commented 1 year ago

Dagster version

1.3.0

What's the issue?

When configuring an S3Destination with an AWS Access Key ID and AWS Access Key Secret, dagster-airbyte check and dagster-airbyte apply as well as the drift detection in the UI will always think the Airbyte destination is out of sync due to Airbyte returning masked values in the API for the Access Key ID and Access Key Secret.

The Access Key Secret is correctly ignored because secret appears in the field name from Airbyte and secret appears in this list.

Here's some output showing the reconciler checking a couple existing S3Destinations and noting that they need the access key & secret applied, then I run the apply which says it successfully ran, and then immediately checking again and it's showing it is out of sync.

(.venv) kevin@kevin-mbp pipeline-demo % dagster-airbyte check --module testing.assets
Loading module...
Found 1 reconcilers, checking...

Changes found:
+ 08ac571d-c0fa-48fa-8f51-a96ae381f3f7 Stripe to S3 Destination:
  + access_key_id: <my plaintext access key id>  # This is my AWS Access Key in plain text, redacted here by me
  + secret_access_key: **********                # This is masked by Dagster/Airbyte
+ 08ac571d-c0fa-48fa-8f51-a96ae381f3f7 Bank of Kevin to S3 Destination:
  + access_key_id: <my plaintext access key id>  # This is my AWS Access Key in plain text, redacted here by me
  + secret_access_key: **********                # This is masked by Dagster/Airbyte

(.venv) kevin@kevin-mbp pipeline-demo % dagster-airbyte apply --module testing.assets
Found 1 reconcilers, applying...

Changes applied:
+ 08ac571d-c0fa-48fa-8f51-a96ae381f3f7 Stripe to S3 Destination:
  + access_key_id: <my plaintext access key id>  # This is my AWS Access Key in plain text, redacted here by me
  + secret_access_key: **********                # This is masked by Dagster/Airbyte
+ 08ac571d-c0fa-48fa-8f51-a96ae381f3f7 Bank of Kevin to S3 Destination:
  + access_key_id: <my plaintext access key id>  # This is my AWS Access Key in plain text, redacted here by me
  + secret_access_key: **********                # This is masked by Dagster/Airbyte

(.venv) kevin@kevin-mbp pipeline-demo % dagster-airbyte check --module testing.assets
Loading module...
Found 1 reconcilers, checking...

Changes found:

~ 08ac571d-c0fa-48fa-8f51-a96ae381f3f7 Stripe to S3 Destination:
  + access_key_id: <my plaintext access key id>  # This is my AWS Access Key in plain text, redacted here by me
~ 08ac571d-c0fa-48fa-8f51-a96ae381f3f7 Bank of Kevin to S3 Destination:
  + access_key_id: <my plaintext access key id>  # This is my AWS Access Key in plain text, redacted here by me

What did you expect to happen?

I would expect dagster-airbyte check & apply to correctly note that it has successfully modified the destination and to stop saying it's out of sync.

How to reproduce?

Create an S3Destination for Airbyte like so:

AWS_ACCESS_KEY_ID = "abc123"
AWS_ACCESS_KEY_SECRET = "def456"

s3_destination = S3Destination(
        name=f"{customer_id} Bank of Kevin to S3 Destination",
        s3_bucket_name="my-sandbox",
        s3_bucket_region="us-west-1",
        s3_bucket_path=f"{customer_id}/bank-of-kevin",
        access_key_id=AWS_ACCESS_KEY_ID,
        secret_access_key=AWS_ACCESS_KEY_SECRET,
        format=S3Destination.JSONLinesNewlineDelimitedJSON(
            format_type="JSONL",
            compression=S3Destination.NoCompression(compression_type="No Compression"),
        ),
    )

Then run check and apply. The destination will be correctly created in Airbyte and will work, but Dagster will never allow it to be used in an asset because it thinks it's out of sync.

Deployment type

Local

Deployment details

I'm also running Airbyte locally.

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

kjoconnor commented 1 year ago

I would make a PR to fix things but I'm not sure what the right way to go about it is. One way would be to have a smarter way to flag fields as secret, or perhaps there's some Airbyte API setting I'm not seeing to allow unmasked values?

yusufshalaby commented 1 year ago

Just commenting to drive engagement. I ran into this today.