airbytehq / terraform-provider-airbyte

Terraform Provider for Airbyte API
https://api.airbyte.com/
MIT License

Re-applying Terraform config results in invalid connection configuration #31

Open exactlyaaron opened 10 months ago

exactlyaaron commented 10 months ago

I will start by saying that I am very new to Terraform as well as Airbyte. I am working on building a proof-of-concept connection from MySQL to Snowflake.

Airbyte 0.50.29 running on Docker Desktop on macOS
MySQL running locally on the Mac
Snowflake running in GCP

Airbyte source connector: MySQL v3.0.4
Airbyte destination connector: Snowflake v3.1.2

When I do the initial terraform init/plan/apply, everything works correctly and the connection is set up properly. The example I want to demo is adding a new stream to the Terraform configuration and having the connection's streams update accordingly. I have one incremental stream and want to add a full refresh stream. However, when I update the .tf file and re-apply, I get a 400 error from the API.

Here are the steps I took:

After applying I get the 400 error:

│ Error: unexpected response from API. Got an unexpected response code 400
│ 
│   with airbyte_connection.mysql_snowflake_connection,
│   on main.tf line 76, in resource "airbyte_connection" "mysql_snowflake_connection":
│   76: resource "airbyte_connection" "mysql_snowflake_connection" {
│ 
│ **Request**:
│ PATCH /v1/connections/5210b869-537e-468a-9106-fff782860090 HTTP/1.1
│ Host: localhost:8006
│ Accept: application/json
│ Authorization: Basic YWlyYnl0ZTpwYXNzd29yZA==
│ Content-Type: application/json
│ User-Agent: speakeasy-sdk/terraform 0.3.3 2.86.10 1.0.0
│ 
│ {"configurations":{"streams":[{"cursorField":["_ab_cdc_cursor"],"name":"members","primaryKey":[["id"]],"syncMode":"incremental_deduped_history"},{"name":"movies"}]},"dataResidency":"auto","name":"local mysql \u003c\u003e snowflake
│ dev","namespaceDefinition":"source","nonBreakingSchemaUpdatesBehavior":"propagate_fully","schedule":{"scheduleType":"manual"},"status":"active"}
│ 
│ **Response**:
│ HTTP/1.1 400 Bad Request
│ Content-Length: 296
│ Connection: keep-alive
│ Content-Type: application/problem+json
│ Date: Tue, 12 Sep 2023 19:01:22 GMT
│ Server: nginx/1.25.2
│ 
│ {"type":"https://reference.airbyte.com/reference/errors","title":"bad-request","status":400,"detail":"The body of the request contains an invalid connection configuration. Primary key for stream: members is already pre-defined. Please do NOT
│ include a primary key configuration for this stream."}
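
Reading that message, it sounds like the API would accept the same PATCH body if the primaryKey entry for the members stream were simply dropped. Roughly like this (a sketch I derived from the request above, not something I have verified):

{
  "configurations": {
    "streams": [
      {
        "cursorField": ["_ab_cdc_cursor"],
        "name": "members",
        "syncMode": "incremental_deduped_history"
      },
      {"name": "movies"}
    ]
  },
  "dataResidency": "auto",
  "name": "local mysql <> snowflake dev",
  "namespaceDefinition": "source",
  "nonBreakingSchemaUpdatesBehavior": "propagate_fully",
  "schedule": {"scheduleType": "manual"},
  "status": "active"
}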

Original main.tf config:

# provider.tf
terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
      version = "0.3.3"
    }
  }
}

# provider.tf
provider "airbyte" {
  username = "airbyte"
  password = "password"
  server_url = "http://localhost:8006/v1"
}

resource "airbyte_source_mysql" "aaron_test_database" {
  configuration = {
    database           = "aaron_test"
    host               = "host.docker.internal"
    username           = "root"
    password           = ""
    port               = 3306
    replication_method = {
      source_mysql_update_method_read_changes_using_binary_log_cdc = {
        method                  = "CDC"
        server_time_zone        = "UTC"
      }
    }
    source_type        = "mysql"
    ssl_mode = {
      source_mysql_ssl_modes_preferred = {
        mode = "preferred"
      }
    }
    tunnel_method = {
      source_mysql_ssh_tunnel_method_no_tunnel = {
        tunnel_method = "NO_TUNNEL"
      }
    }
  }
  name         = "Aaron Test database"
  workspace_id = "fd460825-10a6-414b-a91e-3a3a2f444179"
}

resource "airbyte_destination_snowflake" "aaron_destination_snowflake" {
  configuration = {
    loading_method = {
        destination_snowflake_data_staging_method_recommended_internal_staging = {
            method = "Internal Staging"
        }
    }
    credentials = {
      destination_snowflake_authorization_method_username_and_password = {
        auth_type = "Username and Password"
        password            = "..."
      }
    }
    database         = "AIRBYTE_DATABASE"
    destination_type = "snowflake"
    host             = "..." 
    # jdbc_url_params  = "...my_jdbc_url_params..."
    # raw_data_schema  = "...my_raw_data_schema..."
    role             = "AIRBYTE_ROLE"
    schema           = "AIRBYTE_SCHEMA"
    use_1s1t_format  = true
    username         = "AIRBYTE_USER"
    warehouse        = "AIRBYTE_WAREHOUSE"

  }
  name         = "Aaron Snowflake Test"
  workspace_id = "fd460825-10a6-414b-a91e-3a3a2f444179"
}

resource "airbyte_connection" "mysql_snowflake_connection" {
  name                 = "local mysql <> snowflake dev"
  source_id            = airbyte_source_mysql.aaron_test_database.source_id
  destination_id       = airbyte_destination_snowflake.aaron_destination_snowflake.destination_id
  namespace_definition = "source"
  status               = "active"
  non_breaking_schema_updates_behavior = "propagate_fully"
  configurations       = {
    streams = [
      {
        name = "members"
        sync_mode = "incremental_deduped_history"
      }
    ]
  }
}

And here is the updated connection resource in main.tf, which adds another stream (everything else in the file is unchanged):

resource "airbyte_connection" "mysql_snowflake_connection" {
  name                 = "local mysql <> snowflake dev"
  source_id            = airbyte_source_mysql.aaron_test_database.source_id
  destination_id       = airbyte_destination_snowflake.aaron_destination_snowflake.destination_id
  namespace_definition = "source"
  status               = "active"
  non_breaking_schema_updates_behavior = "propagate_fully"
  configurations       = {
    streams = [
      {
        name = "members"
        sync_mode = "incremental_deduped_history"
      },
      {
        name = "movies"
      }
    ]
  }
}

If I manually remove all the source/destination/connection entries from Airbyte and completely redo the terraform init/plan/apply steps with both streams from the start, it works fine. The problem arises whenever I try to update main.tf and plan/apply again.

It is very possible this is user error, but any help is greatly appreciated! Whenever it works, the connections are awesome! I'm just trying to get a proof of concept built that shows everything can be managed without the UI. Thank you.

exactlyaaron commented 10 months ago

The error seems to stem from the terraform.tfstate file that is created by the apply command, which ends up containing this configuration:

"configurations": {
  "streams": [
    {
      "cursor_field": [
        "_ab_cdc_cursor"
      ],
      "name": "members",
      "primary_key": [
        [
          "id"
        ]
      ],
      "sync_mode": "incremental_deduped_history"
    }
  ]
},

The presence of that primary key configuration appears to be a problem when a secondary apply after changes is run.

exactlyaaron commented 10 months ago

Another follow-up detail: I attempted to use two full refresh streams as a test.

Upon applying I get another error from the provider:

airbyte_connection.mysql_snowflake_connection: Modifying... [name=local mysql <> snowflake dev]
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to airbyte_connection.mysql_snowflake_connection, provider "provider[\"registry.terraform.io/airbytehq/airbyte\"]" produced an unexpected new value: .configurations.streams[1].sync_mode: was null, but now
│ cty.StringVal("full_refresh_overwrite").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
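
One idea I have not verified (just a sketch, re-using the stream names from my earlier config): declaring sync_mode explicitly for every stream in the connection resource, so the planned value matches what the API reports back instead of starting out null:

  configurations = {
    streams = [
      {
        name      = "members"
        sync_mode = "full_refresh_overwrite"
      },
      {
        name      = "movies"
        sync_mode = "full_refresh_overwrite"
      }
    ]
  }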

thatkellenguy commented 10 months ago

I am experiencing the same behavior with an MSSQL <> Redshift resource, for what it's worth.

thiagoazcampos commented 9 months ago

Ran into the same problem with the Google Analytics 4 source, and maybe it happens with any source that has pre-defined primary keys.

It looks like the provider first makes a GET request to retrieve the connection's current configuration (including each stream's primaryKey) and then sends a PATCH request containing that retrieved state plus the changes, which runs into the error.

Maybe there should be a check for whether each stream has a source-defined primary key, so that it is not included in the PATCH request. This could be done by checking the sourceDefinedPrimaryKey field in the response of the GET streams route.
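
To illustrate the kind of data such a check could use, with the MySQL streams from the original example: the sourceDefinedPrimaryKey field name is the one I mean above, but the exact response shape and the streamName key here are only my assumption of what the GET streams route returns.

[
  {
    "streamName": "members",
    "sourceDefinedPrimaryKey": [["id"]]
  },
  {
    "streamName": "movies",
    "sourceDefinedPrimaryKey": []
  }
]

Whenever sourceDefinedPrimaryKey is non-empty for a stream, the provider would skip sending primaryKey for that stream in the PATCH body.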

yamhirotoCS commented 8 months ago

Have the same issue with GoogleSearchConsole, and indeed those streams have pre-defined primary keys.

yamhirotoCS commented 7 months ago

Same for JIRA

Santhin commented 7 months ago

Same for Postgres <> Snowflake

sowla commented 2 months ago

Same for us when using Hubspot/Chargebee/Stripe connections with incremental append sync mode to BigQuery.

AlaaInflyter commented 1 month ago

Same for Stripe -> Snowflake (and MySQL -> Snowflake but that's already mentioned above)

hongbo-miao commented 3 weeks ago

@szemek helped fix this in https://github.com/airbytehq/terraform-provider-airbyte/pull/111. Hope the Airbyte team can merge it soon! ☺️