electric-sql / electric

Sync little subsets of your Postgres data into local apps and services.
https://electric-sql.com
Apache License 2.0
6.53k stars, 156 forks

Electric k8s deployment fails connecting to RDS instance #1087

developer-ballastlane closed this issue 7 months ago

developer-ballastlane commented 8 months ago

We are attempting to deploy electric-sql as a Kubernetes deployment. We have verified network connectivity between the RDS database and the cluster, but startup still fails with the following:

16:14:48.645 pid=<0.2488.0> [info] Starting ElectricSQL 0.9.4 in logical_replication mode.
16:14:48.649 pid=<0.2344.0> [notice]     :alarm_handler: {:set, {:system_memory_high_watermark, []}}
16:14:48.652 pid=<0.2493.0> [debug] Loading flag string from env ELECTRIC_FEATURES
16:14:48.652 pid=<0.2493.0> [debug] Got feature flag configuration %{proxy_ddlx_assign: false, proxy_ddlx_grant: false, proxy_ddlx_revoke: false, proxy_ddlx_unassign: false} with default value: false
16:14:48.661 pid=<0.2489.0> [info] Running Electric.Plug.Router with Bandit 1.1.3 at :::5133 (http)
16:14:48.671 pid=<0.3108.0> origin=postgres_1 [warning] Failed to load cacerts from the OS: :enoent
16:14:48.671 pid=<0.3108.0> origin=postgres_1 [debug] Attempting to initialize postgres_1: db_user@xxxxxxxxxxxxx.rds.amazonaws.com:5432
16:14:48.671 pid=<0.3108.0> origin=postgres_1 [info] Electric.Replication.Postgres.Client.with_conn(%{database: ~c"db_name", host: ~c"xxxxxxxxxxxxx.rds.amazonaws.com", ip_addr: ~c"x.x.x.x", ipv6: true, nulls: [nil, :null, :undefined], password: ~c"******", port: 5432, ssl: :required, ssl_opts: [server_name_indication: ~c"xxxxxxxxxxxxx.rds.amazonaws.com"], timeout: 5000, username: ~c"db_user"})
16:14:48.675 pid=<0.3111.0> [warning] Description: ~c"Server authenticity is not verified since certificate path validation is not enabled"
     Reason: ~c"The option {verify, verify_peer} and one of the options 'cacertfile' or 'cacerts' are required to enable this."

16:14:48.743 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from '__pg_version.sql.eex'
16:14:48.745 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from '__primary_key_list.sql.eex'
16:14:48.747 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from '__resolve_table_from_names.sql.eex'
16:14:48.749 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from '__session_replication_role.sql.eex'
16:14:48.750 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from '__table_schema.sql.eex'
16:14:48.751 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'alter_shadow_table.sql.eex'
16:14:48.753 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'assign_default_version.sql.eex'
16:14:48.754 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'assign_migration_version.sql.eex'
16:14:48.756 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'capture_ddl.sql.eex'
16:14:48.757 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'current_transaction_id.sql.eex'
16:14:48.758 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'current_xact_id.sql.eex'
16:14:48.759 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'current_xact_ts.sql.eex'
16:14:48.762 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'ddlx/assign.sql.eex'
16:14:48.764 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'ddlx/disable.sql.eex'
16:14:48.769 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'ddlx/enable.sql.eex'
16:14:48.773 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'ddlx/grant.sql.eex'
16:14:48.774 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'ddlx/unassign.sql.eex'
16:14:48.776 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'electrify.sql.eex'
16:14:48.777 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'electrify/__validate_table_column_defaults.sql.eex'
16:14:48.779 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'electrify/__validate_table_column_types.sql.eex'
16:14:48.781 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'electrify/__validate_table_constraints.sql.eex'
16:14:48.783 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'electrify/__validate_table_schema.sql.eex'
16:14:48.785 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'electrify/generate_electrified_sql.sql.eex'
16:14:48.786 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'find_fk_to_table.sql.eex'
16:14:48.788 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'function_installers/reinstall_trigger_function.sql.eex'
16:14:48.790 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'function_installers/utils.sql.eex'
16:14:48.792 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'install_function__write_correct_max_tag.sql.eex'
16:14:48.794 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'install_functions_and_triggers.sql.eex'
16:14:48.796 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'migration_version.sql.eex'
16:14:48.798 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'perform_reordered_op_installer_function.sql.eex'
16:14:48.800 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'string_utils.sql.eex'
16:14:48.801 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'db_user_helpers/__lookup_db_user_flag.sql.eex'
16:14:48.802 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'tx_has_assigned_version.sql.eex'
16:14:48.803 pid=<0.3108.0> origin=postgres_1 [debug] Successfully (re)defined SQL routine from 'upsert_acknowledged_client_lsn.sql.eex'
16:14:48.807 pid=<0.3108.0> origin=postgres_1 [debug] Elixir.Electric.Replication.Postgres.Client: CREATE PUBLICATION "electric_publication"
16:14:48.808 pid=<0.3108.0> origin=postgres_1 [debug] Elixir.Electric.Replication.Postgres.Client: CREATE SUBSCRIPTION "postgres_1" CONNECTION 'host=electric.web.com port=443 dbname=electric connect_timeout=5000' PUBLICATION "electric_publication" WITH (connect = false)
16:14:48.814 pid=<0.3108.0> origin=postgres_1 [info] Successfully initialized origin postgres_1 at extension version
16:14:48.815 pid=<0.3117.0> [info] Starting Proxy server listening on port 65432
16:14:48.815 pid=<0.3118.0> pg_producer=postgres_1 [info] Starting Elixir.Electric.Postgres.Extension.SchemaCache for postgres_1
16:14:48.816 pid=<0.3118.0> pg_producer=postgres_1 [warning] SchemaCache "postgres_1" registered as the global instance
16:14:48.816 pid=<0.3121.0> [debug] Elixir.Electric.Replication.Postgres.LogicalReplicationProducer.init: publication: 'electric_publication', slot: 'electric_replication_out_db_name'
16:14:48.816 pid=<0.3121.0> [info] Starting replication from postgres_1
16:14:48.817 pid=<0.3121.0> [info] Electric.Replication.Postgres.LogicalReplicationProducer.init(%{database: ~c"db_name", host: ~c"xxxxxxxxxxxxx.rds.amazonaws.com", ip_addr: ~c"x.x.x.x", ipv6: true, nulls: [nil, :null, :undefined], password: ~c"******", port: 5432, replication: ~c"database", ssl: :required, ssl_opts: [server_name_indication: ~c"xxxxxxxxxxxxx.rds.amazonaws.com"], timeout: 5000, username: ~c"db_user"})
16:14:48.817 pid=<0.3121.0> [debug] Electric.Replication.Postgres.Client.connect(%{database: ~c"db_name", host: ~c"xxxxxxxxxxxxx.rds.amazonaws.com", ip_addr: ~c"x.x.x.x", ipv6: true, nulls: [nil, :null, :undefined], password: ~c"******", port: 5432, replication: ~c"database", ssl: :required, ssl_opts: [server_name_indication: ~c"xxxxxxxxxxxxx.rds.amazonaws.com"], timeout: 5000, username: ~c"db_user"})
16:14:48.820 pid=<0.3122.0> [warning] Description: ~c"Server authenticity is not verified since certificate path validation is not enabled"
     Reason: ~c"The option {verify, verify_peer} and one of the options 'cacertfile' or 'cacerts' are required to enable this."

▓ ┌────────────────────────────────────────────────────────┐
▓ │  MODULE ERROR: Electric.Replication.PostgresConnector  │
▓ ┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙
▓
▓ Failed to start child :postgres_producer:
▓   {:bad_return_value, {:error, :invalid_authorization_specification}}
▓
▓ Please file a new issue on GitHub[1], including the contents of this error.
▓
▓ [1]: https://github.com/electric-sql/electric/issues
16:14:48.832 pid=<0.3108.0> origin=postgres_1 [error] PostgresConnectorSup failed to start child :postgres_producer with reason: {:bad_return_value, {:error, :invalid_authorization_specification}}.

••• Shutting down •••

[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] memory supervisor port (memsup): Erlang has closed
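The log shows two connections to RDS. The first (Client.with_conn) succeeds and (re)defines Electric's SQL routines; the second, opened by LogicalReplicationProducer with replication: ~c"database", is the one rejected with invalid_authorization_specification (SQLSTATE 28000). A minimal sketch for reproducing this outside Electric, assuming psycopg2 is installed (it is not part of the original setup): it opens both kinds of connection from the same DATABASE_URL. libpq_params and probe are hypothetical helper names.

```python
from urllib.parse import unquote, urlsplit


def libpq_params(database_url: str) -> dict:
    """Split a postgresql:// URL into libpq keyword parameters (hypothetical helper)."""
    parts = urlsplit(database_url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the libpq default port
        "dbname": unquote(parts.path.lstrip("/")),
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "sslmode": "require",  # mirrors DATABASE_REQUIRE_SSL=true (ssl: :required in the log)
    }


def probe(database_url: str) -> None:
    """Open a regular and a logical-replication connection and report each result."""
    # Imported lazily so libpq_params can be used without psycopg2 installed.
    import psycopg2
    from psycopg2.extras import LogicalReplicationConnection

    params = libpq_params(database_url)
    for label, factory in (("regular", None), ("replication", LogicalReplicationConnection)):
        try:
            # LogicalReplicationConnection adds replication=database by itself.
            conn = psycopg2.connect(connection_factory=factory, **params)
            conn.close()
            print(f"{label} connection: OK")
        except psycopg2.Error as exc:
            print(f"{label} connection failed: {exc}")


if __name__ == "__main__":
    import os

    probe(os.environ["DATABASE_URL"])
```

Run from a pod in the same cluster, the regular connection should succeed as it did in the log; if only the replication connection fails, the server's error message usually names the cause (for example a pg_hba.conf rejection of replication connections).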

These are the environment variable definitions:

name  = "DATABASE_URL"
value = "postgresql://db_user:xxxxxxxxxxxxxxxx@xxxxxxxxxxxxx.rds.amazonaws.com:5432/db_name"

name  = "DATABASE_REQUIRE_SSL"
value = true

name  = "LOGICAL_PUBLISHER_HOST"
value = "electric.web.com"

name  = "LOGICAL_PUBLISHER_PORT"
value = "443"

name  = "PG_PROXY_PASSWORD"
value = "xxxxxxxxxx"

name  = "AUTH_JWT_ALG"
value = "HS512"

name  = "AUTH_JWT_KEY"
value = "xxx"

name  = "LOG_LEVEL"
value = "debug"
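The LOGICAL_PUBLISHER_HOST and LOGICAL_PUBLISHER_PORT values appear to end up in the CREATE SUBSCRIPTION line of the log (host=electric.web.com port=443). A Postgres subscription connection is opened by the subscriber, so this is an address the RDS instance itself must be able to reach, the reverse direction of DATABASE_URL. The sketch below rebuilds that connection string from the environment to make the mapping explicit. It is reconstructed from the log line, not taken from Electric's source; subscription_conninfo is a hypothetical name, and dbname=electric and connect_timeout=5000 are copied verbatim from the log.

```python
import os
from typing import Mapping


def subscription_conninfo(env: Mapping[str, str]) -> str:
    """Rebuild the CONNECTION string seen in the CREATE SUBSCRIPTION log line.

    Assumption: both variables are set; dbname and connect_timeout are
    hard-coded from the log, not read from any variable.
    """
    host = env["LOGICAL_PUBLISHER_HOST"]  # raises KeyError if unset
    port = int(env["LOGICAL_PUBLISHER_PORT"])  # raises ValueError if not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"LOGICAL_PUBLISHER_PORT out of range: {port}")
    return f"host={host} port={port} dbname=electric connect_timeout=5000"


if __name__ == "__main__":
    print(subscription_conninfo(os.environ))
```

With the values above, this yields the same string as the log: host=electric.web.com port=443 dbname=electric connect_timeout=5000.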

Any assistance is greatly appreciated.

linear[bot] commented 8 months ago

VAX-1753 Electric k8s deployment fails connecting to RDS instance

developer-ballastlane commented 8 months ago

This issue is still active; we consistently get the error:

origin=postgres_1 [error] PostgresConnectorSup failed to start child :postgres_producer with reason: {:bad_return_value, {:error, :invalid_authorization_specification}}.

We have tried several changes to the RDS parameter group and to other parts of our Kubernetes configuration, with no luck so far.

FYI, we have another Electric instance running without issues in Kubernetes, connected to a different database that runs inside the cluster. This only happens to us with RDS.

developer-ballastlane commented 7 months ago

For anyone interested, the fix with RDS was to enable logical replication in the instance's DB parameter group:

rds.logical_replication: true
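A minimal sketch of applying that fix with boto3, assuming the instance uses a custom DB parameter group (default parameter groups cannot be modified) and that AWS credentials are configured; the group name and instance identifier are hypothetical arguments. In the parameter group the value is entered as 1 rather than true, and because rds.logical_replication is a static parameter it only takes effect after a reboot, hence ApplyMethod pending-reboot followed by reboot_db_instance.

```python
from typing import Dict, List


def logical_replication_params() -> List[Dict[str, str]]:
    """Parameters payload for modify_db_parameter_group."""
    # Static parameter: it cannot be applied immediately, only on reboot.
    return [
        {
            "ParameterName": "rds.logical_replication",
            "ParameterValue": "1",
            "ApplyMethod": "pending-reboot",
        }
    ]


def enable_logical_replication(group_name: str, instance_id: str) -> None:
    """Set the parameter on a custom group, then reboot the instance to apply it."""
    import boto3  # imported lazily so the payload helper works without boto3

    rds = boto3.client("rds")
    rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=logical_replication_params(),
    )
    # The change stays pending until the instance is rebooted.
    rds.reboot_db_instance(DBInstanceIdentifier=instance_id)
```

After the reboot, running SHOW wal_level; on the instance should return logical. For Aurora PostgreSQL the same parameter lives in the DB cluster parameter group and is changed with modify_db_cluster_parameter_group instead.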