JasperFx / wolverine

Supercharged .NET server side development!
https://wolverinefx.net
MIT License
1.25k stars 137 forks source link

Postgres db command execution fails with npgsql and multihost (primary/replica) connection #1110

Open rolyv opened 2 weeks ago

rolyv commented 2 weeks ago

Describe the bug We use multi host Postgres connection string that includes a primary and a read replica. We also have Marten ReadSessionPreference set to PreferStandby. Wolverine appears to have issues with this setup and we're seeing exceptions in the logs like this:

2024-10-31 12:52:07.773 EDT.    delete from public.wolverine_control_queue where expires < @p0;select body from public.wolverine_control_queue where node_id = @p1;
2024-10-31 12:52:07.773 EDT
---> Npgsql.PostgresException (0x80004005): 25006: cannot execute DELETE in a read-only transaction
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
at Wolverine.RDBMS.Polling.DatabaseOperationBatch.ExecuteAsync(IWolverineRuntime runtime, CancellationToken cancellationToken) in /home/runner/work/wolverine/wolverine/src/Persistence/Wolverine.RDBMS/Polling/DatabaseOperationBatch.cs:line 68

To Reproduce Steps to reproduce the behavior:

  1. Use multi host connection string like: Host=primary-ip,read-replica-ip;Username=...;Password=...;Database=...;Load Balance Hosts=true
  2. Set Marten ReadSessionPreference to PreferStandby
  3. Add Wolverine
  4. Watch go boom

Expected behavior Wolverine should be able to execute commands against primary

Additional context Using Marten 7.29.0 Wolverine 3.1.0 Npgsql 8.0.4 I'm registering NpgsqlDataSource via the Aspire.Npgsql nuget package 8.2.0 extension method

rolyv commented 1 week ago

Thanks for the quick fix. I think there might still be an issue with the initial schema creation though. I'm still seeing this exception when I start up my project, but I can't tell if this is Wolverine, Marten, or both.

Npgsql.PostgresException (0x80004005): 25006: cannot execute CREATE FUNCTION in a read-only transaction
         at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
         at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
         at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
         at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
         at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
         at Weasel.Postgresql.PostgresqlMigrator.executeDelta(SchemaMigration migration, DbConnection conn, AutoCreate autoCreate, IMigrationLogger logger, CancellationToken ct)
        Exception data:
          Severity: ERROR
          SqlState: 25006
          MessageText: cannot execute CREATE FUNCTION in a read-only transaction
          File: utility.c
          Line: 414
          Routine: PreventCommandIfReadOnly
         --- End of inner exception stack trace ---
         at Marten.StoreOptions.Weasel.Core.Migrations.IMigrationLogger.OnFailure(DbCommand command, Exception ex)
         at Weasel.Postgresql.PostgresqlMigrator.executeDelta(SchemaMigration migration, DbConnection conn, AutoCreate autoCreate, IMigrationLogger logger, CancellationToken ct)
         at Weasel.Core.Migrations.DatabaseBase`1.ApplyAllConfiguredChangesToDatabaseAsync(IGlobalLock`1 globalLock, Nullable`1 override, ReconnectionOptions reconnectionOptions, CancellationToken ct)
         at Weasel.Core.Migrations.DatabaseBase`1.ApplyAllConfiguredChangesToDatabaseAsync(IGlobalLock`1 globalLock, Nullable`1 override, ReconnectionOptions reconnectionOptions, CancellationToken ct)
         at Oakton.Resources.ResourceSetupHostService.<>c__DisplayClass5_0.<<StartAsync>g__execute|0>d.MoveNext()
         --- End of inner exception stack trace ---
         --- End of inner exception stack trace ---
         at Oakton.Resources.ResourceSetupHostService.StartAsync(CancellationToken cancellationToken)
         at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
         at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)

It looks like both projects depend on Weasel for applying db changes, so perhaps there's something missing there? I was looking thru Weasel and thought maybe this file needs to specify the Target Session Attribute: https://github.com/JasperFx/weasel/blob/master/src/Weasel.Postgresql/Connections/DefaultNpgsqlDataSourceFactory.cs

If you agree, I'm happy to submit a PR.

Hawxy commented 1 week ago

Damn, I thought we only had a single entry-point for npgsql DS usage. I'll properly test this end to end over the weekend.

Hawxy commented 1 week ago

Could you try using AddMultiHostNpgsqlDataSource from Npgsql.DependencyInjection instead of Aspire.Npgsql? I don't see anything in the Aspire implementation that points to a multi-host data source being supported.

jeremydmiller commented 1 week ago

@rolyv I think we might need a little more information about how you're bootstrapping. Wolverine would be taking the NpgsqlDataSource from Marten's usage if you were using AddMarten()IntegrateWithWolverine()

rolyv commented 1 week ago

I am using AddMarten().IntegrateWithWolverine(). I'm not sure why I didn't run into this before adding Wolverine.

I was able to confirm that the NpgsqlDataSourceBuilder returns an instance of NpgsqlMultiHostDataSource if the connection string has multiple hosts: https://github.com/npgsql/npgsql/blob/main/src/Npgsql/NpgsqlSlimDataSourceBuilder.cs#L645

Output from debug console when trying to get NpgsqlDataSource:

app.Services.GetService<NpgsqlDataSource>()
  {Npgsql.NpgsqlMultiHostDataSource}
    ConnectionString: "Host=localhost:5432,localhost:5433;Username=postgres;Database=test;Load Balance Hosts=True"
Hawxy commented 4 days ago

Would you be able to provide a reproduction? I've configured a wolverine + marten environment with a multi-host configuration and I'm unable to get it to error out. You might be using a feature I haven't accounted for.

rolyv commented 2 days ago

Sorry for the delay. It took me a while to come up with a consistent repro. I created a simple app that uses a lot of the same components I'm using in my real app. I'm guessing there is some randomness in how the host is picked per command. When I had the primary first in the connection string, I wasn't encountering the exception. But when I switched the connection string to have the read replica first, it started happening consistently. But it's not consistent on which command is the one that fails. Sometimes it would be a drop command that would fail. Sometimes it's the create function. I created this repo with the app I used to repro the issue. Hope this helps: https://github.com/rolyv/marten-repro