@Erwinvandervalk What's the actual exception that gets thrown in your case? If anything, this is an incorrect configuration of Polly. I'm not seeing anything that's particularly actionable right now
Hi Jeremy, thanks for your reply.
Here's the exception I get. It's the same exception I see in Azure, but as far as I understand it's just the default exception that occurs when a timeout happens.
Marten.Exceptions.MartenCommandException: Marten Command Failure:

Postgresql timed out while trying to read data. This may be caused by trying to read locked rows

Exception while reading from stream
---> Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
---> System.TimeoutException: Timeout during reading attempt
at Npgsql.Internal.NpgsqlReadBuffer.<Ensure>g__EnsureLong|55_0(NpgsqlReadBuffer buffer, Int32 count, Boolean async, Boolean readingNotifications)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Marten.Internal.Sessions.AutoClosingLifetime.ExecuteReaderAsync(NpgsqlBatch batch, CancellationToken token)
at Marten.Internal.Sessions.AutoClosingLifetime.ExecuteReaderAsync(NpgsqlBatch batch, CancellationToken token)
at Marten.Linq.MartenLinqQueryProvider.ExecuteHandlerAsync[T](IQueryHandler`1 handler, CancellationToken token)
--- End of inner exception stack trace ---
at JasperFx.Core.Exceptions.ExceptionTransformExtensions.TransformAndThrow(IEnumerable`1 transforms, Exception ex)
at JasperFx.Core.Exceptions.ExceptionTransforms.TransformAndThrow(Exception ex)
at Marten.Exceptions.MartenExceptionTransformer.WrapAndThrow(Exception exception)
at Marten.Linq.MartenLinqQueryProvider.ExecuteHandlerAsync[T](IQueryHandler`1 handler, CancellationToken token)
at Marten.Linq.MartenLinqQueryable`1.ToListAsync[TResult](CancellationToken token)
at TimedHostedService.DoWork(CancellationToken ct) in C:\Users\erwin\RiderProjects\MartenDbRetry\MartenDbRetryConsole\Program.cs:line 133
Sorry, but to me it does look like the ResiliencePipeline isn't kicking in. I've tried configuring it to catch all exceptions, and it still isn't retrying. Also, if I use exactly the same Polly configuration and wrap the ToListAsync() call with it myself, the retry policy does kick in, so the policy itself looks correct to me.
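For illustration, here's roughly what that manual wrapping looks like (a condensed sketch, not my literal repro code; MyDoc and the retry settings are placeholders):

```csharp
// Condensed sketch of the manual wrapping that DOES retry. The pipeline below
// mirrors the configuration handed to Marten; MyDoc and the retry settings
// are placeholders, not the exact repro values.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Marten;
using Polly;
using Polly.Retry;

public record MyDoc(Guid Id);

public static class ManualRetryExample
{
    // Deliberately broad retry: handle every exception so the ShouldHandle
    // predicate can't be the reason nothing is retried.
    private static readonly ResiliencePipeline Pipeline = new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions
        {
            ShouldHandle = new PredicateBuilder().Handle<Exception>(),
            MaxRetryAttempts = 5,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential
        })
        .Build();

    // Executing the query through the pipeline myself: the timeout above is
    // caught here and the query is retried as expected.
    public static Task<IReadOnlyList<MyDoc>> QueryWithRetry(IDocumentStore store, CancellationToken ct)
        => Pipeline.ExecuteAsync<IReadOnlyList<MyDoc>>(async token =>
        {
            await using var session = store.QuerySession();
            return await session.Query<MyDoc>().ToListAsync(token);
        }, ct).AsTask();
}
```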
Now, it's a bit strange: in the past I've tested the retry policy by taking a table-wide lock, and then I did see the retry policy kick in. Perhaps the difference here is that the connection itself isn't responding? (I think that's also literally what happens in Azure after a planned maintenance event: PgBouncer still holds a number of connections that are no longer valid, and they are only removed after they've been tried.)
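For reference, that earlier table-lock test looked roughly like this (a sketch; the connection string and table name are illustrative, Marten's default naming is mt_doc_<document type>):

```csharp
// Hold an exclusive table lock in an open transaction so concurrent reads of
// the document table block until they time out. In this scenario the retries
// did fire. Connection string and table name are illustrative.
using System;
using System.Threading.Tasks;
using Npgsql;

var connectionString = "Host=localhost;Database=marten_retry;Username=postgres;Password=postgres";

await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();
await using var tx = await conn.BeginTransactionAsync();

await using var cmd = new NpgsqlCommand(
    "lock table public.mt_doc_mydoc in access exclusive mode;", conn, tx);
await cmd.ExecuteNonQueryAsync();

// Keep the lock long enough for the poller to hit its command timeout.
await Task.Delay(TimeSpan.FromMinutes(2));
await tx.RollbackAsync();
```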
Anyway, if there's anything you'd like me to do to help figure out what's wrong, please let me know. Otherwise I'll proceed with a workaround, which is to wrap the calls to Marten in my own retry policy.
Thanks!
@Erwinvandervalk Looked at this again, traced through the code, and it's definitely being called through the ResiliencePipeline just like I expected it to be. I tried your reproduction, and I can see where it's happily going into the resilience pipeline, throwing an exception we expect to be caught, then doing nothing.
I think it's time to talk to the Polly team about this one.
I did some more tracing. Initially I was as confused as you were about why the retry wasn't being hit, but I think I've found something. I've created a PR with what I think is a fix, though I haven't been able to try it out myself.
Finally closed by #3384
Hi guys,
I've experienced some issues using Marten in Azure (PostgreSQL Flexible Server). When a maintenance event happens, I get a lot of transient errors in the logs that don't seem to be picked up by the retry policy.
I've tried to simulate the issue in a minimal repro, sketched below. The code just starts polling Marten in the background. Then, if I pause / stop my database (it's running locally in Docker, so I can pause / stop it from Docker Desktop), the retry policy doesn't seem to be used: I can clearly see that the query is not retried.
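Here's a condensed sketch of the repro (not the literal Program.cs; the connection string, document type, and intervals are placeholders, and I'm assuming Marten's StoreOptions.ConfigurePolly hook for its internal resilience pipeline):

```csharp
// Condensed repro sketch: register Marten with a deliberately broad Polly
// retry, then poll it from a hosted service. Pausing/stopping the Postgres
// container should trigger the retries, but the query is not retried.
using System;
using System.Threading;
using System.Threading.Tasks;
using Marten;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Polly;
using Polly.Retry;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddMarten(opts =>
{
    opts.Connection("Host=localhost;Database=marten_retry;Username=postgres;Password=postgres");

    // Assuming Marten's ConfigurePolly hook (name may vary by Marten version).
    // Broad predicate so the ShouldHandle filter can't be the problem.
    opts.ConfigurePolly(polly => polly.AddRetry(new RetryStrategyOptions
    {
        ShouldHandle = new PredicateBuilder().Handle<Exception>(),
        MaxRetryAttempts = 5,
        Delay = TimeSpan.FromSeconds(1),
        OnRetry = retryArgs =>
        {
            Console.WriteLine($"Retry {retryArgs.AttemptNumber}: {retryArgs.Outcome.Exception?.Message}");
            return ValueTask.CompletedTask;
        }
    }));
});

builder.Services.AddHostedService<TimedHostedService>();

await builder.Build().RunAsync();

public record MyDoc(Guid Id);

public class TimedHostedService(IDocumentStore store) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            try
            {
                // Pause/stop the Postgres container while this loop runs:
                // the expectation is that the query gets retried, but it isn't.
                await DoWork(ct);
            }
            catch (Exception ex)
            {
                // Without the retry kicking in, the MartenCommandException shows up here.
                Console.WriteLine(ex);
            }

            await Task.Delay(TimeSpan.FromSeconds(2), ct);
        }
    }

    private async Task DoWork(CancellationToken ct)
    {
        await using var session = store.QuerySession();
        var docs = await session.Query<MyDoc>().ToListAsync(ct);
        Console.WriteLine($"Polled {docs.Count} documents");
    }
}
```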
What I've tried (without success):
Please let me know if you need more information or if there is something else I can do / try.