Open mwasson74 opened 10 months ago
Thanks for the dump file! I believe the following thread is the most interesting one. It holds a semaphore, so other worker threads are waiting for it to complete. If this thread is stuck, new background jobs will not be processed, and it most likely is stuck.
I found the following issue on GitHub, https://github.com/dotnet/runtime/issues/70656, with a similar stack trace that occurred in .NET 6.x; that issue states the problem was fixed in .NET 7.0. I see you are using an affected version, so perhaps the best recommendation I can give is to upgrade to a newer .NET version. Unfortunately, I also see https://github.com/dotnet/runtime/issues/83455, but it looks like it was fixed in .NET 7.0.7 and 8.0.
Thread #41
OS Thread ID: 81092
AppDomain Address: 1776550875936
State: 176672
Managed stack trace:
- [InlinedCallFrame] (Interop+Winsock.recv) at System.Net.Sockets.dll
- [InlinedCallFrame] (Interop+Winsock.recv) at System.Net.Sockets.dll
- at
- System.Net.Sockets.Socket.Receive(System.Span`1<Byte>, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef) at System.Net.Sockets.dll
- System.Net.Sockets.NetworkStream.Read(System.Span`1<Byte>) at System.Net.Sockets.dll
- System.Net.Security.SslStream+<EnsureFullTlsFrameAsync>d__186`1[[System.Net.Security.SyncReadWriteAdapter, System.Net.Security]].MoveNext() at System.Net.Security.dll
- at
- at
- System.Net.Security.SslStream+<ReadAsyncInternal>d__188`1[[System.Net.Security.SyncReadWriteAdapter, System.Net.Security]].MoveNext() at System.Net.Security.dll
- at
- System.Net.Security.SslStream.Read(Byte[], Int32, Int32) at System.Net.Security.dll
- MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytes(System.IO.Stream, Byte[], Int32, Int32, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBuffer(System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBuffer(Int32, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveMessage(Int32, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.IMessageEncoderSelector, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.MessageEncoderSettings, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool+PooledConnection.ReceiveMessage(Int32, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.IMessageEncoderSelector, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.MessageEncoderSettings, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool+AcquiredConnection.ReceiveMessage(Int32, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.IMessageEncoderSelector, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.MessageEncoderSettings, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1[[System.__Canon, System.Private.CoreLib]].Execute(MongoDB.Driver.Core.Connections.IConnection, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
- MongoDB.Driver.Core.WireProtocol.CommandWireProtocol`1[[System.__Canon, System.Private.CoreLib]].Execute(MongoDB.Driver.Core.Connections.IConnection, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
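To illustrate the pattern described above (one thread holding a semaphore while blocked in a read that never returns, so every other worker waits behind it), here is a minimal sketch using only the .NET base class library. It is not Hangfire's or the MongoDB driver's actual code: the names `gate`, `neverSignalled` and `holderStarted` are hypothetical, and a `ManualResetEventSlim` stands in for the blocking `Socket.Receive` call seen in the trace.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the semaphore held by the stuck thread.
var gate = new SemaphoreSlim(1, 1);
// Stand-in for a socket read that has no data and no timeout.
var neverSignalled = new ManualResetEventSlim(false);
// Lets the main thread know the first worker already holds the semaphore.
var holderStarted = new ManualResetEventSlim(false);

var stuckWorker = Task.Run(() =>
{
    gate.Wait();
    try
    {
        holderStarted.Set();
        // Simulates a blocking receive that never completes on its own.
        neverSignalled.Wait();
    }
    finally
    {
        gate.Release();
    }
});

holderStarted.Wait();

// A second worker tries to enter while the first one is still blocked.
bool acquired = gate.Wait(TimeSpan.FromSeconds(1));
Console.WriteLine($"Second worker acquired semaphore: {acquired}");

// Unblock the first worker so the sample can exit cleanly.
neverSignalled.Set();
stuckWorker.Wait();
```

In this sketch the second worker times out after one second instead of waiting indefinitely; a waiter without a timeout would simply hang, which matches the symptom of new background jobs not being processed.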
@odinserj, thank you so much for getting back to me on this so quickly!! I have upgraded to .NET 8 just now and am about to deploy to see how it goes!! 🤞
It did not go well. Here is the stack trace from when it happened again:
ASP.NET Core .NET 8 Hangfire.AspNetCore Version="1.8.9" Hangfire.Console Version="1.4.2" Hangfire.Core Version="1.8.9" Hangfire.Dashboard.BasicAuthorization Version="1.0.2" Hangfire.Mongo Version="1.9.16"
Hm, so the main issue is that the number shown in the enqueued metrics is inconsistent with the records themselves, e.g. it shows there are some jobs, but you don't see them?
That is, I assume, the symptom of the underlying issue. When this happens, the system thinks those jobs are still running and won’t enqueue them again. So in the instance from the screenshot, we now have 63 unique recurring jobs that never get enqueued again. The only way I can find to get them running again is to stop the app pool, drop all hangfire.* collections from Mongo, and then start the app pool again. (We add the recurring jobs on startup.)
In this case, I might have sent you in the wrong direction with that thread and the .NET upgrade, sorry about that.
I think it's better to raise an issue in the Hangfire.Mongo repository and describe the situation, because counters and actual contents should be consistent with each other.
I have the same issue with the SQL storage - there are always 10 jobs in the counter but nothing is enqueued.
.NET 4.6.1 Hangfire 1.8.6 Hangfire.Core 1.8.6 Hangfire.SqlServer 1.8.6.
@jonathancounihan
I am using Hangfire.Mongo and the owner said that he's found a bug in Hangfire.Mongo and he's pretty sure the same would happen with Sql Storage, too. https://github.com/gottscj/Hangfire.Mongo/issues/380#issuecomment-1925809164
I realize I'm not using the latest versions of things, but these were the latest versions when I started having this issue in production. Because of the stdump issue "IndexOutOfRangeException - What am I doing wrong?", I could not get a stack trace dump when I had the latest versions of the packages.
ASP.NET Core .NET 6 Hangfire.AspNetCore Version="1.8.6" Hangfire.Console Version="1.4.2" Hangfire.Core Version="1.8.6" Hangfire.Dashboard.BasicAuthorization Version="1.0.2" Hangfire.Mongo Version="1.9.12"
stdump_hangfire.txt
Classes have this attribute applied: SkipWhenPreviousJobIsRunningAttribute.txt
Execute methods have [DisableConcurrentExecution("{0}", 3)] applied.