Open htbmw opened 2 months ago
I am seeing this on DotPulsar 3.2.1 as well, so not specific to 3.3.1 as initially reported. This seems to be related to protobuf-net and some more information can be found here:
https://stackoverflow.com/a/17096460
Can someone please check what can be done inside DotPulsar to make it thread safe?
Hi @htbmw
Could you please provide more information on your .NET configuration:
Hi @entvex, thanks for your request for further details.
Which version of .NET is the code running on? .NET 8
What OS and version, and what distro if applicable? Dockerfile uses this base image to build the final runtime image: mcr.microsoft.com/dotnet/aspnet:8.0-jammy-amd64
What is the architecture (x64, x86, ARM, ARM64)? x64
Do you know whether it is specific to that configuration? Unfortunately I have no idea
If you're using Blazor, which web browser(s) do you see this issue in? Not using Blazor
Hi @htbmw
We have never seen this issue before but would like to help.
As stated in the StackOverflow post, using 'PrepareSerializer' might bring about another issue.
This seems to be an old issue so I guess no solution is coming from protobuf-net.
We could protect the 'ProtoBuf.Serializer.Serialize' call with a lock, but I think that will hurt performance.
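To make the trade-off concrete, here is a minimal sketch of what guarding the call with a lock could look like. It assumes the protobuf-net package is referenced; `SampleCommand` and `LockedSerializer` are hypothetical names used for illustration only and are not DotPulsar's actual types.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical stand-in for a protobuf-net contract such as DotPulsar's BaseCommand.
[ProtoContract]
public sealed class SampleCommand
{
    [ProtoMember(1)]
    public int Type { get; set; }
}

// Hypothetical wrapper; not the real DotPulsar.Internal.Serializer.
public static class LockedSerializer
{
    // One process-wide gate: only one thread at a time may enter protobuf-net.
    private static readonly object Gate = new();

    public static byte[] Serialize<T>(T item)
    {
        using var stream = new MemoryStream();
        lock (Gate)
        {
            // ProtoBuf.Serializer.Serialize<T>(Stream, T) is the call being protected.
            Serializer.Serialize(stream, item);
        }
        return stream.ToArray();
    }
}
```

Every caller now contends for the same gate, even after protobuf-net has finished building its serializers, which is where the performance concern comes from.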
If you can, could you create your own DotPulsar.dll after adding:
static Serializer() => Serialize(new BaseCommand());
to 'DotPulsar.Internal.Serializer'? I hope this call will force protobuf-net to create stuff needed for serializing the base command so that we don't see this issue. It's a long shot, but worth a try.
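For anyone trying this, the idea can be sketched in isolation as follows. This assumes the protobuf-net package is referenced; `SampleCommand` and `WarmedSerializer` are hypothetical stand-ins for `BaseCommand` and `DotPulsar.Internal.Serializer`, whose real code may look different.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical stand-in for BaseCommand.
[ProtoContract]
public sealed class SampleCommand
{
    [ProtoMember(1)]
    public int Type { get; set; }
}

// Hypothetical stand-in for DotPulsar.Internal.Serializer.
public static class WarmedSerializer
{
    // The runtime runs a static constructor once, before the type is first used,
    // so this warm-up serialization happens on a single thread.
    static WarmedSerializer() => Serialize(new SampleCommand());

    public static byte[] Serialize<T>(T item)
    {
        using var stream = new MemoryStream();
        Serializer.Serialize(stream, item);
        return stream.ToArray();
    }
}
```

The warm-up only covers the type that is serialized in the static constructor, so whether it is enough to avoid the reported exception still has to be confirmed by testing.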
Hi @htbmw.
Could you please check whether https://www.nuget.org/packages/DotPulsar/3.3.2-rc.1 fixes the issue?
Hi @entvex, thanks. I will give it a go and report back sometime this week.
Hi @blankensteiner, sorry for not replying sooner. I will give it a go if it is different from the fix that @entvex posted and asked me to test.
Appreciate everyone's help and suggestions so far!
Hi @htbmw It's the same fix :-)
Description
I am getting a ConsumerFaultedException when my application starts up and tries to create a consumer. The full message and stack trace can be seen in the attached screenshot. This happens when calling GetLastMessageIds on the consumer.
I have seen this on several occasions in production after we updated the DotPulsar package to 3.3.1. Cannot recall seeing it on 3.2.1 or earlier.
The application runs in a pod in K8s. When an error like this occurs, the application retries a number of times and then stops. After many restarts and CrashLoopBackOffs (controlled by the K8s deployment), a startup eventually does not hit this exception and the application continues normally.
Reproduction Steps
I am not sure how this can be reproduced. I have not seen this in a local environment, only in K8s clusters in production and test environments. I suspect it is related to DotPulsar version 3.3.1, but I cannot confirm this.
Expected behavior
Since I am not explicitly in control of any serializers under the hood of DotPulsar, I expect the package to not run into the reported deadlock situation if that is the case.
Actual behavior
Low level exception with details about a potential deadlock issue that I cannot see myself being responsible for.
Regression?
Not sure, but I suspect it has been happening since version 3.3.1 of the DotPulsar package.
Known Workarounds
None that I am aware of.
Configuration
No response
Other information
No response