Open sliderhouserules opened 2 years ago
But on Windows presumably that can't happen? Since this code ran on Windows fine for years.
A 0 handle is invalid on Windows, so this was effectively a nop.
a warning if initializing a SafeHandle to a 0 value
It's valid to create such a SafeFileHandle, e.g. if someone wanted to use it to read from stdin.
Is it a rare enough use case that it is still worth doing a warning and having people who are doing it just add the warning ignore stuff around it to remove that?
I don't want to inconvenience people but this cost me months. I'm willing to accept that the real problem was PEBKAC, but surely I cannot be alone in experiencing this (not sure yet if @sliderhouserules has the same issue or not).
Why did Kestrel go silent and stop responding as a result (no input streams?)? Is there an improvement to be made there?
Is it a rare enough use case that it is still worth doing a warning and having people who are doing it just add the warning ignore stuff around it to remove that?
No: https://grep.app/search?q=new%20SafeFileHandle%28IntPtr.Zero
I'm not sure where I would put this... around every component that calls AWS? In my Startup? In my HttpClientFactory that the AWS clients use? Ugh. I don't think this will solve my Kestrel problems but I could be wrong. I'll see what I can sneak in to get this tested.
I'm not sure where I would put this
Where you would put what? @MMOSimca's app had bogus code that they deleted to solve the problem.
Oh, I thought they put it in to solve it, and the diff was just a different revision. I don't think this helps me much, as I wasn't using anything like this. Oh well.
It's possible that your Kestrel is also being similarly starved of input streams, though. I hadn't really thought of the problem in that way before. Maybe something is wrong in one of your Dispose functions too?
Sucks that it wasn't the same issue for you.
Do you have any ideas about how we could make the issue I encountered less likely to cause bad effects for users, @stephentoub? Or is this just a situation where doing something valid in some contexts turned disasterous in another and the results are inevitable.
It does seem kind of disingenuous that playing with a 'Safe'Handle was possibly the least safe thing that could happen, but you did mention this is very old legacy naming.
Or is this just a situation where doing something valid in some contexts turned disasterous in another and the results are inevitable.
It is dutifully doing what you told it to do. You wanted to protect the specified file descriptor, such that if you neglected to close it, it would be closed for you to prevent a leak, and such that if you were actively using it while someone else tried to close it, you'd be protected from it being closed out from under you. And you specified a valid file descriptor value. It just so happens that you provided it a file descriptor you didn't mean to.
At the very least, the changes made in the DLL provided by @tmds were fantastic and helped us narrow down the error much more closely. It would be great to add them to .NET.
I wonder if any of those socket error catches (I think he changed 3 places to expose stack trace) are unique to my problem. We could maybe warn people in the exception message that they might have terminated stdin?
We were having the exact same issues using Linux Fargate nodes on EKS - 502s, socket exceptions, corrupt responses. We managed to fix them by following the suggestion by MMOSimca to remove code which was creating and disposing SafeFileHandle(IntPtr.Zero)
instances.
I assume the code was added by someone copy-pasting the example code shown in the IDisposable documenatation.
The concerning thing is the effect this had. We were seeing some of our logs being written into the HTTP response stream from Kestrel. The log messages are supposed to be written into a local file. Here's an example response payload which contains a log message before the HTTP headers:
This hasn't happened since we made the code fixes. Not sure if this a sign of a deeper issue, or just our fault for misusing dangerous code... Anyway, big thanks to MMOSimca!
Wow, thanks for finding the source of the problem! (I'm 100% sure this is what we modeled this after, causing the issue.)
Based on my understanding of SafeFileHandle from this thread, those code examples for IDisposable seem shockingly disasterous. Why would regular IDisposable handlers have SafeFileHandle(0) calls?
It's especially concerning that the page doesn't even mention this at all. Or the fact that the code does nothing at all on Windows, which was presumably the primary OS when this was originally written.
Is it a rare enough use case that it is still worth doing a warning and having people who are doing it just add the warning ignore stuff around it to remove that?
No: https://grep.app/search?q=new%20SafeFileHandle%28IntPtr.Zero
Out of these 21 results, 8 of them seem to be instances of this bug based on that flawed IDisposable example code.
There also seems to be a link between flawed use of this code and the second parameter (the boolean). Everyone intentionally using it because they know what they're doing is setting that second parameter to false, despite what the docs say.
All the people copying from the IDisposable example are setting it to true. Could we warn on these criteria (first parameter = 0, second parameter = true)?
I could be wrong, but I feel like the docs are just using the SafeFileHandle as an example of how to dispose "a handle". You should never actually use new SafeFileHandle(IntPtr.Zero, true);
. It makes no sense - it's a no-op on Windows and triggers armageddon on Linux.
I feel like the docs are just using the SafeFileHandle as an example of how to dispose "a handle". You should never actually use new SafeFileHandle(IntPtr.Zero, true);. It makes no sense - it's a no-op on Windows and triggers armageddon on Linux.
Yes, the docs should be improved here. The intent I'm sure was to highlight how an IDisposable that owns a resource (a safe handle) can dispose of it. That should be made more clear, and an easy fix to the code (beyond comments that make it clear you shouldn't actually copy that SafeFileHandle into your own code) is to change the 0 to a -1.
The docs are open source. Contributions to improve them are welcome.
@stephentoub Is there anything further we should do about the fact that Kestrel was writing requests with the wrong output stream when all of its input streams were closed?
That sounds like a security concern. An extremely obscure security concern to be sure, but probably something that warrants its own thread, right?
Plus, now that we know how to trigger it, it should be very easy to replicate in a simple PoC.
I'd also like it if Kestrel had said -anything- about the situation, even when connection logging was enabled or debugging. Throwing an exception seems okay for Kestrel to do in a situation as bizarre as this, but smarter minds than I will probably know whether or not that could negatively impact some people who were like... trying intentionally to use Kestrel without stdin.
Is there anything further we should do about the fact that Kestrel was writing requests with the wrong output stream when all of its input streams were closed?
@halter73?
Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.
Author: | sliderhouserules |
---|---|
Assignees: | - |
Labels: | `documentation`, `area-System.Runtime` |
Milestone: | 7.0.0 |
Also, is there a use case for SafeFileHandle(IntPtr.Zero, true)? You established there's one for SafeFileHandle(IntPtr.Zero, false).
Given how much the questionable documentation has spread around, it would be great to have a code solution. I'd be happy to write something, but it does sound like based on this conversation that it would just be rejected.
Is there anything further we should do about the fact that Kestrel was writing requests with the wrong output stream when all of its input streams were closed?
If it is caused by SafeFileHandle(IntPtr.Zero, true)
: no.
You musn't set ownsHandle
to true
for file descriptors you don't own.
Also, is there a use case for SafeFileHandle(IntPtr.Zero, true)? You established there's one for SafeFileHandle(IntPtr.Zero, false).
In practice, I don't think so.
Given how much the questionable documentation has spread around, it would be great to have a code solution.
You mean a better example for the IDisposable
documentation?
The handle = new SafeFileHandle(IntPtr.Zero, true)
is meant to represent an IDisposable
that gets disposed under if (disposing)
.
It could be replace by something else that is disposable. It doesn't even need to be a SafeHandle
. It could for example be fs = File.OpenWrite(Path.GetTempFileName())
.
Given how much the questionable documentation has spread around, it would be great to have a code solution.
You mean a better example for the
IDisposable
documentation?The
handle = new SafeFileHandle(IntPtr.Zero, true)
is meant to represent anIDisposable
that gets disposed underif (disposing)
. It could be replace by something else that is disposable. It doesn't even need to be aSafeHandle
. It could for example befs = File.OpenWrite(Path.GetTempFileName())
.
I mean in the constructor for SafeFileHandle, if the first parameter is 0 and the second parameter is true, throw an exception. Or can you do warnings like that (based on parameters)? I'm looking through private repositories I have access to and I see this in 1 other unrelated project, which is not great.
I agree with you that the documentation for IDisposable shouldn't mention SafeFileHandle at all. Very dangerous function calls don't have a place anywhere near a 'best practices how to implement IDisposable tutorial'. I see it copied around the internet to other places on how to implement the IDisposable pattern, and it's like they spread around a kind of knowledge infection where they are instead teaching how to ruin your program on Linux. The irony level for a code example teaching best practices is through the roof. I'm going around to a few of these places and pointing it out.
Considering that the 2nd example doesn't even contain a disposable object (it's just the code needed to dispose one if one did exist), I don't know that you need to have any representation of a disposable object at all rather than like a comment saying that this is where you'd place one. Do you have a preference? Is it standard to include a representative example line in docs like this or just use a comment (or nothing)?
Now that we've identified new SafeFileHandle(IntPtr.Zero, true);
as a source of this problem.
Is there still an open issue that needs to be investigated further? Or can this issue be closed?
The SafeFileHandle stuff is not my issue, no. We are using a third-party document processing engine that does use unmanaged code, and I do need to verify 100% that they are not actually doing this. But until then, we need to consider that my issue is caused by something else and I still need to do a thorough investigation. My time at work has not been able to put this up in the priority stack high enough to get to that just yet. I apologize. Can we please leave this issue open a bit longer?
Yes, we'll keep it open until you had some time to investigate and check whether the root cause is also the IntPtr.Zero SafeFileHandle.
The problem remains in .net6
@SpringHgui is it possible that your code or a library you rely on instantiates a new SafeFileHandle(IntPtr.Zero, true)
somewhere? For example see https://github.com/fabian-blum/AspNetCore.Identity.LiteDB/pull/14 and https://github.com/Marilyth/MopsBot-2.0/pull/74
The problem remains in .net8
do you have repro you can share @SpringHgui ? The linked repo above has way too much stuff. Alternatively can you do the strace
or get detailed diagnostic logs?
do you have repro you can share @SpringHgui ? The linked repo above has way too much stuff. Alternatively can you do the
strace
or get detailed diagnostic logs?
I encountered this error because I used the websocket-sharp
library, in this library I reported an error, the startup project was upgraded from .net6 to .net8, it still exists, and the specific location is not located at present, and the windwos server is replaced, and this problem no longer occurs
I encountered this socket error when working on a MAUI app. For me the error only happened on some old Android devices and not consistently. What seemed to fix it for me was to explicitly close my socket instead of relying on dispose being called for me.
Broken code
private static async Task<bool> ScanPort(string ipAddress, int port, TimeSpan timeout, CancellationToken token)
{
using var tcpClient = new TcpClient();
using var cts = CancellationTokenSource.CreateLinkedTokenSource(token);
cts.CancelAfter(timeout);
await tcpClient.ConnectAsync(ipAddress, port, cts.Token);
return tcpClient.Connected;
}
Working code
private static async Task<bool> ScanPort(string ipAddress, int port, TimeSpan timeout, CancellationToken token)
{
using var cts = CancellationTokenSource.CreateLinkedTokenSource(token);
cts.CancelAfter(timeout);
using var tcpClient = new TcpClient();
await tcpClient.ConnectAsync(ipAddress, port, cts.Token);
var didConnect = tcpClient.Connected;
tcpClient.Close();
return didConnect;
}
interesting. how many destinations are you trying to scan @icefire1 ? I'm wondering if you run out of max open file descriptors in this case...
@wfurt difficult to say exactly as it happened inconsistently, but I would guess somewhere around 10-40 scans over a duration of a few seconds to a few minutes.
Perhaps it's because too many sockets where opened at once. I would add, however, that I cannot replicate it by simply opening a lot of socket like so:
var clients = new List<TcpClient>();
for (int i = 0; i < 100000; i++)
{
clients.Add(new TcpClient());
}
Above code would indeed give me an exception, but it would be ErrorCode 23 instead. With a message "File table overflow". I tested the sample code on both a new and an old device (which gave me the unknown socket error).
Makes me wonder if it is related to the CancellationTokenSource.CreateLinkedTokenSource
and cancellation in general.
Description
We have an application based on template dotnet create webapi project template and just adds an IHostedService to do background processing. Kestrel is used in order to be able to answer health checks from Kubernetes, as well as feed new requests into the service to be queued. Our background processing logic involves pretty heavy interaction with AWS resources (SQS, S3, STS), and we are encountering SocketExceptions that have been very difficult to pin down.
The canon guidance to use a static HttpClient instance application-wide does not work well when using the AWS clients in Amazon's SDK packages. The various AWS client constructors, and the config objects you can hand in, do not accept an instance of HttpClient. Instead, you must either rely on their built-in caching of HttpClient instance(s) that are created internally, or you have to provide an implementation that derives from their HttpClientFactory. This is fairly straight-forward if you use IHttpClientFactory in .Net. The Create() method override that you implement simply calls .Create() on an instance of IHttpClientFactory provided via DI.
However, even when providing this HttpClientFactory to all instances of any AWS client that is instantiated anywhere and everywhere in this service, it does not solve the SocketException problem. I will provide the stack trace below, but the properties of the actual SocketException object are extremely unhelpful:
Unknown socket error; ErrorCode: -131074; SocketErrorCode: SocketError; NativeErrorCode: -131074
The twist here is that these errors have only been encountered on Linux/in containers. Extended running tests on Windows cannot reproduce the problem when tested under load.
The other twist is that our Kestrel calls are producing
System.InvalidOperationException: Handle is already used by another Socket.
errors (they are logged as Warning level) in Kestrel's handling pipeline. It would appear that Kestrel, and the HttpWebRequest pool used underneath IHttpClientFactory are stomping on each other. If I am mischaracterizing the issue and there is a different cause, please let me know.The fact that the runtime is throwing a SocketException with an error code that isn't even in the spec for TCP errors (nice article here) is the reason I am bringing it to this group. The runtime is burping up what would appear to be the equivalent of the
default
case in aswitch
statement.Any help or guidance here is appreciated.
Reproduction Steps
The basic structure is a Web API project created with dotnet create webapi with an
IHostedService
doing background processing interacting heavily with AWS resources. I can provide application code privately, as needed, so as to not have to share proprietary/private logic and access keys publicly.Expected behavior
Interaction with AWS resources from a web application should not cause SocketExceptions.
Actual behavior
We are encountering SocketExceptions that have error codes that aren't even part of the TCP specs:
Unknown socket error; ErrorCode: -131074; SocketErrorCode: SocketError; NativeErrorCode: -131074
Full stack trace:
Regression?
The application in its current form is a .Net 6 solution. In a prior version, it was .Net 5 and the SocketExceptions were very rare if not non-existent.
However, even in the .Net 5 solution Kestrel was throwing the
System.InvalidOperationException: Handle is already used by another Socket.
warnings when handling basic health check calls.Known Workarounds
We have implemented heavy retry policies using Polly, which allows the application to move past the errors, but it quite often takes a near complete restart of the background processing logic to clear up the SocketExceptions and allow the application to restart a job and create new connections to the AWS resources being used.
Configuration
The errors only appear running in containers on Linux.
Other information
N/A