dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.44k stars 10.02k forks source link

Uncatchable Exception While Using IFormFile Object Streams #43424

Closed QuinnDamerell closed 2 years ago

QuinnDamerell commented 2 years ago

Is there an existing issue for this?

Describe the bug

I have written an ML image classification server that takes very rapid requests via an API call and performs ML image classification on them using ML.Net. I have this system running in production, and it handles about 1 million predictions a day, but I'm randomly getting a really bad exception that I can't figure out.

I'm taking the Request.Form and getting an IFormFile object from the collection of files. Down a chain of async awaited calls, I'm eventually using that file to call OpenReadStream(), which I'm then passing to Image.FromStream(stream). What I'm seeing is about 5 times a day (out of the 1 million requests), Image.FromStream is causing an exception to be thrown, but the exception is surfacing in my web server in the AppDomain.CurrentDomain.UnhandledException global exception handler. I do have a generic try-catch around the entire Image.FromStream function, but it's not getting the exception.

Here's a sample exception with the full call stack printed, which is quite short: ArgumentOutOfRangeException - The Position must be within the length of the Stream: 55192 (Parameter 'value')
Actual value was 81813. - at Microsoft.AspNetCore.Http.ReferenceReadStream.set_Position(Int64 value)
at Microsoft.AspNetCore.Http.ReferenceReadStream.Seek(Int64 offset, SeekOrigin origin)

The biggest problem here is that since the exception is being captured in AppDomain.CurrentDomain.UnhandledException my server is then terminated and restarts.

I don't think this is a problem with Image.FromStream, as the call stack looks like some kind of seeking going on in the Request file body that's failing.

Expected Behavior

Using an IFormFile object and accessing its stream should never throw a global crashing exception even if the Request or stream is invalid.

Steps To Reproduce

I tried to pull the important bits out of the various parts of my projects, the gits of the call flow is this:

main()
{
   AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;
}

private static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
{
    Logger.Critical("Unhanded Program Exception!", (Exception)e.ExceptionObject);
}

[Route("api/thing/[controller]")]
[ApiController]
public class Inspect : ControllerBase
{
        [HttpPost]
        public async Task Post()
        {
                var form = Request.Form;
                foreach (var f in form.Files)
                  {
                      if(f.FileName.ToLower().EndsWith(".jpg"))
                      {
                             using(var s = f.OpenReadStream())
                            {
                                  await FunctionsThatCallOthersFunctions(s);
                            }
                      }
                  }               
 }

async Task EventuallyCalledFunction(Stream s)
{
     try
     {
              using (Image image = Image.FromStream(s)) // The exception is thrown to the global handler while execution is in here
              {
                        // Do ML prediction with image
              }
     }
     catch(Exception e)
     {
             // Never is hit
      }
}

Exceptions (if any)

ArgumentOutOfRangeException - The Position must be within the length of the Stream: 55192 (Parameter 'value')
Actual value was 81813. - at Microsoft.AspNetCore.Http.ReferenceReadStream.set_Position(Int64 value)
at Microsoft.AspNetCore.Http.ReferenceReadStream.Seek(Int64 offset, SeekOrigin origin)

.NET Version

6.0.400

Anything else?

I'm using dotnet 6.0.400 on a Ubuntu 20.04.4 LTS VM. I'm more than happy to help debug however possible.

Tratcher commented 2 years ago

That exception is coming from here: https://github.com/dotnet/aspnetcore/blob/b883b451b02876b9adaf3c4ce7ae0ac441f471bb/src/Http/Http/src/Internal/ReferenceReadStream.cs#L57-L60

It's trying to seek beyond the length of the file. Is there an outer exception here? Where is seek called? This seems like a bug in the caller, or that the image itself is corrupt and giving bad offsets.

QuinnDamerell commented 2 years ago

Thanks for that insight, I think there might be two possible issues here, from what I can gather.

1) Why is Image.FromSource trying to read outside the bounds of the actual stream length?

I think this could be an implementation issue on my end, or maybe I'm not using the form file correctly. I'm assuming the form file object will automatically handle chunked-based request body streaming, but maybe it's not? I have looked through the docs a lot but haven't found the answer yet.

2) Why isn't the exception being bubbled up to the callstack where Image.FromStream is being called?

This is the more interesting problem to me. I have a try/catch that handles Exception around the entire call, so there's no way the exception is leaking out. So why then is the exception ending up in the AppDomain.CurrentDomain.UnhandledException handler and killing my server?

Tratcher commented 2 years ago

I'm assuming the form file object will automatically handle chunked-based request body streaming, but maybe it's not?

Request.Form synchronously fully buffers the request body. Using Request.ReadFormAsync() is better async hygiene, but functionally equivalent. Both should handle consuming the request body however it's sent.

Find the full stack trace so we can narrow down where this is happeing.

2. Why isn't the exception being bubbled up to the callstack where Image.FromStream is being called? This is the more interesting problem to me. I have a try/catch that handles Exception around the entire call, so there's no way the exception is leaking out. So why then is the exception ending up in the AppDomain.CurrentDomain.UnhandledException handler and killing my server?

Your process is exiting? No exception in a controller should be able to cause that. Only exceptions on background threads should be able to. Do you have any async void calls in there?

QuinnDamerell commented 2 years ago

Great to know about Request.ReadFormAsync, thanks for that note!

Find the full stack trace so we can narrow down where this is happening.

Yeah, that's the strange part; the call stack I posted above is the entire thing. I even check the innerException to see if there are any more details and there are not.

Your process is exiting? No exception in a controller should be able to cause that. Only exceptions on background threads should be able to. Do you have any async void calls in there?

Right, the process is exciting because of the exception being thrown. Here's a quick look at the logic

   private void WorkerThread()
        {
            while(true)
            {
                try
                {
                    var mlContext = new MLContext();
                    var engine = GetPerdictionEngine(mlContext);

                    while(true)
                    {
                        var task = m_workQueue.Reader.ReadAsync().AsTask();
                        task.Wait();

                        var request = task.Result;

                        bool decodeSuccess = false;
                        try
                        {
                            using (Image image = Image.FromStream(request.Image))
                            {
                                decodeSuccess = true;
                                request.Output = engine.Predict(new GadgetModelInput()
                                {
                                    Image = (Bitmap)image
                                });
                            }
                        }
                        catch(Exception e)
                        {
                            if(!decodeSuccess)
                            {
                                request.ImageDecodeFailed = true;
                            }
                            else
                            {
                                request.Exception = e;
                            }
                        }    

                        request.Event.Release();                       
                    }
                }
                catch(Exception e)
                {
                        // Logic to log
                }
            }
        }

That's the actual code. My server is all async up to that point, but the ML.NET Predict function isn't async and takes about 60ms to run, so I implemented a thread system to make the calls.

What I do is I start 5 threads and have them all wait on a Channel object for work. When the API call is made, the Controller handler makes async calls down to this object, where it takes the Form.GetStream() and puts it in a context object as request.Image (in the logic above) and then awaits on it.

One of the helper threads will then pick up from the Channel queue and make all of the calls synchronously. Thus Image.FromStream and the Predict function are both called sync, so any exceptions should bubble up to them.

I am running this on Linux. I debugged into the Image logic from the windows side, which goes into Image.Windows.cs, but I'm unable to find the Linux side of things. I'm wondering if like you said, somewhere in Image.FromStream it's making an async call that's not be awaited?

Tratcher commented 2 years ago

I suggest trying to do the image processing inline in the controller to remove some of the threading complexity. ML.NET doesn't have to be async so long as it's actively using CPU. Async only helps when the flow gets blocked on something like IO.

QuinnDamerell commented 2 years ago

Yeah, that's a good thought. I was trying to do all the image decode and predictions on the thread, so I didn't tie up the Task thread pool with the long-running work. The channel system is also nice because it let's me control how many operations are running at once and how many calls are waiting for processing.

I will try t a few things. I can move the image decode to the async side to see if it helps. I also thought it would be interesting to try dumping the Form stream into a MemoryStream and then using it for the image decode. That's not as efficient, but it might fix it for now.

But those would all be workarounds; it would be interesting to figure out what's happening here.

ghost commented 2 years ago

Hi @QuinnDamerell. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.

QuinnDamerell commented 2 years ago

@adityamandaleeka - I'm not sure what else is required. Can you clarify?

QuinnDamerell commented 2 years ago

To add a bit more, I think Tratcher and I have done a good job discussing the issue to get clarity and find a possible workaround, but given all of the data I have presented, we aren't any closer to finding a root cause.

Tratcher commented 2 years ago

At this point there's not enough for us to investigate. Let us know if you still see either part of this issue after you remove the worker thread logic, or after the MemoryStream workaround.

QuinnDamerell commented 2 years ago

@Tratcher I moved the image decode to the async part of the code (before any of the worker thread stuff) and I'm still seeing it repo. I also switched to Bitmap bp = new Bitmpa(ms) instead of Image.FromStream and it has the same crash. So the error must be with some common logic they share.

I'm trying to MemoryStream CopyTo now, to see if it works.

QuinnDamerell commented 2 years ago

Well, good and bad news.

The good news is that if I revert all of my changes to the original code but then add a copy to a MemoryStream before I call Image.FromStream(), there are no more server crashes. Some images fail to decode (and there always have been), but it's unclear if the files that would have crashed it previously are now showing up as decode errors. (The images I'm classifying are user images, so they can be anything)

The bad news then is that there seems to be something that doesn't work correctly between the HTTP Response Stream and Image.FromStream. Unforentaully I don't know any more ways to debug it and provide you with information, but if you have any ideas, I would be happy to try them.

Tratcher commented 2 years ago

The decode errors are suspicious. If you're reading files that contain invalid offsets this this error would be expected. Copying to a memory stream likely only results in a different kind of exception being thrown for the same condition, reading past the end.

Can you save some of the images that fail to decode and use them to reproduce this issue locally? Then you can look at it in the debugger to clarify if that's the same issue across your various mitigations.

QuinnDamerell commented 2 years ago

Yeah, I agree. I think the error is wrong with the image, but the fact that the Form File stream throws a global exception that kills the process while the MemoryStream doesn't is what I'm mostly concerned about.

I will try to capture some images and see if I can repo it. I will re-open the issue if I can.