dotnet / runtime

Performance issue with reading Requies.Form collection #28971

Closed: mkArtakMSFT closed this issue 4 years ago

mkArtakMSFT commented 5 years ago

From @yevgenyka on Thursday, 14 March 2019 15:29:24

Hi, we have a service that runs Kestrel. Our customers sometimes send big requests with 4 MB of data in the Request.Form collection. In that case, reading the form collection with the Request.ReadFormAsync method takes more than 10 seconds. When Kestrel runs in a service based on .NET Framework, it can take more than 30 seconds.

We found that this happens when the request contains a mix of Unicode characters, such as text translated into other languages (in our case there are many languages), combined with quotes, "<>" characters, or even "abc". The main problem seems to be that ReadFormAsync uses the Uri.UnescapeDataString method, which takes most of the time (see the enclosed dotTrace snapshot). In our benchmarks (enclosed), Uri.UnescapeDataString takes about 39 s, while WebUtility.UrlDecode takes 14 ms.
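A rough, single-run way to compare the two decoders on the same input is sketched below. It is only an illustration: `DecodeComparison`, `Time`, and `Compare` are hypothetical names, and a single Stopwatch measurement is much noisier than the BenchmarkDotNet results attached to the report. Note that the two methods are not drop-in replacements for each other: WebUtility.UrlDecode also decodes '+' as a space, while Uri.UnescapeDataString leaves '+' unchanged.

```csharp
using System;
using System.Diagnostics;
using System.Net;

// Hypothetical helper for a rough, single-run comparison of two decoders.
// BenchmarkDotNet (used in the original report) gives far more reliable numbers.
public static class DecodeComparison
{
    // Runs one decoder once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<string, string> decode, string input, out string result)
    {
        var sw = Stopwatch.StartNew();
        result = decode(input);
        sw.Stop();
        return sw.Elapsed;
    }

    // Times both decoders on the same input and reports whether outputs match.
    public static void Compare(string input)
    {
        TimeSpan uriTime = Time(s => Uri.UnescapeDataString(s), input, out string uriResult);
        TimeSpan webTime = Time(s => WebUtility.UrlDecode(s), input, out string webResult);

        Console.WriteLine($"Uri.UnescapeDataString: {uriTime}");
        Console.WriteLine($"WebUtility.UrlDecode:   {webTime}");
        // Outputs can differ: UrlDecode also turns '+' into a space.
        Console.WriteLine($"Same output: {uriResult == webResult}");
    }
}
```

Passing any large percent-encoded string to `DecodeComparison.Compare` prints both timings side by side.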

Steps to reproduce the behavior:

  1. In the enclosed solution, restore packages.
  2. Run with Ctrl+F5 (this starts the servers and runs the benchmarks).
  3. See the benchmark results.

Enclosed are a demo solution that reproduces the issue, the benchmark results, and a dotTrace snapshot:

EscapeTest.zip readFormSnapshot.zip dotTrace BenchmarkDotNet.Artifacts.zip

Copied from original issue: aspnet/AspNetCore#8510

stephentoub commented 5 years ago

I'll take a look. I boiled the repro down to this:

using System;
using System.Diagnostics;
using System.Text;

class Program
{
    static void Main()
    {
        string input = $"param1={GenerateUrlEncoded(40)}&param2={GenerateUrlEncoded(220)}";
        Console.WriteLine("Input length: " + input.Length);
        var sw = Stopwatch.StartNew();
        string result = Uri.UnescapeDataString(input);
        Console.WriteLine("Result length: " + result.Length);
        Console.WriteLine(sw.Elapsed);
    }

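    // Builds rowsCount lines of percent-encoded text: characters U+0100..U+0998
    // mixed with '<', '>' and '"', all escaped by Uri.EscapeDataString.
    private static string GenerateUrlEncoded(int rowsCount)
    {
        var sb = new StringBuilder();
        for (int i = 0x100; i < 0x999; i++)
        {
            sb.Append((char)i);
            if (i % 10 == 0) sb.Append('<');
            if (i % 20 == 0) sb.Append('>');
            if (i % 15 == 0) sb.Append('\"');
        }

        string escaped = Uri.EscapeDataString(sb.ToString());
        sb.Clear();
        // Repeat the escaped block; AppendLine adds an unescaped newline after each copy.
        for (int i = 0; i < rowsCount; i++)
        {
            sb.AppendLine(escaped);
        }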

        return sb.ToString();
    }
}

which creates a 4,124,395-character input string to be unescaped, resulting in a 696,555-character output string. On my machine that takes ~14 s. Almost all of the time is spent allocating memory: it ends up allocating ~77K char[] arrays totaling a mind-boggling ~630 GB, as well as ~570K byte[] arrays. There's got to be some low-hanging fruit in there :)
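To check allocation volume on smaller inputs without a profiler, a rough probe like the following may help. It is a minimal sketch, assuming a .NET Core runtime where `GC.GetAllocatedBytesForCurrentThread()` is available; `UnescapeAllocationProbe` and `MeasureAllocatedBytes` are hypothetical names. The reported figure covers everything the current thread allocates during the call, not only the char[] and byte[] arrays mentioned above.

```csharp
using System;

// Hypothetical helper: measures bytes allocated on the current thread
// while Uri.UnescapeDataString runs. Any other allocation made by this
// thread during the call is included in the figure.
public static class UnescapeAllocationProbe
{
    public static long MeasureAllocatedBytes(string input, out string result)
    {
        // Call once beforehand so one-time JIT or static-initialization
        // allocations are less likely to be counted below.
        Uri.UnescapeDataString("%41");

        long before = GC.GetAllocatedBytesForCurrentThread();
        result = Uri.UnescapeDataString(input);
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }
}
```

Calling it with inputs built from increasing `rowsCount` values and comparing the reported bytes against the input length is one way to check whether allocation grows faster than the input does.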

cc: @rmkerr, @davidsh, @geoffkizer

yevgenyka commented 5 years ago

Hi, thanks for the quick response and resolution, but for now we have a production issue with Request.ReadFormAsync, as I wrote above, and we need a workaround. Please advise.

jawn commented 5 years ago

Typo in the title: Requies -> Request

yevgenyka commented 5 years ago

Hi,

Thanks for the reply. Unfortunately, the request-limit approach can't be used in our case: our customers may send an entire encoded HTML page, which can be really big, and that's a business requirement. To be clear, we're running Kestrel in a service compiled against .NET Framework 4.7.2. Can you suggest another workaround?

Thanks
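One possible interim workaround, offered only as a sketch and not as an official fix: for application/x-www-form-urlencoded bodies, read the raw body yourself (for example with `new StreamReader(Request.Body).ReadToEndAsync()`) instead of calling Request.ReadFormAsync or touching Request.Form, then decode the pairs with WebUtility.UrlDecode, which the reporter measured as far faster. `FastFormParser` is a hypothetical name. The sketch assumes the body is UTF-8, does not handle multipart/form-data, and does not enforce any size or count limits the framework's own form reader may apply.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: parses an application/x-www-form-urlencoded body
// with WebUtility.UrlDecode instead of Uri.UnescapeDataString.
public static class FastFormParser
{
    public static Dictionary<string, List<string>> Parse(string body)
    {
        // Ordinal key comparison is a design choice for this sketch.
        var form = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (string pair in body.Split('&'))
        {
            if (pair.Length == 0) continue; // skip empty segments such as "a=1&&b=2"

            int eq = pair.IndexOf('=');
            // A pair without '=' is treated as a key with an empty value.
            string rawKey = eq < 0 ? pair : pair.Substring(0, eq);
            string rawValue = eq < 0 ? string.Empty : pair.Substring(eq + 1);

            // UrlDecode turns '+' into a space and %XX sequences into characters.
            string key = WebUtility.UrlDecode(rawKey);
            string value = WebUtility.UrlDecode(rawValue);

            if (!form.TryGetValue(key, out List<string> values))
            {
                values = new List<string>();
                form[key] = values;
            }
            values.Add(value); // repeated keys keep every value, in order
        }
        return form;
    }
}
```

Keeping every value per key mirrors the fact that form fields may repeat; callers that expect a single value can take the first entry.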