Closed mkArtakMSFT closed 4 years ago
I'll take a look. I boiled the repro down to this:
using System;
using System.Diagnostics;
using System.Text;

class Program
{
    static void Main()
    {
        string input = $"param1={GenerateUrlEncoded(40)}&param2={GenerateUrlEncoded(220)}";
        Console.WriteLine("Input length: " + input.Length);
        var sw = Stopwatch.StartNew();
        string result = Uri.UnescapeDataString(input);
        Console.WriteLine("Result length: " + result.Length);
        Console.WriteLine(sw.Elapsed);
    }

    private static string GenerateUrlEncoded(int rowsCount)
    {
        var sb = new StringBuilder();
        for (int i = 0x100; i < 0x999; i++)
        {
            sb.Append((char)i);
            if (i % 10 == 0) sb.Append('<');
            if (i % 20 == 0) sb.Append('>');
            if (i % 15 == 0) sb.Append('\"');
        }
        string escaped = Uri.EscapeDataString(sb.ToString());
        sb.Clear();
        for (int i = 0; i < rowsCount; i++)
        {
            sb.AppendLine(escaped);
        }
        return sb.ToString();
    }
}
which creates a 4,124,395-character input string to be unescaped, resulting in a 696,555-character output string. On my machine that takes ~14s. Almost all of the time is spent allocating memory: it ends up allocating ~77K char[]s totaling a mind-boggling ~630GB, as well as ~570K byte[] arrays. There's got to be some low-hanging fruit in there :)
cc: @rmkerr, @davidsh, @geoffkizer
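To check the allocation figures quoted above on another machine, a small probe along these lines might help. It is only a sketch: `AllocationProbe` and `Measure` are made-up names, and it assumes a runtime where `GC.GetAllocatedBytesForCurrentThread` is available (.NET Core, or .NET Framework 4.8 as far as I know). It only counts bytes allocated on the calling thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of any library.
static class AllocationProbe
{
    // Runs 'action' once and returns the elapsed time together with the
    // number of bytes allocated on the calling thread while it ran.
    public static (TimeSpan Elapsed, long AllocatedBytes) Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        long after = GC.GetAllocatedBytesForCurrentThread();
        return (sw.Elapsed, after - before);
    }
}
```

With the repro above, the call would look something like `var (elapsed, bytes) = AllocationProbe.Measure(() => Uri.UnescapeDataString(input));`.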
Hi, thanks for the quick response and resolution. For now, though, we have a production issue with Request.ReadFormAsync, as I wrote above, and we need a workaround. Please advise.
Typo in title Requies -> Request
Hi,
Thanks for the reply. Unfortunately, the request-limit approach can't be implemented in our case. Our customers may send an entire encoded HTML page, which can be really big; that's a business requirement. To be clear, we're running Kestrel in a service compiled against .NET Framework 4.7.2. Can you suggest another workaround?
10x
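One possible stopgap, going by the benchmark numbers in the original report, would be to read the request body yourself and decode it with `WebUtility.UrlDecode` rather than calling `Request.ReadFormAsync`. The sketch below assumes the body is `application/x-www-form-urlencoded` and has already been read into a string. `FormBodyDecoder` and `Parse` are hypothetical names, not an ASP.NET Core API, and this has not been checked against everything `ReadFormAsync` handles (no multipart support, no limits on key or value counts).

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not an ASP.NET Core API.
static class FormBodyDecoder
{
    // Splits a form-urlencoded body into key/value pairs and decodes each
    // part with WebUtility.UrlDecode. Repeated keys keep every value.
    public static Dictionary<string, List<string>> Parse(string body)
    {
        var result = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(body)) return result;

        foreach (string pair in body.Split('&'))
        {
            if (pair.Length == 0) continue;

            // A pair without '=' is treated as a key with an empty value.
            int eq = pair.IndexOf('=');
            string rawKey = eq < 0 ? pair : pair.Substring(0, eq);
            string rawValue = eq < 0 ? string.Empty : pair.Substring(eq + 1);

            string key = WebUtility.UrlDecode(rawKey);
            string value = WebUtility.UrlDecode(rawValue);

            if (!result.TryGetValue(key, out List<string> values))
            {
                values = new List<string>();
                result[key] = values;
            }
            values.Add(value);
        }
        return result;
    }
}
```

The body string could come from a `StreamReader` over `Request.Body`. Whether that fits depends on what else in the pipeline expects `Request.Form` to be populated.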
From @yevgenyka on Thursday, 14 March 2019 15:29:24
Hi, we have a service that runs Kestrel. Our customers sometimes send big requests with 4MB of data in the Request.Form collection. In that case, it takes more than 10 sec to read the form collection using the Request.ReadFormAsync method. When Kestrel runs in a service based on .NET Framework, it may take more than 30 sec.
We found that this happens when the request contains a mix of Unicode characters, such as text translated into other languages (in our case there are many languages), together with some quotes, "<>" characters or even "abc". The main problem seems to be that ReadFormAsync uses the Uri.UnescapeDataString method, which takes most of the time (see the enclosed dotTrace snapshot). In our benchmarks (enclosed) we can see that Uri.UnescapeDataString takes about 39 sec, but WebUtility.UrlDecode takes 14 ms.
Steps to reproduce the behavior:
Enclosed are a demo solution to reproduce the issue, benchmark results, and a dotTrace snapshot:
EscapeTest.zip readFormSnapshot.zip BenchmarkDotNet.Artifacts.zip
Copied from original issue: aspnet/AspNetCore#8510
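Since the report compares `Uri.UnescapeDataString` with `WebUtility.UrlDecode`, it may be worth noting that the two are not drop-in replacements for each other: `WebUtility.UrlDecode` turns `+` into a space, while `Uri.UnescapeDataString` leaves `+` alone. A minimal sketch for checking both decoders on the same input follows; `DecoderComparison` and `DecodeBoth` are illustrative names only.

```csharp
using System;
using System.Net;

// Hypothetical helper for comparing the two decoders side by side.
static class DecoderComparison
{
    // Decodes the same escaped string with both APIs so the results can be compared.
    public static (string ViaUri, string ViaWebUtility) DecodeBoth(string escaped)
    {
        return (Uri.UnescapeDataString(escaped), WebUtility.UrlDecode(escaped));
    }
}
```

For input such as `a+b%20c`, the first result keeps the plus sign (`a+b c`) and the second replaces it with a space (`a b c`). For form posts the second behavior is usually what is wanted, but any code that relies on the first would need checking before switching.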