RickStrahl / WestWindWebSurge

Quick and easy URL and Load Testing for your Web applications on Windows
https://websurge.west-wind.com
387 stars 80 forks source link

Arithmetic operation resulted in an overflow. exceptions being thrown #76

Closed awcoats closed 3 years ago

awcoats commented 4 years ago

I keep getting this exception when my websurge sessions finishes..

I've not narrowed down exactly what causes it but I have been making the following changes to the default web surge session options

`2/5/2020 6:02:50 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.ParseResults(RequestWriter requestWriter, Int32 totalTimeSecs, Int32 threads) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

2/5/2020 7:03:02 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.ParseResults(RequestWriter requestWriter, Int32 totalTimeSecs, Int32 threads) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

2/10/2020 12:52:51 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.ParseResults(RequestWriter requestWriter, Int32 totalTimeSecs, Int32 threads) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

2/10/2020 3:50:58 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.ParseResults(RequestWriter requestWriter, Int32 totalTimeSecs, Int32 threads) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)`

highdownts commented 3 years ago

I have been experiencing similar issues on a Windows 2019 server and tried it this evening on a Windows 10 development server. I am willing to generate a test project to recreate this issue if that will help. Just let me know. My case in a little different than the above description in that the settings mentioned do not affect the error one way or another.

What I have observed is that the error occurs when when the time is set above 7-10 seconds and there is a reasonably heavy load test ~200MBytes, ~6,800 request with 38,000 bytes/request avg. My Windows 2019 has 10Gb interfaces.

A few years back I used WebSurge to effectively benchmark data for 60 to well over 100 seconds. However, it now seems to be very sensitive to relatively short test intervals. I typically disable feedback, decompression, and everything else to reduce the post processing, but nothing I have tried seems to work.

It looks like a memory overflow issue when the accumulated data in WebSurge reaches a certain level. Is this expected? In other words, am I just pushing the product beyond its limits? When the error occurs, no summary data appears if you answer Yes to continue anyway, but the graph is available so that you can at least get a rough approximation of the performance.

Regards...

See Errors Below...

12/25/2020 3:55:19 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.ParseResults(RequestWriter requestWriter, Int32 totalTimeSecs, Int32 threads) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

12/25/2020 6:12:42 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.<>c__DisplayClass4_0.<UrlSummary>b__2(String key, IEnumerable1 uls) at System.Linq.Lookup2.<ApplyResultSelector>d__131.MoveNext() at System.Collections.Generic.List1..ctor(IEnumerable1 collection) at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

12/25/2020 6:14:56 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.<>c__DisplayClass4_0.<UrlSummary>b__2(String key, IEnumerable1 uls) at System.Linq.Lookup2.<ApplyResultSelector>d__131.MoveNext() at System.Collections.Generic.List1..ctor(IEnumerable1 collection) at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

12/25/2020 6:24:20 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.<>c__DisplayClass4_0.<UrlSummary>b__2(String key, IEnumerable1 uls) at System.Linq.Lookup2.<ApplyResultSelector>d__131.MoveNext() at System.Collections.Generic.List1..ctor(IEnumerable1 collection) at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

12/25/2020 6:28:43 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.<>c__DisplayClass4_0.<UrlSummary>b__2(String key, IEnumerable1 uls) at System.Linq.Lookup2.<ApplyResultSelector>d__131.MoveNext() at System.Collections.Generic.List1..ctor(IEnumerable1 collection) at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

12/25/2020 6:30:03 PM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.<>c__DisplayClass4_0.<UrlSummary>b__2(String key, IEnumerable1 uls) at System.Linq.Lookup2.<ApplyResultSelector>d__131.MoveNext() at System.Collections.Generic.List1..ctor(IEnumerable1 collection) at System.Linq.Enumerable.ToList[TSource](IEnumerable1 source) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

RickStrahl commented 3 years ago

WebSurge captures requests in memory only, so it will be limited to what fits into memory. You have to check in Process Explorer what actual memory usage is. But running requests for 60-100 seconds should not generate that much data as to overrun that and certainly the request count shouldn't overrun a 2gb integer limit.

200mb of pipe will translate into larger in memory footprint as there's additional data captured. You can try setting the max request size much smaller as you may not need to retrieve the entire response for requests.

I frequently run tests that run for hours without issues, but my requests are generally pretty small.

Any idea how many requests you have in your result set? If there's something reproducible in a test project I can take a look.

highdownts commented 3 years ago

I will do some additional testing, but I am pretty sure the task explorer memory usage for your app was well below the 2GB limit when the failure occurred. I ran the tests with the app settings to save the least amount of data. I will repeat and publish here. Thanks for your assistance. I am assuming a Visual Studio 2019 test project would be acceptable, if I can create a small constrained example that reproduces the issue.

highdownts commented 3 years ago

I have created a very small VS 2019 project that demonstrates this issue. Let me know where to send it to.

Regards, Warren

On Fri, Mar 12, 2021, 11:54 PM Rick Strahl @.***> wrote:

Reopened #76 https://github.com/RickStrahl/WestWindWebSurge/issues/76.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RickStrahl/WestWindWebSurge/issues/76#event-4453456991, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIRQGRFH6KPHLK4STWMM3LTDLVWVANCNFSM4KSXI3WQ .

highdownts commented 3 years ago

The following represent the test dataset.

Result Protocol Host URL Body

1 200 HTTP localhost:10202 /index.html 14,633 2 200 HTTP localhost:10202 /humphries-antiques.jpg 16,122 3 200 HTTP localhost:10202 /ford-tough.jpg 10,956 4 200 HTTP localhost:10202 /bundle.min.js 70,700 5 200 HTTP localhost:10202 /deferredBundle.min.js 126,776 6 200 HTTP localhost:10202 /instantssl-trust-seal-md-159x42.png 3,031 7 200 HTTP localhost:10202 /favicon.ico 9,812 Total 252,030 bytes zipped/super-compressed size

When the WebSurge test is run for 10 seconds with 10 threads, it passes.

When the test is run for 20 or more seconds with 10 threads, it fails.

The 20 second/10 thread test had ~104K requests with about 5.3K requests/second.

Memory usage was less than 500MB on machines with 16GB to 64GB.

Here is the error message:

4/30/2021 11:38:15 AM - Arithmetic operation resulted in an overflow.

System.Core at System.Linq.Enumerable.Sum(IEnumerable1 source) at WebSurge.ResultsParser.ParseResults(RequestWriter requestWriter, Int32 totalTimeSecs, Int32 threads) at WebSurge.ResultsParser.GetResultReport(RequestWriter requestWriter, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.ResultsParser.GetResultReportHtml(RequestWriter writer, Int32 totalTimeTakenMs, Int32 threadCount) at WebSurge.StressTestForm.ParseResults(List1 results)

Let me know if you need further information.

Thanks...

RickStrahl commented 3 years ago

Thanks for the detail on this. I'll take a look when I get a chance which may not be right away.

I don't think this is a memory pressure issue - it's some sort of numeric overflow (likely an integer that needs a long perhaps). I've run tests that run for hours (admittedly with small content) and won't cause problems until you get into the 10gb memory range. I suspect this is something else.

highdownts commented 3 years ago

Do want the simple test project which reproduces this issue?

Also, have you looked at the issue with WebSurge not retaining its options on save? I only noticed this issue with the latest version, which I just updated to.

Thanks...

On Sat, May 1, 2021, 2:23 AM Rick Strahl @.***> wrote:

Thanks for the detail on this. I'll take a look when I get a chance.

I don't think this is a memory pressure issue - it's some sort of numeric overflow (likely an integer that needs a long perhaps). I've run tests that run for hours (admittedly with small content) and won't cause problems until you get into the 10gb memory range. I suspect this is something else.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RickStrahl/WestWindWebSurge/issues/76#issuecomment-830562912, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIRQGT374436YL3R6UIHS3TLOM6ZANCNFSM4KSXI3WQ .

highdownts commented 3 years ago

Issues with the WebSurge download from Github:

  1. Initially the project compiled, but failed at the following line of code in the SiteValidator.cs file.

    var text = http.DownloadString(url);

    The new creation of an HttpClient (http above) did not have access to all the members of the class. DownloadString was undefined.

    I repeatedly download and tested the code to make sure that it was not corrupted during the download.

    In Visual Studio, I used Nuget Package Manager to update the Westwind.Utilities assembly and a JSON assembly.

    After rebuilding the project, the above error went away.

  2. I was able to identify the failure mechanism that was causing the testing duration of the stress tester to be severely limited (about 10 seconds) for my scenarios. The error was caused by a type int overflow in the following code segments.

    Changed int to decimal:

    public decimal TotalBytesSent { get; set; }

    Changed int to decimal:

    public string ByteSizeString(decimal byteCount) {

    Affects these lines of code:

            sb.AppendLine($"     data sent: {result.ByteSizeString(result.TotalBytesSent)}");
            if (result.TotalBytesPosted > 0)
                sb.AppendLine($"   data posted: {result.ByteSizeString(result.TotalBytesPosted)}");

    The exception was caused by the line of code highlighted with 4 **** at the beginning and end of the line.

            res = new TestResult() {
            TotalRequests = results.Count,
            ThreadCount = threads,
            TimeTakenSecs = totalTimeSecs,
            FailedRequests = results.Count(req => req.IsError),
            SuccessRequests = results.Count(req => !req.IsError),
            RequestsPerSecond = ((decimal)results.Count / (decimal)totalTimeSecs),
            AvgRequestTimeMs = (decimal)results.Average(req => req.TimeTakenMs),
            MinRequestTimeMs = results.Min(req => req.TimeTakenMs),
            MaxRequestTimeMs = results.Max(req => req.TimeTakenMs),
       **** TotalBytesSent = results.Sum(req=> req.ResponseLength), ****
            TotalBytesPosted = results.Sum(req => string.IsNullOrEmpty(req.RequestContent) ? 0 : req.RequestContent.Length),
    
            ErrorMessages = results
                                .GroupBy(x => x.ErrorMessage)
                                .Where(g => g.Key != null)
                                .Select(g => new ErrorMessage() {
                                    Message = g.Key,
                                    Count = g.Count()
                                })

After correcting these issues, I ran multiple tests for 10 and 20 minutes without the overflow issue that I had reported. The “Yikes error” is gone.

Summary

The build/updates that I have been working with still have some issues. For 5 second, 5 thread tests using the released product vs build product, I noted that the custom build product consistently generated about 25% higher requests, but experiences about a .1 to .5 % failure rate. The released product does not seem to exhibit this failure flakiness. I tried reducing the threads to 1 on the custom build version, but the failure rate remained about the same. I am confident the failures are not real. I have done enough benchmarking in the past to have a high confidence that our server-side code is extremely robust and fast.

I know you may not be able to get to this for some time, but I wanted to pass on as much information as I could so that these issues will eventually get resolved.

I would also be happy to do extensive testing on any updated code base. If I have more time in the near term, I may try to dive a bit deeper into your product. One issue that I have is that some of your libraries are locked, so it makes debugging much more difficult.

Regards,

Warren

RickStrahl commented 3 years ago

Thanks for taking the time to take a look.

I changed the type to long not decimal since there should be no fractional values in this issue.

I suppose this makes sense - 2gb of total data collected is not a lot before int will overflow.

RickStrahl commented 3 years ago

Here's what I get from a logish run that's collecting 2gb worth of request data:

image

It should work now (in 1.21).

highdownts commented 3 years ago

Hi Rick,

The issue is still present. I traced the error to the HttpRequestData.cs file. I changed the property ResponseLength to a long to match your other changes.

public long ResponseLength { get; set; }

I missed this variable in my summary to you. I had changed it to type decimal in the previous version. Sorry for the oversight.

I just completed a 10 minute test with 4,582,016 requests at 7,637 req/sec. 0 failed and the total data downloaded was 163,643,428,572,428 bytes. :-)

BTW, I had no issues compiling the latest download and the compiled version did not exhibit the flaky error conditions that I had observed.

Many thanks for taking the time to address this issue.

Warren

On Sun, May 2, 2021 at 10:34 PM Rick Strahl @.***> wrote:

Thanks for taking the time to take a look.

I changed the type to long not decimal since there should be no fractional values in this issue.

I suppose this makes sense - 2gb of total data collected is not a lot before int will overflow.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RickStrahl/WestWindWebSurge/issues/76#issuecomment-830992075, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIRQGUQY424LH4NRHSEM4DTLYDUBANCNFSM4KSXI3WQ .

RickStrahl commented 3 years ago

It can't be the response length - that's an individual request and I doubt that's overflowing the int value of ~2gb. It's gotta be one of the calculated values because it doesn't blow up until the results are parsed and summarized. Hard to catch because the error triggers a LINQ expression that's lengthy. Checking.

later

Ok, so it is the ResponseLength and potentially ContentLength that are causing these problems. Turns out these values aren't overflowing, but the LINQ calculation is. Because these values are int Sum() aggregates them as int even though the sum can clearly be larger than what int can hold.

The workaround for that is to force the LINQ expression to long:

TotalBytesSent = results.Sum(req => (long) req.ResponseLength),
TotalBytesPosted = results.Sum(req => string.IsNullOrEmpty(req.RequestContent) ? 0 : (long) req.RequestContent.Length)

This is preferrable to changing the types to long (which would also work) as it increases the capture request size even a little and in the case of the second expression it's the only way to really do this as it's a computed value from the string length.

This is a bitch to test since I have to let the test run for a long time to get there :smile:

highdownts commented 3 years ago

It dies at TotalBytesSent = results.Sum(req=> req.ResponseLength) in the ResultaParser.cs file.

I am not sure how MS implements this functionality, but it does not like the int for the length term. Something like this would cause the issue.

long result = (for loop(int length += length))

The overflow must be occurring in the Sum, not the assignment to the long.

Would you like me to send you the test case? You can use it to easily create the issue.

On Mon, May 3, 2021, 5:08 PM Rick Strahl @.***> wrote:

It can't be the response length - that's an individual request and I doubt that's overflowing the int value of ~2gb. It's gotta be one of the calculated values because it doesn't blow up until the results are parsed and summarized. Hard to catch because the error triggers a LINQ expression that's lengthy. Checking.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RickStrahl/WestWindWebSurge/issues/76#issuecomment-831535003, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIRQGVDC7TVFKTDQ4KU4PDTL4GFPANCNFSM4KSXI3WQ .

RickStrahl commented 3 years ago

The problem is that while ResponseLength is an int the Sum() result may not be. So casting to (long) or changing the type to long fixes that so that the LINQ expression returns a long. There are two places where this is an issue.

Changing to ResponseLength to long fixes one of the problems. But the RequestLength is derived from string.Length and so that has to be cast explicitly or it doesn't work. Either way some hard code logic is required whcih sucks. Especially since I just found that there are two separate places where this logic is used (individual requests and for a full run) - I fixed the initial bug only to trigger on the second. IAC,I should have this fixed.

Thanks for the help and sticking with this. Easy to miss since it only fails if the test size gets largish and even then it may get missed if running with truncated test results as I typically do.

highdownts commented 3 years ago

Rick,

I just ran an experiment that replaced the exception area code with:

        long sum = 0;
        foreach (var item in results) {
            sum += item.ResponseLength;
        }

Before res = new TestResult() { ... TotalBytesSent = sum,// results.Sum(req => req.ResponseLength), ...

And, I changed ResponseLength back to int.

This passes without error, which makes sense.

It would appear the MS implements the results.Sum(req => req.ResponseLength) with a variable of type ResponseLength.

So, int sum = 0; foreach(var item in results){ sum += item.ResponseLength; }

This will overflow because the summing variable is of the same type as the length, which kind of makes sense. We shouldn't expect MS to replace int with long or decimal.

Regards,

Warren

On Mon, May 3, 2021 at 5:08 PM Rick Strahl @.***> wrote:

It can't be the response length - that's an individual request and I doubt that's overflowing the int value of ~2gb. It's gotta be one of the calculated values because it doesn't blow up until the results are parsed and summarized. Hard to catch because the error triggers a LINQ expression that's lengthy. Checking.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RickStrahl/WestWindWebSurge/issues/76#issuecomment-831535003, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIRQGVDC7TVFKTDQ4KU4PDTL4GFPANCNFSM4KSXI3WQ .

RickStrahl commented 3 years ago

Not sure what you're saying. The LINQ code fix as posted works. The problem with it is not that it doesn't work, it's that it's not obvious that's a potential overflow situation when the explicit (long) cast is required, so the code is unpredictable. It's fixed for the two scenarios where this applies in 1.27.2.

highdownts commented 3 years ago

Sorry for not clarifying myself. The LINQ code in the class initialization is elegant and it works, including the exception for the integer overflow. The breakage was the overflow caused by my scenarios.

What I had done was to replace the last overflow code with a non-LINQ replacement to prove to myself what was going on.

Thanks for fixing these issues for me and others. I have tested the latest WebSurge download on local and remote servers and all is working fine.

Regards, Warren

On Mon, May 3, 2021, 11:04 PM Rick Strahl @.***> wrote:

Not sure what you're saying. The LINQ code fix as posted works. The problem with it is not that it doesn't work, it's that it's not obvious that's a potential overflow situation, so the code is unpredictable. It's fixed for the two scenarios where this applies in 1.27.2.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/RickStrahl/WestWindWebSurge/issues/76#issuecomment-831656892, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIRQGS5AL7YE5OUGRXNR4TTL5P4JANCNFSM4KSXI3WQ .