dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Some functionalities are much slower on Linux compared to Windows #48116

Open CalinBunea opened 3 years ago

CalinBunea commented 3 years ago

Description

Hi, I'm working in the BigData field and I have noticed that some functionalities (like String-related methods) are extremely slow on .NET Core running on Linux Ubuntu 18.04 compared to Windows Server 2012. The performance difference is like 30X-50X. This issue is present in .NET Core 2.X, 3.X, 5.X. To me this is quite embarrassing and I am seriously thinking to quit working with .NET and start with other language after ~15 years of development in .NET... Is really unreal how Microsoft has released not one but 3 versions of .NET core with this issue and they are not capable of fixing it.

I saw a similar complaint from another user, and a guy from Microsoft started requesting more information about a very simple String.IndexOf performance complaint. Can't you guys test this out yourself? Is it so hard to write 20 lines of code to test this out? Anyway, I wrote the code for you and I will paste it below. I've also noticed that Guid.NewGuid() is extremely slow on Linux compared to Windows.

I used VMs in Azure for my tests and picked exactly the same size for the VMs, so the tests should be considered fair. Just run the code below on a Windows OS vs Linux. You can use 10 million items for the test, but on Linux it will take forever to execute, so you can start with ~2 million first. Please spare me from questions like "have you tried with invariant culture or Ordinal compare etc?" Yes, I did try, with the same results!

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp3._1
{
    class Program
    {
        static async Task Main(string[] args)
        {
            Int32 size = -1;
            if(args.Length>0)
                int.TryParse(args[0],out size);
            if (size < 1)
            {
                Console.WriteLine("Please provide a size for test collection greater than 0");
                return;
            }
            String oneRandomStr = Guid.NewGuid().ToString();
            //generate random data to avoid CPU cache
            String[] randomData = new String[size];
            Int32 idx = -1;
            Stopwatch sw = Stopwatch.StartNew();
            await Task.WhenAll(Enumerable.Range(0, Environment.ProcessorCount).Select(x => Task.Run(() =>
            {
                var currentIdx = Interlocked.Increment(ref idx);
                while (currentIdx < randomData.Length)
                {
                    randomData[currentIdx] = $"{Guid.NewGuid()}{Guid.NewGuid()}";
                    currentIdx = Interlocked.Increment(ref idx);
                }
            })));
            sw.Stop();
            Console.WriteLine($"Random data populated in {sw.Elapsed}");
            sw.Restart();
            foreach (string str in randomData)
            {
                var unusedIdx = str.IndexOf(oneRandomStr, 2);
            }

            sw.Stop();
            Console.WriteLine($"Time spent with String IndexOf :{sw.Elapsed}");
        }
    }
}
```

perlun commented 3 years ago

> To me this is quite embarrassing and I am seriously thinking to quit working with .NET and start with other language after ~15 years of development in .NET... Is really unreal how Microsoft has released not one but 3 versions of .NET core with this issue and they are not capable of fixing it.

I don't question the issue you are seeing, but I strongly suggest you work a bit on the attitude with which you approach the .NET team on this. (I am not a part of this team myself, just happened to bump into this issue.)

I am 100% sure that a lot of performance benchmarking has already taken place for .NET (Core), on all supported platforms.

I am completely unaware of how the GUID generation code in .NET on Linux works, but could one potential reason for this be that it uses /dev/random instead of /dev/urandom? (This article describes some of the differences between them: https://www.exoscale.com/syslog/random-numbers-generation-in-virtual-machines/)

Many applications do not generate a huge number of random numbers. It's not an entirely obvious decision to make - should random number generation use the very-high-quality RNG (optimal for cryptography and other use cases where uniqueness of the random numbers is a top priority) or should the random number generator use the slightly-lower-quality RNG in the kernel?

My key takeaway: it's probably simply not as "black and white" as you describe it.

CalinBunea commented 3 years ago

Work on my attitude you say... I've always defended .NET in debates with other people who claimed that Python is better for BigData or whatsoever. My philosophy was always that the language should not matter if you do your implementation correctly. But when such a basic functionality like string.IndexOf is about 50 times slower than it should be, on a framework which has been in development for 4+ years now, I can no longer claim that language does not matter.

Now let me get to the "real" problem. The real problem is that people (including me) would spend hours of work trying to optimize their custom algorithms. We have a data processing job which starts 40 VMs with 8 CPU cores each. Those jobs rely heavily on string processing. Can you imagine how much money is being lost due to a problem in the foundation blocks of .NET which you would never suspect to be problematic? I do not care how the GUID generator was implemented in .NET for Linux, all I know is that the functionality is a simple one and works very fast on Windows; in my opinion it should have been copied from the Windows implementation. Same applies for string related functions which are extremely slow on Linux.

As a .NET developer, what should I do? Should I write the .NET core functionalities myself? Should I question all the time even if a for loop can have performance issues on Linux? Why would I use .NET at all then? And I won't change my attitude because in my 15 years of experience I've only got answers from Microsoft like "we don't think we should fix/implement that etc". In other words, people from Microsoft are not capable to envision the implications of specific issues and therefore they would not prioritize the fix. The proof is this string.IndexOf method which is slow on all .NET core versions on Linux.

perlun commented 3 years ago

> Work on my attitude you say...

Yes, please.

> But when such a basic functionality like string.IndexOf is about 50 times slower than it should be, on a framework which has been in development for 4+ years now, I can no longer claim that language does not matter.

I am not in any way saying that string.IndexOf() couldn't be optimized, but such an operation is always and will always be an O(n) operation as I'm sure you're aware of. Depending on what specific problem you are working on (I'm not an expert in Big Data in any way), there could be other more suitable tree-oriented data structures that could be used such as a trie for example. But this depends greatly on the business problem at hand.

> Now let me get to the "real" problem. The real problem is that people (including me) would spend hours of work trying to optimize their custom algorithms. We have a data processing job which starts 40 VMs with 8 CPU cores each. Those jobs rely heavily on string processing. Can you imagine how much money is being lost due to a problem in the foundation blocks of .NET which you would never suspect to be problematic?

Quite a fair amount of money, I presume.

> I do not care how the GUID generator was implemented in .NET for Linux, all I know is that the functionality is a simple one and works very fast on Windows; in my opinion it should have been copied from the Windows implementation.

But this is simply not possible. You need to realize that the kind of effort the .NET team has undertaken within the last few years (with a bit of help from the community) is quite a massive undertaking. There are dozens, if not hundreds, of small choices like this that have to be made (use /dev/urandom or /dev/random? Use mmap or fread to read data from large files into memory? Etc...)

It's not in any way possible to just "copy it from the Windows implementation". A lot of managed code needs to be mapped to its underlying OS primitives, and many decisions like this have to be made.

Now, I do not claim that all these decisions during these 4 years have been perfect; there's probably quite a long way to go. Do remember that .NET on Linux is a much younger product per se than its Windows counterpart. But it's getting better all the time and I'm really happy about what we're already seeing.

> Same applies for string related functions which are extremely slow on Linux.

Thanks for your example code. I tried it with 1 million iterations and here's what I get:

```
/usr/share/dotnet/dotnet /home/per/git/ConsoleApp1/ConsoleApp1/bin/Debug/netcoreapp3.1/ConsoleApp1.dll 1000000
Random data populated in 00:00:05.1393793
Time spent with String IndexOf :00:00:10.5430489
```

This is on a i5-8250U-based laptop (quad-core, but in no way a high-end machine). Are you seeing figures similar to these? What figures do you get on Windows with this?

> As a .NET developer, what should I do? Should I write the .NET core functionalities myself? Should I question all the time even if a for loop can have performance issues on Linux? Why would I use .NET at all then?

I don't have any data to back this claim, but I'm quite certain that .NET will be faster than Python in most cases, including this particular example. Please give it a try, it would be interesting to see some perf. figures on this actually :+1:

Again, I don't have any hard data backing this claim, but my rough guesstimate would be that C#/.NET on Linux performs roughly on par with Java, since they are very similar languages & runtimes (JIT-based, executing bytecode-based programs). Then again, Java has been running on Linux for many, many more years, so... maybe it's a lot faster. (I honestly don't know, this is just a hunch.)

> And I won't change my attitude because in my 15 years of experience I've only got answers from Microsoft like "we don't think we should fix/implement that etc". In other words, people from Microsoft are not capable to envision the implications of specific issues and therefore they would not prioritize the fix. The proof is this string.IndexOf method which is slow on all .NET core versions on Linux.

Rest assured that Microsoft of 2021 is very different than Microsoft of 2001. I suggest we try to work together on first:

  1. Getting more hard data about the perceived problem at hand.
  2. Getting more hard data about how this compares to .NET on Windows, Python on Linux and a few other scenarios
  3. Cooperating with the team working on .NET to make improvements in the areas where they're needed. If there is obvious low-hanging fruit, I'm quite sure someone, either from the Microsoft team or some outside collaborator, will be happy to submit pull requests trying to fix some of these obvious shortcomings.

dotnet-issue-labeler[bot] commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

CalinBunea commented 3 years ago

Dude, are you just writing here to get attention or something? How can you possibly ask me what numbers I've got compared to yours when we use different hardware? Just spin up a VM on your laptop with Windows or Linux (depending on your host OS) and do a fair comparison using the same hardware! Or even better, spin up 2 VMs, one with Linux Ubuntu 18.04 and one with Windows Server, so both VMs will suffer any virtualization penalty.

Now, about .NET core implementations... of course Microsoft cannot implement a file open outside the OS functions, but we are talking here about String.IndexOf, which by any means does not have to use any OS functions. Maybe it has to read one time only some information about the OS culture, but that should be all. The rest of it should be algorithms and calling the right CPU instructions to gain best performance. After all, this should be the biggest advantage of JIT: the fact that it should be able to compile IL into machine language using the current CPU architecture and available instruction set.

.NET Core is not some 3rd party library which you can replace or implement yourself, so yes, I do have high expectations from a framework on which I will build my applications, and I don't expect to have such silly problems after more than 4 years of development. I literally can't trust .NET Core anymore, and don't ask me to write my own IndexOf method for my BigData problems. I've already written my own custom collections to better handle huge amounts of data (that is another topic which I won't elaborate on here). What will be next? Should I write my own JIT or what? Maybe I should write the entire .NET Core myself... Really man, you are so superficial like many people working at Microsoft...

terrajobst commented 3 years ago

@CalinBunea

I have done zero research but I don't doubt that there are benchmarks where one platform performs better than the other on string processing. We have seen this in many areas. We generally care about .NET being awesome for workloads we think our customers care about and that very much includes big data processing or string processing in general.

We're quite public about how we approach these things and @perlun described it quite well: we start with scenarios and benchmarks around those and then we see where the bottlenecks are. As @perlun mentioned, the devil tends to be in the details, and the bottlenecks are often not related to particular algorithms but to how our system overall interplays with the idiosyncrasies of the platform we're running on.

One thing I will say though is that you don't help your case here. You make a bunch of assertions about what we care about, make statements about our technical abilities, and overall question our motives. This is an open source project and as such it has a code of conduct. Your attitude isn't crossing the line yet, but you're not far off. We're more than willing to work with you on the issues you're raising, but we won't engage with you if your hostile tone doesn't change.

tannergooding commented 3 years ago

Here is what I get locally (AMD Ryzen 5950X):

Used code, `string.IndexOf(string, int)`:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp3._1
{
    class Program
    {
        static async Task Main(string[] args)
        {
            Int32 size = -1;
            if (args.Length > 0)
                int.TryParse(args[0], out size);
            if (size < 1)
            {
                Console.WriteLine("Please provide a size for test collection greater than 0");
                return;
            }
            Console.WriteLine($"Environment.Is64BitProcess: {Environment.Is64BitProcess}");
            Console.WriteLine($"Environment.ProcessorCount: {Environment.ProcessorCount}");
            Console.WriteLine($"Environment.OSVersion: {Environment.OSVersion}");
            Console.WriteLine($"RuntimeInformation.FrameworkDescription: {RuntimeInformation.FrameworkDescription}");
            Console.WriteLine($"Size: {size}");
            String oneRandomStr = Guid.NewGuid().ToString();
            //generate random data to avoid CPU cache
            String[] randomData = new String[size];
            Int32 idx = -1;
            Stopwatch sw = Stopwatch.StartNew();
            await Task.WhenAll(Enumerable.Range(0, Environment.ProcessorCount).Select(x => Task.Run(() =>
            {
                var currentIdx = Interlocked.Increment(ref idx);
                while (currentIdx < randomData.Length)
                {
                    randomData[currentIdx] = $"{Guid.NewGuid()}{Guid.NewGuid()}";
                    currentIdx = Interlocked.Increment(ref idx);
                }
            })));
            sw.Stop();
            Console.WriteLine($"Random data populated in {sw.Elapsed}");
            sw.Restart();
            foreach (string str in randomData)
            {
                var unusedIdx = str.IndexOf(oneRandomStr, 2);
            }
            sw.Stop();
            Console.WriteLine($"Time spent with String IndexOf :{sw.Elapsed}");
        }
    }
}
```

```
Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Microsoft Windows NT 6.2.9200.0
RuntimeInformation.FrameworkDescription:    .NET Core 3.1.11
Size:                                       2000000
Random data populated in 00:00:00.5534036
Time spent with String IndexOf :00:00:00.4083975

Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Unix 4.4.0.19041
RuntimeInformation.FrameworkDescription:    .NET Core 3.1.12
Size:                                       2000000
Random data populated in 00:00:02.7903183
Time spent with String IndexOf :00:00:10.5053876

Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Microsoft Windows NT 10.0.19042.0
RuntimeInformation.FrameworkDescription:    .NET 5.0.2
Random data populated in 00:00:00.5211980
Time spent with String IndexOf :00:00:13.8662883

Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Unix 4.4.0.19041
RuntimeInformation.FrameworkDescription:    .NET 5.0.3
Size:                                       2000000
Random data populated in 00:00:02.8335380
Time spent with String IndexOf :00:00:10.4657736
```

Used code, `string.IndexOf(string, int, StringComparison)`:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp3._1
{
    class Program
    {
        static async Task Main(string[] args)
        {
            Int32 size = -1;
            if (args.Length > 0)
                int.TryParse(args[0], out size);
            if (size < 1)
            {
                Console.WriteLine("Please provide a size for test collection greater than 0");
                return;
            }
            Console.WriteLine($"Environment.Is64BitProcess: {Environment.Is64BitProcess}");
            Console.WriteLine($"Environment.ProcessorCount: {Environment.ProcessorCount}");
            Console.WriteLine($"Environment.OSVersion: {Environment.OSVersion}");
            Console.WriteLine($"RuntimeInformation.FrameworkDescription: {RuntimeInformation.FrameworkDescription}");
            Console.WriteLine($"Size: {size}");
            String oneRandomStr = Guid.NewGuid().ToString();
            //generate random data to avoid CPU cache
            String[] randomData = new String[size];
            Int32 idx = -1;
            Stopwatch sw = Stopwatch.StartNew();
            await Task.WhenAll(Enumerable.Range(0, Environment.ProcessorCount).Select(x => Task.Run(() =>
            {
                var currentIdx = Interlocked.Increment(ref idx);
                while (currentIdx < randomData.Length)
                {
                    randomData[currentIdx] = $"{Guid.NewGuid()}{Guid.NewGuid()}";
                    currentIdx = Interlocked.Increment(ref idx);
                }
            })));
            sw.Stop();
            Console.WriteLine($"Random data populated in {sw.Elapsed}");
            sw.Restart();
            foreach (string str in randomData)
            {
                var unusedIdx = str.IndexOf(oneRandomStr, 2, StringComparison.Ordinal);
            }
            sw.Stop();
            Console.WriteLine($"Time spent with String IndexOf :{sw.Elapsed}");
        }
    }
}
```

```
Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Microsoft Windows NT 6.2.9200.0
RuntimeInformation.FrameworkDescription:    .NET Core 3.1.11
Size:                                       2000000
Random data populated in 00:00:00.5428705
Time spent with String IndexOf :00:00:00.1020913

Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Unix 4.4.0.19041
RuntimeInformation.FrameworkDescription:    .NET Core 3.1.12
Size:                                       2000000
Random data populated in 00:00:02.8710434
Time spent with String IndexOf :00:00:00.1353504

Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Microsoft Windows NT 10.0.19042.0
RuntimeInformation.FrameworkDescription:    .NET 5.0.2
Size:                                       2000000
Random data populated in 00:00:00.4901289
Time spent with String IndexOf :00:00:00.1145368

Environment.Is64BitProcess:                 True
Environment.ProcessorCount:                 32
Environment.OSVersion:                      Unix 4.4.0.19041
RuntimeInformation.FrameworkDescription:    .NET 5.0.3
Size:                                       2000000
Random data populated in 00:00:02.8322111
Time spent with String IndexOf :00:00:00.1501700
```

tannergooding commented 3 years ago

This was comparing Windows 10 19042.0 vs Ubuntu 20.04 running on WSL2 (I also tried on Ubuntu 20.04 proper and got essentially the same results).

The reason that Unix is slower for `string.IndexOf(string, int)` is that this overload is culture-aware and depends on ICU. This is likewise why this method is slower on .NET 5 than on .NET Core 3.1 on Windows, where .NET 5 also uses ICU by default on recent Windows 10 versions. The `string.IndexOf(string, int, StringComparison.Ordinal)` mode is effectively the same speed on both Windows and Linux, as it's not culture dependent and therefore not dependent on ICU.
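
To see the two code paths side by side on one machine, something like the following minimal sketch could be used. It is not the code behind the numbers above: `MakeData` and `Time` are hypothetical helper names introduced here for illustration, and the absolute timings will depend on hardware, runtime version, and whether the runtime is using ICU.

```csharp
using System;
using System.Diagnostics;

static class IndexOfComparison
{
    // Hypothetical helper: builds `count` strings of two concatenated GUIDs,
    // mirroring the shape of the data in the original repro.
    public static string[] MakeData(int count)
    {
        var data = new string[count];
        for (int i = 0; i < count; i++)
            data[i] = $"{Guid.NewGuid()}{Guid.NewGuid()}";
        return data;
    }

    // Hypothetical helper: runs IndexOf over every string, starting at index 2
    // as in the repro. Passing null uses the culture-sensitive default overload;
    // passing a value uses the overload with an explicit StringComparison.
    // The hit count is returned so the loop result is actually consumed.
    public static (TimeSpan Elapsed, int Hits) Time(
        string[] data, string needle, StringComparison? comparison)
    {
        int hits = 0;
        var sw = Stopwatch.StartNew();
        foreach (string s in data)
        {
            int idx = comparison.HasValue
                ? s.IndexOf(needle, 2, comparison.Value)
                : s.IndexOf(needle, 2);
            if (idx >= 0) hits++;
        }
        sw.Stop();
        return (sw.Elapsed, hits);
    }

    public static void Main()
    {
        string[] data = MakeData(200_000);
        string needle = Guid.NewGuid().ToString();

        var culture = Time(data, needle, null);
        var ordinal = Time(data, needle, StringComparison.Ordinal);

        Console.WriteLine($"Culture-sensitive default: {culture.Elapsed}");
        Console.WriteLine($"StringComparison.Ordinal:  {ordinal.Elapsed}");
    }
}
```

Running the same binary on Windows and Linux and comparing the two printed lines should show whether the gap is confined to the culture-sensitive overload, as the results above suggest.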

CC. @tarekgh, @GrabYourPitchforks

stephentoub commented 3 years ago

@tannergooding, you might also try .NET 6 builds, which will include fixes like https://github.com/dotnet/runtime/pull/43065 (I don't think we backported that to 5.0.x, but I'm not 100% positive).

tarekgh commented 3 years ago

The GUID performance is already tracked by issue #13628. And thanks @stephentoub for pointing out the optimization we did for linguistic search.
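
Since the original repro mixes GUID generation and string searching, it can help to measure `Guid.NewGuid()` on its own when comparing platforms. The following is only a rough sketch under that assumption; `MeasureGuidsPerSecond` is a hypothetical helper name, and a single-threaded loop like this is not a substitute for a proper benchmark harness.

```csharp
using System;
using System.Diagnostics;

static class GuidThroughput
{
    // Hypothetical helper: generates `count` GUIDs on the calling thread
    // and returns the observed rate in GUIDs per second.
    public static double MeasureGuidsPerSecond(int count)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            _ = Guid.NewGuid(); // result discarded; only the call cost matters here
        sw.Stop();
        return count / sw.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Warm-up run so JIT compilation is not included in the measurement.
        MeasureGuidsPerSecond(10_000);

        double rate = MeasureGuidsPerSecond(1_000_000);
        Console.WriteLine($"Guid.NewGuid(): {rate:N0} per second");
    }
}
```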

SergiiKram commented 3 years ago

I've just run my own tests.

TL;DR: the performance differs because of the StringComparison type applied by default, rather than because of the OS platform itself.

Long story: Tried locally from VS and from command line - got ~11 seconds (release, self-contained build). Then tried on a Linux VM on Azure - got ~14 seconds (Ubuntu, B1s size). Then tried on a Windows VM on Azure - got <1 second (Windows Server 2019 Datacenter, B1s size). Then explicitly set the comparison type to StringComparison.Ordinal - got <1 second on all environments.

CalinBunea commented 3 years ago

@SergiiKram, thank you for taking the time to perform benchmarks. Can you also confirm that you used a random collection of strings, big enough to avoid CPU caching? Also, I assume you haven't performed IndexOf on the same String object over and over again...

On my side I was running a very simple task which had to extract one single field value and store its MD5 hash, from a file which contains a JSON-serialized object on every line. So I basically did a ReadLine, then deserialized and used the field I needed. I was getting about 1.5 million items processed per second on an 8-core Azure VM (I always put performance counters in my code and show them every ~30 seconds). Since I had to process around 15 billion items, I thought to optimize a bit and perform a simple String.IndexOf and Substring to extract my field value. I was very surprised when I saw a speed of only 80,000 items per second. So basically JSON deserialization was 10+ times faster than a basic IndexOf + Substring. Since the code was very simple, I started to investigate deeper what was causing the performance issue, and that is how I found this problem...
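
The IndexOf + Substring extraction described above might look roughly like the sketch below when written with an explicit ordinal comparison. The actual code was not posted in this thread, so `TryExtractField`, the field names, and the sample line are all hypothetical. The sketch also assumes compact JSON (no whitespace around the colon) and values without escaped quotes, which a real JSON parser would handle.

```csharp
using System;

static class FieldExtractor
{
    // Hypothetical helper: finds "fieldName":"value" in a single JSON line
    // and returns the raw value. The string search passes
    // StringComparison.Ordinal explicitly so it does not go through
    // culture-sensitive comparison; the char overload of IndexOf is
    // already ordinal.
    public static bool TryExtractField(string line, string fieldName, out string value)
    {
        value = string.Empty;
        string marker = "\"" + fieldName + "\":\"";

        int start = line.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return false;
        start += marker.Length;

        int end = line.IndexOf('"', start);
        if (end < 0) return false;

        value = line.Substring(start, end - start);
        return true;
    }

    public static void Main()
    {
        // Hypothetical sample line, for illustration only.
        string line = "{\"id\":\"42\",\"name\":\"example\"}";
        if (TryExtractField(line, "name", out string name))
            Console.WriteLine(name);
    }
}
```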

SergiiKram commented 3 years ago

@CalinBunea yes, first I used your exact code with 1,000,000 items. Then I modified the code to make sure that there is a matching substring somewhere in each array item, with the same results. I think the key to a solution here is to explicitly set StringComparison.Ordinal, since the other variants would give you overhead regardless of platform.