buybackoff / 1brc

1BRC in .NET among fastest on Linux
https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-fastest-on-linux-my-optimization-journey/
MIT License

Add my 'Safe' (No Pointer) solution #11

Closed NimaAra closed 8 months ago

NimaAra commented 8 months ago

I have implemented my solution with a self-imposed restriction: avoid Unsafe and stay managed-only. It supports both \n and \r\n line endings.

buybackoff commented 8 months ago

In your table, my results are as of which commit?

Also, did you run your implementation on the extended 10K dataset?

If it can't handle that dataset it will be listed as such.

NimaAra commented 8 months ago

Your results are for 52e82ca.

Yes, my solution can process the extended 10k file. FYI, your solution, when run as JIT, throws an exception when I run it against the 10k file on Windows. The commit I used to generate the extended file is 32143b2.

The error is:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.SpanHelpers.NonPackedIndexOfValueType[[System.Byte, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.SpanHelpers+DontNegate`1[[System.Byte, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](Byte ByRef, Byte, Int32)
at System.MemoryExtensions.IndexOf[[System.Byte, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.ReadOnlySpan`1<Byte>, Byte)
at _1brc.Utf8Span.IndexOf(UIntPtr, Byte)
at _1brc.App.ProcessChunk(_1brc.FixedDictionary`2<_1brc.Utf8Span,_1brc.Summary>, _1brc.Utf8Span)
at _1brc.App.ProcessChunk(Int64, UInt32)
at _1brc.App.<Process>b__20_0(System.ValueTuple`2<Int64,Int32>)
at System.Linq.Parallel.SelectQueryOperator`2+SelectQueryOperatorResults[[System.ValueTuple`2[[System.Int64, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Int32, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetElement(Int32)

buybackoff commented 8 months ago

Hm, I just copied the exact dataset from my bench Linux machine to my Windows workstation, and it works. If you generate the dataset on Windows, it will have \r\n line endings, which are not valid. Unfortunately, everyone optimizes for such primitive specs.

NimaAra commented 8 months ago

They have updated their script so that on Windows it delimits lines with \n instead of \r\n. The file I ran it on was indeed \n-delimited.
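One quick way to confirm which delimiter a generated file actually uses is to count the byte patterns in a sample of it. The following is a minimal Python sketch under stated assumptions: the function names and the 1 MiB sample size are hypothetical and are not part of either solution or of the generator script.

```python
def detect_line_ending(sample: bytes) -> str:
    """Classify the line endings found in a byte sample."""
    crlf = sample.count(b"\r\n")
    # Every \r\n also contains a \n, so subtract to get bare \n only.
    lf = sample.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lf:
        return "lf"
    return "none"


def detect_file_line_ending(path: str, sample_size: int = 1 << 20) -> str:
    """Inspect only the first sample_size bytes, so a 17 GB file is not read in full."""
    with open(path, "rb") as f:
        return detect_line_ending(f.read(sample_size))
```

Because only the head of the file is sampled, a file whose endings change further in would not be caught; that seems unlikely for generated data but is an assumption of this sketch.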

buybackoff commented 8 months ago

So we have a heisenbug or something?

Are you in Western Europe? Sharing 17GB there should be quite feasible, even fast.

NimaAra commented 8 months ago

Actually, you are right; ignore that. I was pointing it to the wrong file! :-)

NimaAra commented 8 months ago

Interesting: on my broken laptop, my JIT build (against the 10k file on Windows) runs in 23 s vs. yours in 27 s. I can't try AOT on it for now but will try that on my workstation tomorrow.

It would be interesting if you could run my solution on your Linux box and compare numbers.

NimaAra commented 8 months ago

Okay, here's what I see on my Windows workstation (dual-socket Xeon X5650 | 24 cores). The CPU is very old, so no AVX/AVX2.

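Yours (83dffa7)

| Mode | Default file (s) | 10k file (s) |
|------|------------------|--------------|
| JIT  | 5.1              | 12.5         |
| AOT  | 4.8              | 12.3         |

Mine (2f17f85)

| Mode | Default file (s) | 10k file (s) |
|------|------------------|--------------|
| JIT  | 5.5              | 7.7          |
| AOT  | 5.2              | 7.5          |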

And on my laptop (i7-6600U | 4 cores) with AVX/AVX2 supported.

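Yours

| Mode | Default file (s) | 10k file (s) |
|------|------------------|--------------|
| JIT  | 12.7             | 28.2         |
| AOT  | 13.8             | 31.6         |

Mine

| Mode | Default file (s) | 10k file (s) |
|------|------------------|--------------|
| JIT  | 17.5             | 23.5         |
| AOT  | 16.6             | 23.4         |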

These are averaged across 3 runs.
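For anyone wanting to repeat this kind of measurement, averaging wall-clock time over a few runs can be sketched as below. This is not the harness used for the numbers in this thread; the helper name is hypothetical, and the sketch makes no attempt to control for the OS file cache, which can matter a lot for a multi-gigabyte input.

```python
import statistics
import subprocess
import time


def average_wall_time(cmd: list[str], runs: int = 3) -> float:
    """Run cmd `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the process exits with a non-zero code,
        # so a crashing run is never averaged in as a fast one.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```

It would be called with the path to a built executable and the data file as list elements; both are placeholders to fill in for a given setup.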

buybackoff commented 8 months ago

[image]

dotnet build -c Release

[image]

[image]

> The CPU is very old so no AVX/AVX2.

😨 In Europe, such a CPU would cost more in electricity than the useful results it produces are worth. I'm trying to get rid of an i7-2600 OptiPlex; the Windows Pro on it is more valuable than the rest.

The absence of AVX explains the relative results.

NimaAra commented 8 months ago

Yeah, I know it's old; I am building a new Ryzen 9 7950X machine but am not there yet.

What do you get with AOT?

buybackoff commented 8 months ago

Slightly better:

[image]