BcryptNet / bcrypt.net

BCrypt.Net - Bringing updates to the original bcrypt package
MIT License

Reduce memory usage #54

Closed jvandertil closed 4 years ago

jvandertil commented 4 years ago

I was looking through the source code of the library, and noticed that there is quite some room to reduce the amount of memory allocated for certain operations.

I ran benchmarks (source code attached) for the (I think) most commonly used public methods, and came up with these results.

Obviously a large part of the runtime and memory usage is inherent to the algorithm itself, but I do believe that some optimizations might be useful to reduce unnecessary allocations.

Things I noticed at first glance:

Are you interested in taking PRs for this?


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-3632QM CPU 2.20GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | text | hash | value | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| GenerateSalt | ? | ? | ? | 748.4 ns | 5.92 ns | 4.95 ns | 0.2155 | - | - | 680 B |
| PasswordNeedsRehash | ? | ? | ? | 3,395.1 ns | 37.28 ns | 31.13 ns | 0.4196 | - | - | 1328 B |
| VerifyPassword | "" | $2a$0(...)eX1s. [60] | ? | 7,105,149.9 ns | 43,366.64 ns | 36,213.11 ns | - | - | - | 10064 B |
| HashPassword | ? | ? | "" | 7,015,698.7 ns | 48,091.09 ns | 44,984.44 ns | - | - | - | 10405 B |
| VerifyPassword | abcde(...)vwxyz [26] | $2a$0(...)QhstC [60] | ? | 7,090,475.1 ns | 18,558.79 ns | 15,497.43 ns | - | - | - | 10174 B |
| HashPassword | ? | ? | abcde(...)vwxyz [26] | 7,154,595.2 ns | 85,278.99 ns | 79,770.02 ns | - | - | - | 10515 B |

    [MemoryDiagnoser]
    public class Benchmarks
    {
        [Benchmark]
        public string GenerateSalt()
            => BCrypt.GenerateSalt(10);

        [Benchmark]
        public bool PasswordNeedsRehash()
            => BCrypt.PasswordNeedsRehash("$2a$06$DCq7YPn5Rq63x1Lad4cll.TV4S6ytwfsfvkgY8jIucDrjc8deX1s.", 10);

        [Benchmark]
        [Arguments("")]
        [Arguments("abcdefghijklmnopqrstuvwxyz")]
        public string HashPassword(string value)
            => BCrypt.HashPassword(value, "$2a$06$DCq7YPn5Rq63x1Lad4cll.", true, HashType.SHA384);

        [Benchmark]
        [Arguments("", "$2a$06$DCq7YPn5Rq63x1Lad4cll.TV4S6ytwfsfvkgY8jIucDrjc8deX1s.")]
        [Arguments("abcdefghijklmnopqrstuvwxyz", "$2a$06$.rCVZVOThsIa97pEDOxvGuRRgzG64bvtJ0938xuqzv18d3ZpQhstC")]
        public bool VerifyPassword(string text, string hash)
            => BCrypt.Verify(text, hash);
    }

Update: Aligned benchmarks to all use work factor 6 hashes.

ChrisMcKee commented 4 years ago

You piqued my interest, as I'd totally blocked out the StringBuilder usage. Of course, some of the code is simply a remnant of it being written for .NET 2, and what was 'best' then isn't really best now.

As for spans, I'd purposely avoided them because the library's framework support goes way back, and I expected the framework itself to adopt them internally over time (free wins) without us having to alter the code or add ifdefs. The perf difference would have to be meaningful to justify the extra complexity, to say the least.

I'd happily take anything around InterrogateHash and PasswordNeedsRehash (see other ticket), especially if it makes them more useful / safer / better perf, as they're mostly utility methods.

Base: [image]

Sized StringBuilder: [image]

String interpolation (string.Format in sheep's clothing): [image]

Using length in the EncodeB64 StringBuilder to preallocate (wonky unix version): [image]

[image]

Setting DecodeBase64 to use the max bytes as the StringBuilder length has no noticeable difference.

I thought string.Create would have more of an impact, but tbh there's not a lot to see here; or I'm using it wrong, which is entirely likely considering my exposure to Spans has been relatively limited. [image]
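
For illustration only, here is a minimal sketch of how `string.Create` can assemble a `$2a$06$<salt>` style string with a single allocation, assuming .NET Core 2.1+ or netstandard2.1. `BuildSaltString` and its parameters are hypothetical names, not part of the library, and this is not the benchmarked code.

```csharp
using System;

// Hypothetical helper: writes "$2<rev>$<wf>$<salt>" straight into the new string's buffer.
// Assumes 0 <= workFactor <= 99.
static string BuildSaltString(char minorRevision, int workFactor, char[] encodedSalt)
{
    int length = 7 + encodedSalt.Length; // the "$2a$06$" style prefix is 7 chars

    // The state tuple is passed in explicitly so the callback captures nothing.
    return string.Create(length, (minorRevision, workFactor, encodedSalt), (span, state) =>
    {
        span[0] = '$';
        span[1] = '2';
        span[2] = state.minorRevision;
        span[3] = '$';
        span[4] = (char)('0' + state.workFactor / 10);
        span[5] = (char)('0' + state.workFactor % 10);
        span[6] = '$';
        state.encodedSalt.AsSpan().CopyTo(span.Slice(7));
    });
}

Console.WriteLine(BuildSaltString('a', 6, "DCq7YPn5Rq63x1Lad4cll.".ToCharArray()));
```

The only heap allocation in the helper is the returned string itself, which is the main reason to reach for `string.Create` over a StringBuilder here.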

jvandertil commented 4 years ago

I have a couple of things lying around; should I send the PRs to the perf branch?

I'll definitely take a look at the InterrogateHash method, shouldn't be too hard to improve that. It's not clear to me that the API is settled in the other ticket, is that correct?

ChrisMcKee commented 4 years ago

@jvandertil sure, fire away. I wasn't 100% sure the API in the other ticket was necessarily something that belonged in the library. But making it easier to implement / a better method probably wouldn't go amiss either. The ultimate conflict between trying to keep the API simple and keeping people happy 😆

ChrisMcKee commented 4 years ago

HashParser Regex to non-regex


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | hash | Mean | Error | StdDev | Ratio | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| InterrogateHashUsingRegex | $2a$1(...)Qk7dq [60] | 2,419.3 ns | 48.15 ns | 88.05 ns | 1.00 | 2 | 0.1678 | - | - | 1328 B |
| InterrogateHashUsingParser | $2a$1(...)Qk7dq [60] | 296.0 ns | 6.53 ns | 11.09 ns | 0.12 | 1 | 0.0353 | - | - | 280 B |
| InterrogateHashUsingRegex | $2a$1(...)QPlxO [60] | 2,337.5 ns | 46.22 ns | 73.31 ns | 1.00 | 2 | 0.1678 | - | - | 1328 B |
| InterrogateHashUsingParser | $2a$1(...)QPlxO [60] | 292.9 ns | 5.74 ns | 8.94 ns | 0.13 | 1 | 0.0353 | - | - | 280 B |

🎈🎆🎇🎈🎉
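
As a rough idea of what a non-regex parser can look like, here is a sketch that reads the fixed-position fields of a bcrypt hash by index. `ParseHash` and its tuple shape are hypothetical and are not the library's actual HashParser; the layout assumed is `$2` + optional revision letter + `$` + two-digit work factor + `$` + 53 chars (22 salt, 31 hash).

```csharp
using System;

// Hypothetical parser: splits a bcrypt hash into version, work factor and raw salt+hash.
static (string Version, int WorkFactor, string RawHash) ParseHash(string hash)
{
    if (hash == null || hash.Length < 59 || hash[0] != '$' || hash[1] != '2')
        throw new FormatException("Not a bcrypt hash.");

    // "$2$" has no revision letter; "$2a$", "$2b$", ... have one.
    int offset = hash[2] == '$' ? 3 : 4;
    if (hash[offset - 1] != '$' || hash[offset + 2] != '$')
        throw new FormatException("Malformed bcrypt header.");

    char tens = hash[offset], units = hash[offset + 1];
    if (tens < '0' || tens > '9' || units < '0' || units > '9')
        throw new FormatException("Work factor must be two digits.");

    string version = hash.Substring(1, offset - 2);      // e.g. "2a"
    int workFactor = (tens - '0') * 10 + (units - '0');  // e.g. 6
    string rawHash = hash.Substring(offset + 3);          // salt followed by hash
    return (version, workFactor, rawHash);
}

var parts = ParseHash("$2a$06$DCq7YPn5Rq63x1Lad4cll.TV4S6ytwfsfvkgY8jIucDrjc8deX1s.");
Console.WriteLine($"{parts.Version} {parts.WorkFactor} {parts.RawHash.Length}");
```

Index checks like these avoid the regex engine entirely, which is consistent with the drop in time and allocations in the table above, though the real parser may differ in detail.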

ChrisMcKee commented 4 years ago

Base64 decoding: unsurprisingly, not allocating it to a string is the winner here.


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | salt | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DecodeBase64StandardUnSized | DCq7Y(...)4cll. [22] | 160.47 ns | 3.223 ns | 6.287 ns | 1.00 | 0.00 | 3 | 0.0184 | - | - | 144 B |
| DecodeBase64StandardSized | DCq7Y(...)4cll. [22] | 165.65 ns | 3.333 ns | 5.476 ns | 1.03 | 0.05 | 4 | 0.0184 | - | - | 144 B |
| DecodeBase64StringCreateSpan | DCq7Y(...)4cll. [22] | 115.95 ns | 2.343 ns | 3.284 ns | 0.73 | 0.03 | 2 | 0.0296 | - | - | 232 B |
| DecodeBase64ToBytes | DCq7Y(...)4cll. [22] | 73.71 ns | 1.502 ns | 2.966 ns | 0.46 | 0.02 | 1 | 0.0050 | - | - | 40 B |
| DecodeBase64StandardUnSized | HqWuK(...)Lrgb. [22] | 164.26 ns | 3.282 ns | 5.012 ns | 1.00 | 0.00 | 3 | 0.0184 | - | - | 144 B |
| DecodeBase64StandardSized | HqWuK(...)Lrgb. [22] | 163.55 ns | 3.252 ns | 6.342 ns | 0.99 | 0.05 | 3 | 0.0184 | - | - | 144 B |
| DecodeBase64StringCreateSpan | HqWuK(...)Lrgb. [22] | 114.45 ns | 2.301 ns | 5.051 ns | 0.70 | 0.04 | 2 | 0.0296 | - | - | 232 B |
| DecodeBase64ToBytes | HqWuK(...)Lrgb. [22] | 74.16 ns | 1.503 ns | 2.748 ns | 0.45 | 0.02 | 1 | 0.0050 | - | - | 40 B |

ChrisMcKee commented 4 years ago

Base64Encoding as bytes


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| EncodeBase64Unsized | 137.70 ns | 2.704 ns | 4.444 ns | 1.00 | 0.00 | 2 | 0.0355 | - | - | 280 B |
| EncodeBase64Sized | 149.75 ns | 3.023 ns | 5.967 ns | 1.09 | 0.06 | 3 | 0.0355 | - | - | 280 B |
| EncodeBase64AsBytes | 51.95 ns | 1.050 ns | 1.635 ns | 0.38 | 0.01 | 1 | 0.0092 | - | - | 72 B |

Surprisingly, pre-sizing had a negative effect; cutting out the extra allocations is a win perf-wise.

ChrisMcKee commented 4 years ago

String Allocation time


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | Categories | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Original_StrBuilder_SinEncoding | StringAppend,AppendString | 348.9 ns | 6.91 ns | 10.95 ns | 1.00 | 0.00 | 3 | 0.0648 | - | - | 512 B |
| Original_StrBuilder_SinEncoding_AppendChar | StringAppend,AppendChar | 347.9 ns | 6.96 ns | 11.63 ns | 1.00 | 0.04 | 3 | 0.0648 | - | - | 512 B |
| Original_StrBuilder_SinEncoding_AppendChar_Sized | StringAppend,AppendChar | 173.3 ns | 3.36 ns | 4.72 ns | 0.50 | 0.02 | 2 | 0.0458 | - | - | 360 B |
| Original_StrBuilder_SinEncoding_AppendChar_Sized_PRFmt | StringAppend,AppendChar | 115.4 ns | 1.97 ns | 2.75 ns | 0.33 | 0.01 | 1 | 0.0468 | - | - | 368 B |
| Original_StrBuilder_SinEncoding_AppendChar_Sized_FROMSTRING_PRFmt | StringAppend,AppendString | 117.4 ns | 2.37 ns | 3.90 ns | 0.34 | 0.02 | 1 | 0.0468 | - | - | 368 B |
| StringInterpolation_WithChar | StringFmt,AppendChar | 384.5 ns | 7.46 ns | 9.96 ns | 1.11 | 0.04 | 4 | 0.0210 | - | - | 168 B |
| StringInterpolation_WithString | StringFmt,AppendString | 354.8 ns | 7.06 ns | 7.55 ns | 1.02 | 0.05 | 3 | 0.0281 | - | - | 224 B |

Bit of a mixed bag here. Speed-wise, the original (master) code, sized and appending char, was slightly faster than the change in the PR (the change in encoding to char being the overall winner for reducing this method's allocations).

Allocation-wise, the string.Format using char was the winner.

jvandertil commented 4 years ago

I am not sure if this code does what you expect it to do:

        [Benchmark]
        [BenchmarkCategory("StringFmt", "AppendChar")]

        public void StringInterpolation_WithChar()
        {
            var res = $"$2{bcryptMinorRevision}${workFactor:00}${EncodedSaltAsChars}{EncodedHashAsChars}";
        }

The char[] would be interpolated as "System.Char[]".
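
A small standalone check of that behaviour (variable names are made up for the example): interpolating a `char[]` calls its `ToString()`, which returns the type name rather than the characters.

```csharp
using System;

char[] encodedSalt = { 'a', 'b', 'c' };

// Interpolating the array formats it via ToString(), i.e. the type name.
string wrong = $"salt={encodedSalt}";

// Converting to a string first yields the characters themselves.
string right = $"salt={new string(encodedSalt)}";

Console.WriteLine(wrong);  // salt=System.Char[]
Console.WriteLine(right);  // salt=abc
```

So the benchmark above never copied the encoded characters into the result, which would explain its unusually low allocation figure.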

jvandertil commented 4 years ago

As an aside, the benchmark functions should return the generated value, so in the example above:

[Benchmark]
[BenchmarkCategory("StringFmt", "AppendChar")]
public string StringInterpolation_WithChar()
{
    return $"$2{bcryptMinorRevision}${workFactor:00}${EncodedSaltAsChars}{EncodedHashAsChars}";
}

It looks like all the benchmarks should return the same value, if so: you could put a [ReturnValueValidator(failOnError: true)] attribute on the class so that the benchmarks fail if they do not.

ChrisMcKee commented 4 years ago

Can't allocate memory if you never use the char[] 😆

New string'd it and renamed something and copy pasted the string over the top 😆

Just RTFM'ing the BM docs; it's changed a bit in the last ~2 years. The error thing's handy.

ChrisMcKee commented 4 years ago

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | Categories | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Original_StrBuilder_SinEncoding | StringAppend,AppendString | 343.8 ns | 5.32 ns | 4.72 ns | 1.00 | 0.00 | 3 | 0.0648 | - | - | 512 B |
| Original_StrBuilder_SinEncoding_AppendChar | StringAppend,AppendChar | 352.0 ns | 7.06 ns | 9.42 ns | 1.02 | 0.03 | 3 | 0.0648 | - | - | 512 B |
| Original_StrBuilder_SinEncoding_AppendChar_Sized | StringAppend,AppendChar | 287.7 ns | 5.76 ns | 10.38 ns | 0.83 | 0.03 | 2 | 0.0458 | - | - | 360 B |
| Original_StrBuilder_SinEncoding_AppendChar_Sized_PRFmt | StringAppend,AppendChar | 120.1 ns | 2.45 ns | 4.71 ns | 0.35 | 0.02 | 1 | 0.0467 | - | - | 368 B |
| Original_StrBuilder_SinEncoding_AppendChar_Sized_FROMSTRING_PRFmt | StringAppend,AppendString | 117.2 ns | 2.38 ns | 3.63 ns | 0.34 | 0.01 | 1 | 0.0467 | - | - | 368 B |
| StringInterpolation_WithChar | StringFmt,AppendChar | 384.3 ns | 7.50 ns | 11.45 ns | 1.11 | 0.03 | 4 | 0.0486 | - | - | 384 B |
| StringInterpolation_WithString | StringFmt,AppendString | 341.9 ns | 6.48 ns | 6.65 ns | 0.99 | 0.02 | 3 | 0.0281 | - | - | 224 B |

Makes more sense; more so when you look at what .NET is doing:

        public StringBuilder Append(char[]? value)
        {
            if (value?.Length > 0)
            {
                unsafe
                {
                    fixed (char* valueChars = &value[0])
                    {
                        Append(valueChars, value.Length);
                    }
                }
            }
            return this;
        }

        public StringBuilder Append(ReadOnlySpan<char> value)
        {
            if (value.Length > 0)
            {
                unsafe
                {
                    fixed (char* valueChars = &MemoryMarshal.GetReference(value))
                    {
                        Append(valueChars, value.Length);
                    }
                }
            }
            return this;
        }

^ Noted they've got Span in there as well now; the ticket took so long to make it in that I wasn't sure it was in 3.1.

jvandertil commented 4 years ago

Yeah, I was experimenting a bit with the Base64Encoder, letting it write directly into a Span<char> that was passed in, which was stack allocated and then put in a string with new string(Span<char>) 😎 . The only heap allocation that GenerateSalt had after that was the string it had to return. The lowest it can go is 80 bytes (29 chars * 2 bytes (padded to 64 bytes for memory alignment) + 16 bytes object header).

Looks like this:

        public static string GenerateSalt(int workFactor, char bcryptMinorRevision = DefaultHashVersion)
        {
            // Argument checks removed for brevity

#if NETSTANDARD2_1
            Span<byte> saltBytes = stackalloc byte[BCryptSaltLen];
            Span<char> result = stackalloc char[29];
#else
            byte[] saltBytes = new byte[BCryptSaltLen];
            char[] result = new char[29];
#endif

            RngCsp.GetBytes(saltBytes);

            result[0] = '$';
            result[1] = '2';
            result[2] = bcryptMinorRevision;
            result[3] = '$';
            result[4] = (char)((workFactor / 10) + '0');
            result[5] = (char)((workFactor % 10) + '0');
            result[6] = '$';

#if NETSTANDARD2_1
            Base64Encoder.EncodeBase64(saltBytes, result.Slice(7));
#else
            Base64Encoder.EncodeBase64(saltBytes, saltBytes.Length, result, 7);
#endif

            return new string(result);
        }

This is with the Base64 encode and decode methods pulled into their own class, which looks like this:

        public static char[] EncodeBase64(byte[] byteArray, int length)
        {
            if (length <= 0 || length > byteArray.Length)
            {
                throw new ArgumentException("Invalid length", nameof(length));
            }

            int encodedSize = GetEncodedLength(length);
            char[] encoded = new char[encodedSize];

#if NETSTANDARD2_1
            EncodeBase64(byteArray.AsSpan().Slice(0, length), encoded.AsSpan());
#else
            EncodeBase64(byteArray, length, encoded, 0);
#endif

            return encoded;
        }

#if NETSTANDARD2_1
        public static int EncodeBase64(ReadOnlySpan<byte> source, Span<char> destination)
#else
        public static int EncodeBase64(byte[] source, int sourceLength, char[] destination, int destinationOffset)
#endif
        {

#if NETSTANDARD2_1
            int sourceLength = source.Length;
            const int destinationOffset = 0;
#endif

            int encodedSize = GetEncodedLength(sourceLength);
            int requiredCapacity = encodedSize + destinationOffset;

            if (destination.Length < requiredCapacity)
            {
                throw new ArgumentException("Destination too small.");
            }

            int pos = destinationOffset;
            int off = 0;
            while (off < sourceLength)
            {
                // Loop body removed for brevity
            }

            // Rest of the method removed for brevity
        }

edit: Add benchmarks

Before (continuing from PR):

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| GenerateSalt | 356.0 ns | 2.87 ns | 2.69 ns | 0.1144 | - | - | 360 B |

After:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| GenerateSalt | 235.8 ns | 3.58 ns | 2.80 ns | 0.0253 | - | - | 80 B |

ChrisMcKee commented 4 years ago

The StringBuilder has some quirks: [image]

Unless you change the data types going in, there's a limit to what you can achieve anyway (code golf is a tad addictive though). Million ways to skin a cat in .NET, but it appears that the larger the format string passed into AppendFormat (and that is quite a big method in .NET when you click into it), the slower it gets, regardless of there only being a single parameter. If you switch it to Append(string.Format(...)) you add allocations and time. If you reduce the AppendFormat call to just the number format, it's faster 😆


            var result = new StringBuilder(60);
            result.Append("$2")
                .Append(bcryptMinorRevision)
                .AppendFormat("${0:00}$", workFactor)
                .Append(salt)
                .Append(hash);


            result.Append("$2")
                .Append(bcryptMinorRevision)
                .Append("$")
                .AppendFormat("{0:00}", workFactor)
                .Append("$")
                .Append(salt)
                .Append(hash);
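
Taking that one step further, the two-digit work factor can be appended as two chars so that AppendFormat is not called at all. The sketch below is an illustration, not the code that was benchmarked: `BuildWithChars` and `BuildWithFormat` are hypothetical helpers, `salt` and `hash` are assumed to be strings, and the work factor is assumed to be in the range 0..99.

```csharp
using System;
using System.Text;

// Hypothetical helper: appends the work factor digit by digit instead of formatting it.
static string BuildWithChars(char minorRevision, int workFactor, string salt, string hash)
{
    return new StringBuilder(60)
        .Append("$2")
        .Append(minorRevision)
        .Append('$')
        .Append((char)('0' + workFactor / 10))
        .Append((char)('0' + workFactor % 10))
        .Append('$')
        .Append(salt)
        .Append(hash)
        .ToString();
}

// Reference version using AppendFormat, mirroring the first snippet above.
static string BuildWithFormat(char minorRevision, int workFactor, string salt, string hash)
{
    return new StringBuilder(60)
        .Append("$2")
        .Append(minorRevision)
        .AppendFormat("${0:00}$", workFactor)
        .Append(salt)
        .Append(hash)
        .ToString();
}

string viaChars = BuildWithChars('a', 6, "DCq7YPn5Rq63x1Lad4cll.", "TV4S6ytwfsfvkgY8jIucDrjc8deX1s.");
string viaFormat = BuildWithFormat('a', 6, "DCq7YPn5Rq63x1Lad4cll.", "TV4S6ytwfsfvkgY8jIucDrjc8deX1s.");
Console.WriteLine(viaChars == viaFormat); // True
```

Both helpers produce the same 60-character string; whether the char version is measurably faster would need its own benchmark run.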

The span types were an awesome move for .NET though; we do a lot of flat-file processing and the perf difference from slowly bringing these in is great... Go can wipe the floor with it, but it's definitely better than netcore1 or framework 😉

jvandertil commented 4 years ago

Yeah, about Span. I've worked it into the main encipher routine (mostly ifdef'ing the signature to have Span instead of byte[]).

And added stackalloc for the small lr array used in Key and EKSKey. It might make more sense to pull that out into a normal array instead of stackalloc'ing it, but the results are promising.

Before:

| Method | text | hash | value | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| VerifyPassword | "" | $2a$1(...)QPlxO [60] | ? | 437,683,033.3 ns | 5,562,464.16 ns | 5,203,132.43 ns | - | - | - | 267400 B |
| HashPassword | ? | ? | "" | 6,946,859.0 ns | 63,305.62 ns | 59,216.12 ns | - | - | - | 9698 B |
| VerifyPassword | abcde(...)vwxyz [26] | $2a$1(...)Qk7dq [60] | ? | 110,177,241.4 ns | 1,045,349.51 ns | 926,675.11 ns | - | - | - | 71166 B |
| HashPassword | ? | ? | abcde(...)vwxyz [26] | 6,997,920.2 ns | 39,497.80 ns | 36,946.26 ns | - | - | - | 9802 B |

After:

| Method | text | hash | value | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| VerifyPassword | "" | $2a$1(...)QPlxO [60] | ? | 446,178,420.0 ns | 4,295,215.41 ns | 4,017,747.16 ns | - | - | - | 5176 B |
| HashPassword | ? | ? | "" | 7,362,852.0 ns | 72,231.02 ns | 56,393.23 ns | - | - | - | 5522 B |
| VerifyPassword | abcde(...)vwxyz [26] | $2a$1(...)Qk7dq [60] | ? | 110,922,204.0 ns | 1,186,127.30 ns | 1,109,504.21 ns | - | - | - | 5554 B |
| HashPassword | ? | ? | abcde(...)vwxyz [26] | 7,048,583.9 ns | 66,038.58 ns | 61,772.53 ns | - | - | - | 5627 B |

jvandertil commented 4 years ago

That does require introducing a dependency on System.Memory for most platforms. I'm not sure if HAS_SPAN is the best name for the constant, but this shows how I've done it.

  <ItemGroup Condition="'$(TargetFramework)' == 'netstandard2.0'
                         or '$(TargetFramework)' == 'net452'
                         or '$(TargetFramework)' == 'net462'
                         or '$(TargetFramework)' == 'net472'">
    <PackageReference Include="System.Memory" Version="4.5.3" />
  </ItemGroup>

  <PropertyGroup Condition="'$(TargetFramework)' == 'netstandard2.1'
                         or '$(TargetFramework)' == 'netstandard2.0'
                         or '$(TargetFramework)' == 'net452'
                         or '$(TargetFramework)' == 'net462'
                         or '$(TargetFramework)' == 'net472'">
    <DefineConstants>$(DefineConstants);HAS_SPAN</DefineConstants>
  </PropertyGroup>
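
On the C# side, a constant like this is typically consumed with `#if`. The sketch below is a guess at the general shape rather than the PR's code: `FillPrefix` is a hypothetical method, and HAS_SPAN is assumed to come from the project file as above. When the constant is not defined, the array path compiles instead.

```csharp
using System;

// Only the signature differs between the two builds; Span<char> and char[]
// both expose an indexer, so the body can be shared.
#if HAS_SPAN
static void FillPrefix(Span<char> destination, char minorRevision)
#else
static void FillPrefix(char[] destination, char minorRevision)
#endif
{
    destination[0] = '$';
    destination[1] = '2';
    destination[2] = minorRevision;
    destination[3] = '$';
}

#if HAS_SPAN
Span<char> buffer = stackalloc char[4]; // no heap allocation for the scratch buffer
#else
char[] buffer = new char[4];            // fallback for targets without Span
#endif

FillPrefix(buffer, 'a');
Console.WriteLine(new string(buffer));
```

Keeping the `#if` blocks confined to signatures and buffer declarations keeps the shared logic readable, at the cost of needing both configurations built and tested.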

ChrisMcKee commented 4 years ago

The reduction in allocations during verification is interesting, if only because on the first pass the allocation is huge: 267400 B, compared to 71166 B for the second test string.

The EKS area's probably less likely to be merged in a hurry, as I'm not sure what the implications are from a security point of view (I'd have to dig / undoubtedly mither a few people and re-crack open the DPA sln to see if MSFT's daring to use it for this stuff). Definitely interested in seeing it though.

jvandertil commented 4 years ago

Opened a PR so you can see the changes. The difference in the allocations is because the two hashes have different work factors, so you can't really compare those directly.

        [Benchmark]
        [Arguments("", "$2a$12$k42ZFHFWqBp3vWli.nIn8uYyIkbvYRvodzbfbK18SSsY.CsIQPlxO")]
        [Arguments("abcdefghijklmnopqrstuvwxyz", "$2a$10$fVH8e28OQRj9tqiDXs1e1uxpsjN0c7II7YPKXua2NAKYvM6iQk7dq")]
        public bool VerifyPassword(string text, string hash)
            => BCrypt.Verify(text, hash);
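
To put numbers on that: bcrypt performs 2^workFactor key-expansion rounds, so each step in the work factor doubles the cost. A tiny sketch (`Rounds` is a name made up for the example):

```csharp
using System;

// Number of key-expansion rounds for a given bcrypt work factor (cost).
static long Rounds(int workFactor) => 1L << workFactor;

Console.WriteLine(Rounds(10));              // 1024
Console.WriteLine(Rounds(12));              // 4096
Console.WriteLine(Rounds(12) / Rounds(10)); // 4
```

That factor of 4 lines up with the mean times in the "Before" table above, roughly 437.7 ms for the work factor 12 hash against 110.2 ms for the work factor 10 hash.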

ChrisMcKee commented 4 years ago

Fair enough; I hadn't looked at the actual hashes. The fact it's a 12 vs a 10 explains a lot.

stackalloc should be fine; doing the same old-style had issues the new style doesn't have: "The use of stackalloc automatically enables buffer overrun detection features in the common language runtime (CLR). If a buffer overrun is detected, the process is terminated as quickly as possible to minimize the chance that malicious code is executed." Which is good.

Span work should in theory be fine.

ChrisMcKee commented 4 years ago

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | key | salt | hash | Mean | Error | StdDev | Median | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TestHashValidateEnhanced | "" | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.38 ms | 0.164 ms | 0.154 ms | 10.42 ms | 1.00 | 0.00 | 1 | - | - | - | 20.64 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.40 ms | 0.046 ms | 0.043 ms | 10.40 ms | 1.00 | 0.02 | 1 | - | - | - | 19.26 KB |
| TestHashValidateEnhanced | "" | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 40.81 ms | 0.887 ms | 0.786 ms | 40.44 ms | 1.00 | 0.00 | 1 | - | - | - | 44.63 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 41.55 ms | 0.760 ms | 0.674 ms | 41.59 ms | 1.02 | 0.03 | 2 | - | - | - | 43.24 KB |
| TestHashValidateEnhanced | "" | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 165.78 ms | 3.175 ms | 3.260 ms | 165.77 ms | 1.00 | 0.00 | 1 | - | - | - | 140.78 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 168.20 ms | 3.313 ms | 4.423 ms | 168.89 ms | 1.01 | 0.03 | 1 | - | - | - | 139.24 KB |
| TestHashValidateEnhanced | "" | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 676.66 ms | 13.031 ms | 18.268 ms | 676.95 ms | 1.00 | 0.00 | 1 | - | - | - | 524.63 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 664.51 ms | 13.285 ms | 15.815 ms | 667.74 ms | 0.98 | 0.03 | 1 | - | - | - | 525.33 KB |
| TestHashValidateEnhanced | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.58 ms | 0.207 ms | 0.261 ms | 10.41 ms | 1.00 | 0.00 | 2 | - | - | - | 20.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.22 ms | 0.054 ms | 0.045 ms | 10.22 ms | 0.97 | 0.03 | 1 | - | - | - | 19.33 KB |
| TestHashValidateEnhanced | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 41.30 ms | 0.852 ms | 1.249 ms | 40.70 ms | 1.00 | 0.00 | 1 | - | - | - | 44.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 42.45 ms | 0.829 ms | 1.557 ms | 42.36 ms | 1.03 | 0.05 | 2 | - | - | - | 43.3 KB |
| TestHashValidateEnhanced | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 165.36 ms | 3.284 ms | 5.395 ms | 163.12 ms | 1.00 | 0.00 | 1 | - | - | - | 140.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 166.88 ms | 3.296 ms | 4.933 ms | 167.31 ms | 1.01 | 0.05 | 1 | - | - | - | 139.32 KB |
| TestHashValidateEnhanced | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 673.68 ms | 13.283 ms | 18.181 ms | 671.11 ms | 1.00 | 0.00 | 1 | - | - | - | 524.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 682.11 ms | 13.565 ms | 18.569 ms | 683.43 ms | 1.01 | 0.05 | 1 | - | - | - | 523.3 KB |
| TestHashValidateEnhanced | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.59 ms | 0.207 ms | 0.247 ms | 10.46 ms | 1.00 | 0.00 | 1 | - | - | - | 20.7 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.63 ms | 0.207 ms | 0.276 ms | 10.52 ms | 1.01 | 0.04 | 1 | - | - | - | 19.3 KB |
| TestHashValidateEnhanced | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 41.95 ms | 0.981 ms | 1.130 ms | 41.63 ms | 1.00 | 0.00 | 1 | - | - | - | 44.71 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 42.04 ms | 0.835 ms | 1.440 ms | 41.77 ms | 1.01 | 0.04 | 1 | - | - | - | 43.3 KB |
| TestHashValidateEnhanced | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 169.22 ms | 3.368 ms | 5.041 ms | 169.28 ms | 1.00 | 0.00 | 1 | - | - | - | 140.71 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 168.72 ms | 3.346 ms | 5.403 ms | 167.89 ms | 1.00 | 0.04 | 1 | - | - | - | 139.3 KB |
| TestHashValidateEnhanced | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 675.28 ms | 13.076 ms | 20.357 ms | 674.19 ms | 1.00 | 0.00 | 1 | - | - | - | 524.7 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 667.30 ms | 13.338 ms | 17.344 ms | 658.82 ms | 0.99 | 0.04 | 1 | - | - | - | 523.3 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.62 ms | 0.241 ms | 0.322 ms | 10.54 ms | 1.00 | 0.00 | 1 | - | - | - | 20.84 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.41 ms | 0.072 ms | 0.060 ms | 10.38 ms | 0.98 | 0.04 | 1 | - | - | - | 19.47 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 42.64 ms | 0.843 ms | 1.454 ms | 42.77 ms | 1.00 | 0.00 | 2 | - | - | - | 44.84 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 41.50 ms | 0.819 ms | 1.390 ms | 41.02 ms | 0.97 | 0.05 | 1 | - | - | - | 43.45 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 169.47 ms | 3.378 ms | 4.952 ms | 169.73 ms | 1.00 | 0.00 | 1 | - | - | - | 140.84 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 168.60 ms | 3.358 ms | 5.882 ms | 166.15 ms | 1.00 | 0.05 | 1 | - | - | - | 139.45 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 664.16 ms | 13.250 ms | 18.137 ms | 664.71 ms | 1.00 | 0.00 | 1 | - | - | - | 524.84 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 675.49 ms | 13.499 ms | 21.411 ms | 673.42 ms | 1.02 | 0.04 | 1 | - | - | - | 523.45 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.47 ms | 0.209 ms | 0.319 ms | 10.27 ms | 1.00 | 0.00 | 1 | - | - | - | 20.88 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.41 ms | 0.270 ms | 0.253 ms | 10.37 ms | 0.99 | 0.05 | 1 | - | - | - | 19.5 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 41.29 ms | 1.199 ms | 1.333 ms | 40.45 ms | 1.00 | 0.00 | 1 | - | - | - | 44.88 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 42.68 ms | 0.852 ms | 1.579 ms | 42.33 ms | 1.03 | 0.04 | 2 | - | - | - | 43.49 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 167.60 ms | 3.328 ms | 4.208 ms | 166.77 ms | 1.00 | 0.00 | 1 | - | - | - | 140.88 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 165.25 ms | 3.236 ms | 3.726 ms | 164.37 ms | 0.98 | 0.04 | 1 | - | - | - | 139.49 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 671.31 ms | 13.073 ms | 14.530 ms | 666.76 ms | 1.00 | 0.00 | 1 | - | - | - | 525.52 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 665.57 ms | 13.108 ms | 20.017 ms | 666.63 ms | 0.98 | 0.03 | 1 | - | - | - | 523.68 KB |

ChrisMcKee commented 4 years ago

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

| Method | key | salt | hash | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TestHashValidateEnhanced | "" | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.76 ms | 0.212 ms | 0.460 ms | 1.00 | 0.00 | 1 | - | - | - | 19.63 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.72 ms | 0.214 ms | 0.452 ms | 1.00 | 0.06 | 1 | - | - | - | 18.24 KB |
| TestHashValidateEnhanced | "" | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 42.63 ms | 0.850 ms | 1.919 ms | 1.00 | 0.00 | 1 | - | - | - | 43.63 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 43.24 ms | 0.859 ms | 1.885 ms | 1.02 | 0.06 | 1 | - | - | - | 42.24 KB |
| TestHashValidateEnhanced | "" | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 169.35 ms | 3.379 ms | 6.979 ms | 1.00 | 0.00 | 1 | - | - | - | 139.63 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 169.56 ms | 3.338 ms | 5.934 ms | 1.01 | 0.06 | 1 | - | - | - | 138.24 KB |
| TestHashValidateEnhanced | "" | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 682.67 ms | 13.592 ms | 25.193 ms | 1.00 | 0.00 | 1 | - | - | - | 523.63 KB |
| TestHashValidateEnhancedPerf1 | "" | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 689.21 ms | 13.728 ms | 26.120 ms | 1.01 | 0.05 | 1 | - | - | - | 522.24 KB |
| TestHashValidateEnhanced | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.91 ms | 0.218 ms | 0.404 ms | 1.00 | 0.00 | 1 | - | - | - | 19.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.85 ms | 0.214 ms | 0.407 ms | 0.99 | 0.05 | 1 | - | - | - | 18.3 KB |
| TestHashValidateEnhanced | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 42.67 ms | 0.842 ms | 1.952 ms | 1.00 | 0.00 | 1 | - | - | - | 43.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 41.91 ms | 0.644 ms | 0.538 ms | 1.01 | 0.04 | 1 | - | - | - | 42.3 KB |
| TestHashValidateEnhanced | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 165.28 ms | 3.183 ms | 4.357 ms | 1.00 | 0.00 | 1 | - | - | - | 139.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 162.70 ms | 2.860 ms | 2.676 ms | 0.98 | 0.03 | 1 | - | - | - | 138.66 KB |
| TestHashValidateEnhanced | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 655.93 ms | 6.004 ms | 4.688 ms | 1.00 | 0.00 | 1 | - | - | - | 523.7 KB |
| TestHashValidateEnhancedPerf1 | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 649.06 ms | 11.633 ms | 10.313 ms | 0.99 | 0.02 | 1 | - | - | - | 522.3 KB |
| TestHashValidateEnhanced | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.41 ms | 0.110 ms | 0.103 ms | 1.00 | 0.00 | 1 | - | - | - | 19.71 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.51 ms | 0.197 ms | 0.211 ms | 1.01 | 0.03 | 1 | - | - | - | 18.32 KB |
| TestHashValidateEnhanced | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 41.01 ms | 0.818 ms | 0.725 ms | 1.00 | 0.00 | 1 | - | - | - | 43.7 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 41.49 ms | 0.789 ms | 0.939 ms | 1.01 | 0.02 | 1 | - | - | - | 42.3 KB |
| TestHashValidateEnhanced | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 167.46 ms | 1.229 ms | 1.150 ms | 1.00 | 0.00 | 1 | - | - | - | 139.7 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 165.89 ms | 3.187 ms | 3.272 ms | 0.99 | 0.02 | 1 | - | - | - | 140.31 KB |
| TestHashValidateEnhanced | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 664.15 ms | 8.225 ms | 7.291 ms | 1.00 | 0.00 | 2 | - | - | - | 523.7 KB |
| TestHashValidateEnhancedPerf1 | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 653.74 ms | 4.374 ms | 4.091 ms | 0.98 | 0.01 | 1 | - | - | - | 522.3 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.40 ms | 0.145 ms | 0.136 ms | 1.00 | 0.00 | 1 | - | - | - | 19.86 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.41 ms | 0.090 ms | 0.084 ms | 1.00 | 0.02 | 1 | - | - | - | 18.47 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 40.45 ms | 0.567 ms | 0.503 ms | 1.00 | 0.00 | 1 | - | - | - | 43.84 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 40.53 ms | 0.220 ms | 0.195 ms | 1.00 | 0.01 | 1 | - | - | - | 42.45 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 160.99 ms | 1.891 ms | 1.676 ms | 1.00 | 0.00 | 1 | - | - | - | 140.04 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 162.40 ms | 3.208 ms | 4.057 ms | 1.01 | 0.03 | 1 | - | - | - | 138.45 KB |
| TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 647.53 ms | 3.741 ms | 3.500 ms | 1.00 | 0.00 | 1 | - | - | - | 525.17 KB |
| TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 658.49 ms | 12.966 ms | 13.315 ms | 1.02 | 0.02 | 2 | - | - | - | 524.45 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.27 ms | 0.047 ms | 0.044 ms | 1.00 | 0.00 | 1 | - | - | - | 19.9 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.42 ms | 0.047 ms | 0.041 ms | 1.01 | 0.01 | 2 | - | - | - | 18.51 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 40.57 ms | 0.268 ms | 0.251 ms | 1.00 | 0.00 | 1 | - | - | - | 43.88 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 40.43 ms | 0.349 ms | 0.292 ms | 1.00 | 0.01 | 1 | - | - | - | 42.49 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 161.34 ms | 1.174 ms | 1.098 ms | 1.00 | 0.00 | 1 | - | - | - | 140.22 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 164.05 ms | 0.746 ms | 0.697 ms | 1.02 | 0.01 | 2 | - | - | - | 138.82 KB |
| TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 646.64 ms | 5.192 ms | 4.857 ms | 1.00 | 0.00 | 1 | - | - | - | 525.22 KB |
| TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 654.55 ms | 2.720 ms | 2.271 ms | 1.01 | 0.01 | 1 | - | - | - | 523.81 KB |

ChrisMcKee commented 4 years ago

Had to resort to Excel; god I hate R

Mean Time (ns): [image]

Mean Allocation (KB): [image]

results.zip

ChrisMcKee commented 4 years ago

[image] [image]

ChrisMcKee commented 4 years ago

All the non-Span bits are merged into master. I'll hopefully dig around the Span bit a bit more.

Thanks for all the back and forth and the PRs; greatly appreciated 😁

jvandertil commented 4 years ago

Awesome, glad to be able to help. The span PR could be done without Span by moving the '_lr' array into a private field and initializing it instead of allocating a new array each iteration. That should give roughly the same order of savings. Not sure if there are any security implications when doing that though.

Shouldn’t really matter as you can then control when the array is cleared instead of leaving it up to the GC.

ChrisMcKee commented 4 years ago

Definitely going to have a poke around / add Docker to the benchmarking to see how much it varies between OSes. Between 472/48 and core 2.1/3.1 there's nothing really noticeable, which is nice from a predictability standpoint. Alpine + Ubuntu will be the obvious choices in this container-crazy world.

Thanks again!

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ChrisMcKee commented 4 years ago

Closing as it's in master; this will go out with the next release.

penguinawesome commented 3 years ago

@ChrisMcKee is this included in the 4.0.2 release?

ChrisMcKee commented 3 years ago

Yup