Reduce memory usage

jvandertil commented 4 years ago

I was looking through the source code of the library, and noticed that there is quite some room to reduce the amount of memory allocated for certain operations.

I ran benchmarks (source code attached) for the (I think) most commonly used public methods, and came up with these results.

Obviously a large part of the runtime and memory usage is inherent to the algorithm itself, but I do believe that some optimizations might be useful to reduce unnecessary allocations.

Things I noticed at first glance:

StringBuilders are used as scratch buffers; preallocating their capacity might be possible.
The hash interrogation regexes generate a lot of garbage. A fast path for PasswordNeedsRehash could be added to improve this scenario, and a simple custom parser could improve it significantly.
.NET Standard 2.1 has a lot of Span-based APIs that could be used to implement optimized paths if the consumer is compatible.

Are you interested in taking PRs for this?


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-3632QM CPU 2.20GHz (Ivy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	text	hash	value	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
GenerateSalt	?	?	?	748.4 ns	5.92 ns	4.95 ns	0.2155	-	-	680 B
PasswordNeedsRehash	?	?	?	3,395.1 ns	37.28 ns	31.13 ns	0.4196	-	-	1328 B
VerifyPassword	****	$2a$0(...)eX1s. [60]	?	7,105,149.9 ns	43,366.64 ns	36,213.11 ns	-	-	-	10064 B
HashPassword	?	?	****	7,015,698.7 ns	48,091.09 ns	44,984.44 ns	-	-	-	10405 B
VerifyPassword	abcde(...)vwxyz [26]	$2a$0(...)QhstC [60]	?	7,090,475.1 ns	18,558.79 ns	15,497.43 ns	-	-	-	10174 B
HashPassword	?	?	abcde(...)vwxyz [26]	7,154,595.2 ns	85,278.99 ns	79,770.02 ns	-	-	-	10515 B

    [MemoryDiagnoser]
    public class Benchmarks
    {
        [Benchmark]
        public string GenerateSalt()
            => BCrypt.GenerateSalt(10);

        [Benchmark]
        public bool PasswordNeedsRehash()
            => BCrypt.PasswordNeedsRehash("$2a$06$DCq7YPn5Rq63x1Lad4cll.TV4S6ytwfsfvkgY8jIucDrjc8deX1s.", 10);

        [Benchmark]
        [Arguments("")]
        [Arguments("abcdefghijklmnopqrstuvwxyz")]
        public string HashPassword(string value)
            => BCrypt.HashPassword(value, "$2a$06$DCq7YPn5Rq63x1Lad4cll.", true, HashType.SHA384);

        [Benchmark]
        [Arguments("", "$2a$06$DCq7YPn5Rq63x1Lad4cll.TV4S6ytwfsfvkgY8jIucDrjc8deX1s.")]
        [Arguments("abcdefghijklmnopqrstuvwxyz", "$2a$06$.rCVZVOThsIa97pEDOxvGuRRgzG64bvtJ0938xuqzv18d3ZpQhstC")]
        public bool VerifyPassword(string text, string hash)
            => BCrypt.Verify(text, hash);
    }

Update: Aligned benchmarks to all use workfactor 6 hashes

ChrisMcKee commented 4 years ago

You piqued my interest, as I'd totally blocked out the StringBuilder usage. Of course, some of the code is simply a remnant of it being written for .NET 2, and what was 'best' then isn't really best now.

As for spans, I'd purposely avoided them because the support for this library goes way back, and because of the expectation that it would be actively replaced within the framework (free wins) without having to alter the code or add ifdefs. The perf difference would have to be meaningful, to say the least, to justify the extra complexity.

I'd happily take anything around Interrogate hash and PasswordNeedsRehash (see other ticket), especially if they make it more useful / safer / better perf as they're mostly utility methods.

Base

Sized StringBuilder

String interpolation (string.Format in sheep's clothing)

Using length in EncodeB64 StringBuilder to preallocate (wonky unix version)

Setting DecodeBase64 to use the max bytes as the StringBuilder length has no noticeable difference

I thought string.Create would have more of an impact, but tbh there's not a lot to see here. Or I'm using it wrong, which is entirely likely considering my exposure to Spans has been relatively limited.

jvandertil commented 4 years ago

I have a couple of things lying around; should I send the PRs to the perf branch?

I'll definitely take a look at the InterrogateHash method, shouldn't be too hard to improve that. It's not clear to me that the API is settled in the other ticket, is that correct?

ChrisMcKee commented 4 years ago

@jvandertil sure, fire away. I wasn't 100% sure the API in the other ticket was necessarily something that belonged in the library, but making it easier to implement, or offering a better method, probably wouldn't go amiss either. The ultimate conflict between trying to keep the API simple and keeping people happy 😆

ChrisMcKee commented 4 years ago

HashParser Regex to non-regex


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	hash	Mean	Error	StdDev	Ratio	Rank	Gen 0	Gen 1	Gen 2	Allocated
InterrogateHashUsingRegex	$2a$1(...)Qk7dq [60]	2,419.3 ns	48.15 ns	88.05 ns	1.00	2	0.1678	-	-	1328 B
InterrogateHashUsingParser	$2a$1(...)Qk7dq [60]	296.0 ns	6.53 ns	11.09 ns	0.12	1	0.0353	-	-	280 B

InterrogateHashUsingRegex	$2a$1(...)QPlxO [60]	2,337.5 ns	46.22 ns	73.31 ns	1.00	2	0.1678	-	-	1328 B
InterrogateHashUsingParser	$2a$1(...)QPlxO [60]	292.9 ns	5.74 ns	8.94 ns	0.13	1	0.0353	-	-	280 B

🎈🎆🎇🎈🎉
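
For readers wondering what a regex-free parser along these lines might look like, here is a minimal sketch. It is not the HashParser that was benchmarked above; the names (HashParserSketch, TryParse) are made up for illustration, and it assumes only the common 60-character layout "$2<rev>$<cost>$" followed by 53 salt-and-hash characters.

```csharp
using System;

public static class HashParserSketch
{
    // Hypothetical helper, not the library's API: splits a 60-character
    // "$2<rev>$<cost>$<salt+hash>" string using index checks instead of Regex.
    public static bool TryParse(string hash, out char revision, out int workFactor, out string saltAndHash)
    {
        revision = '\0';
        workFactor = 0;
        saltAndHash = string.Empty;

        // Fixed layout: '$' '2' rev '$' digit digit '$', then 53 payload chars.
        if (hash == null || hash.Length != 60) return false;
        if (hash[0] != '$' || hash[1] != '2' || hash[3] != '$' || hash[6] != '$') return false;
        if (!IsDigit(hash[4]) || !IsDigit(hash[5])) return false;

        revision = hash[2];
        workFactor = (hash[4] - '0') * 10 + (hash[5] - '0');
        saltAndHash = hash.Substring(7); // the only allocation in this method
        return true;
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';
}
```

A caller that only needs the work factor (as PasswordNeedsRehash does) could skip the Substring call entirely, which would make the check allocation-free.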

ChrisMcKee commented 4 years ago

Base64 Decoding; unsurprisingly not allocating it to string is a winner here.


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	salt	Mean	Error	StdDev	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
DecodeBase64StandardUnSized	DCq7Y(...)4cll. [22]	160.47 ns	3.223 ns	6.287 ns	1.00	0.00	3	0.0184	-	-	144 B
DecodeBase64StandardSized	DCq7Y(...)4cll. [22]	165.65 ns	3.333 ns	5.476 ns	1.03	0.05	4	0.0184	-	-	144 B
DecodeBase64StringCreateSpan	DCq7Y(...)4cll. [22]	115.95 ns	2.343 ns	3.284 ns	0.73	0.03	2	0.0296	-	-	232 B
DecodeBase64ToBytes	DCq7Y(...)4cll. [22]	73.71 ns	1.502 ns	2.966 ns	0.46	0.02	1	0.0050	-	-	40 B

DecodeBase64StandardUnSized	HqWuK(...)Lrgb. [22]	164.26 ns	3.282 ns	5.012 ns	1.00	0.00	3	0.0184	-	-	144 B
DecodeBase64StandardSized	HqWuK(...)Lrgb. [22]	163.55 ns	3.252 ns	6.342 ns	0.99	0.05	3	0.0184	-	-	144 B
DecodeBase64StringCreateSpan	HqWuK(...)Lrgb. [22]	114.45 ns	2.301 ns	5.051 ns	0.70	0.04	2	0.0296	-	-	232 B
DecodeBase64ToBytes	HqWuK(...)Lrgb. [22]	74.16 ns	1.503 ns	2.748 ns	0.45	0.02	1	0.0050	-	-	40 B

ChrisMcKee commented 4 years ago

Base64Encoding as bytes


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	Mean	Error	StdDev	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
EncodeBase64Unsized	137.70 ns	2.704 ns	4.444 ns	1.00	0.00	2	0.0355	-	-	280 B
EncodeBase64Sized	149.75 ns	3.023 ns	5.967 ns	1.09	0.06	3	0.0355	-	-	280 B
EncodeBase64AsBytes	51.95 ns	1.050 ns	1.635 ns	0.38	0.01	1	0.0092	-	-	72 B

Surprisingly pre-sizing had a negative effect; cutting out the extra allocations is a win perf wise.
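
To illustrate what "cutting out the extra allocations" can mean here, the following is a rough sketch of encoding straight into a char[] with no StringBuilder and no intermediate string. It is not the EncodeBase64AsBytes method measured above; the class and method names are hypothetical, and the only thing taken as given is bcrypt's Base64 alphabet (which starts with '.' and '/' and uses no '=' padding).

```csharp
using System;

public static class BcryptBase64Sketch
{
    // bcrypt's Base64 alphabet: '.' and '/' come first, and there is no padding.
    private const string Alphabet =
        "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Number of characters needed to encode byteCount bytes without padding.
    public static int GetEncodedLength(int byteCount) => (byteCount * 4 + 2) / 3;

    // Writes the encoded characters directly into one exactly-sized char[].
    public static char[] Encode(byte[] source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        char[] destination = new char[GetEncodedLength(source.Length)];
        int pos = 0;
        int off = 0;

        while (off < source.Length)
        {
            int b0 = source[off++];
            destination[pos++] = Alphabet[b0 >> 2];
            int carry = (b0 & 0x03) << 4;
            if (off >= source.Length) { destination[pos++] = Alphabet[carry]; break; }

            int b1 = source[off++];
            destination[pos++] = Alphabet[carry | (b1 >> 4)];
            carry = (b1 & 0x0f) << 2;
            if (off >= source.Length) { destination[pos++] = Alphabet[carry]; break; }

            int b2 = source[off++];
            destination[pos++] = Alphabet[carry | (b2 >> 6)];
            destination[pos++] = Alphabet[b2 & 0x3f];
        }

        return destination;
    }
}
```

With this shape the only allocation is the result array itself: a 16-byte salt encodes to 22 characters.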

ChrisMcKee commented 4 years ago

String Allocation time


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	Categories	Mean	Error	StdDev	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
Original_StrBuilder_SinEncoding	StringAppend,AppendString	348.9 ns	6.91 ns	10.95 ns	1.00	0.00	3	0.0648	-	-	512 B
Original_StrBuilder_SinEncoding_AppendChar	StringAppend,AppendChar	347.9 ns	6.96 ns	11.63 ns	1.00	0.04	3	0.0648	-	-	512 B
Original_StrBuilder_SinEncoding_AppendChar_Sized	StringAppend,AppendChar	173.3 ns	3.36 ns	4.72 ns	0.50	0.02	2	0.0458	-	-	360 B
Original_StrBuilder_SinEncoding_AppendChar_Sized_PRFmt	StringAppend,AppendChar	115.4 ns	1.97 ns	2.75 ns	0.33	0.01	1	0.0468	-	-	368 B
Original_StrBuilder_SinEncoding_AppendChar_Sized_FROMSTRING_PRFmt	StringAppend,AppendString	117.4 ns	2.37 ns	3.90 ns	0.34	0.02	1	0.0468	-	-	368 B
StringInterpolation_WithChar	StringFmt,AppendChar	384.5 ns	7.46 ns	9.96 ns	1.11	0.04	4	0.0210	-	-	168 B
StringInterpolation_WithString	StringFmt,AppendString	354.8 ns	7.06 ns	7.55 ns	1.02	0.05	3	0.0281	-	-	224 B

Bit of a mixed bag here. Speed-wise, the original (master) code, sized and appending char, was slightly faster than the change in the PR (the change in encoding to char being the collective winner for reducing this method's allocations).

Allocation-wise, the string.Format using char was the winner.

jvandertil commented 4 years ago

I am not sure if this code does what you expect it to do:

        [Benchmark]
        [BenchmarkCategory("StringFmt", "AppendChar")]

        public void StringInterpolation_WithChar()
        {
            var res = $"$2{bcryptMinorRevision}${workFactor:00}${EncodedSaltAsChars}{EncodedHashAsChars}";
        }

The char[] would be interpolated as "System.Char[]".
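
A tiny stand-alone check of that behaviour, in case it helps. The class and method names are invented for the example; the only point is that interpolating a char[] goes through ToString() on the array, so it has to be wrapped in a string first.

```csharp
using System;

public static class CharArrayInterpolationDemo
{
    // Interpolating the array itself calls ToString() on char[],
    // which yields the type name rather than the characters.
    public static string Interpolated(char[] chars) => $"{chars}";

    // Wrapping the array in a string first gives the intended text.
    public static string AsString(char[] chars) => $"{new string(chars)}";
}
```

So a benchmark built on the first form never formats the encoded characters at all, which makes its timing and allocation figures incomparable with the other variants.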

jvandertil commented 4 years ago

As an aside, the benchmark functions should return the generated value, so in the example above:

[Benchmark]
[BenchmarkCategory("StringFmt", "AppendChar")]
public string StringInterpolation_WithChar()
{
    return $"$2{bcryptMinorRevision}${workFactor:00}${EncodedSaltAsChars}{EncodedHashAsChars}";
}

It looks like all the benchmarks should return the same value, if so: you could put a [ReturnValueValidator(failOnError: true)] attribute on the class so that the benchmarks fail if they do not.

ChrisMcKee commented 4 years ago

Can't allocate memory if you never use the char[] 😆

New string'd it and renamed something and copy pasted the string over the top 😆

Just RTFM'ing the BM docs; it's changed a bit in the last ~2 years. The error thing's handy.

ChrisMcKee commented 4 years ago


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	Categories	Mean	Error	StdDev	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
Original_StrBuilder_SinEncoding	StringAppend,AppendString	343.8 ns	5.32 ns	4.72 ns	1.00	0.00	3	0.0648	-	-	512 B
Original_StrBuilder_SinEncoding_AppendChar	StringAppend,AppendChar	352.0 ns	7.06 ns	9.42 ns	1.02	0.03	3	0.0648	-	-	512 B
Original_StrBuilder_SinEncoding_AppendChar_Sized	StringAppend,AppendChar	287.7 ns	5.76 ns	10.38 ns	0.83	0.03	2	0.0458	-	-	360 B
Original_StrBuilder_SinEncoding_AppendChar_Sized_PRFmt	StringAppend,AppendChar	120.1 ns	2.45 ns	4.71 ns	0.35	0.02	1	0.0467	-	-	368 B
Original_StrBuilder_SinEncoding_AppendChar_Sized_FROMSTRING_PRFmt	StringAppend,AppendString	117.2 ns	2.38 ns	3.63 ns	0.34	0.01	1	0.0467	-	-	368 B
StringInterpolation_WithChar	StringFmt,AppendChar	384.3 ns	7.50 ns	11.45 ns	1.11	0.03	4	0.0486	-	-	384 B
StringInterpolation_WithString	StringFmt,AppendString	341.9 ns	6.48 ns	6.65 ns	0.99	0.02	3	0.0281	-	-	224 B

Makes more sense; more so when you look at what .net is doing

        public StringBuilder Append(char[]? value)
        {
            if (value?.Length > 0)
            {
                unsafe
                {
                    fixed (char* valueChars = &value[0])
                    {
                        Append(valueChars, value.Length);
                    }
                }
            }
            return this;
        }

        public StringBuilder Append(ReadOnlySpan<char> value)
        {
            if (value.Length > 0)
            {
                unsafe
                {
                    fixed (char* valueChars = &MemoryMarshal.GetReference(value))
                    {
                        Append(valueChars, value.Length);
                    }
                }
            }
            return this;
        }

^ noted they've got span in there as well now; the ticket took that long to make it that I wasn't sure it was in 3.1

jvandertil commented 4 years ago

Yeah, I was experimenting a bit with the Base64Encoder, letting it write directly into a Span<char> that was passed in, which was stack allocated and then put in a string with new string(Span<char>) 😎 . The only heap allocation that GenerateSalt had after that was the string it had to return. The lowest it can go is 80 bytes (29 chars * 2 bytes (padded to 64 bytes for memory alignment) + 16 bytes object header).

Looks like this:

        public static string GenerateSalt(int workFactor, char bcryptMinorRevision = DefaultHashVersion)
        {
            // Argument checks removed for brevity

#if NETSTANDARD2_1
            Span<byte> saltBytes = stackalloc byte[BCryptSaltLen];
            Span<char> result = stackalloc char[29];
#else
            byte[] saltBytes = new byte[BCryptSaltLen];
            char[] result = new char[29];
#endif

            RngCsp.GetBytes(saltBytes);

            result[0] = '$';
            result[1] = '2';
            result[2] = bcryptMinorRevision;
            result[3] = '$';
            result[4] = (char)((workFactor / 10) + '0');
            result[5] = (char)((workFactor % 10) + '0');
            result[6] = '$';

#if NETSTANDARD2_1
            Base64Encoder.EncodeBase64(saltBytes, result.Slice(7));
#else
            Base64Encoder.EncodeBase64(saltBytes, saltBytes.Length, result, 7);
#endif

            return new string(result);
        }

The Base64 encode and decode methods are pulled into their own class, which looks like this:

        public static char[] EncodeBase64(byte[] byteArray, int length)
        {
            if (length <= 0 || length > byteArray.Length)
            {
                throw new ArgumentException("Invalid length", nameof(length));
            }

            int encodedSize = GetEncodedLength(length);
            char[] encoded = new char[encodedSize];

#if NETSTANDARD2_1
            EncodeBase64(byteArray.AsSpan().Slice(0, length), encoded.AsSpan());
#else
            EncodeBase64(byteArray, length, encoded, 0);
#endif

            return encoded;
        }

#if NETSTANDARD2_1
        public static int EncodeBase64(ReadOnlySpan<byte> source, Span<char> destination)
#else
        public static int EncodeBase64(byte[] source, int sourceLength, char[] destination, int destinationOffset)
#endif
        {

#if NETSTANDARD2_1
            int sourceLength = source.Length;
            const int destinationOffset = 0;
#endif

            int encodedSize = GetEncodedLength(sourceLength);
            int requiredCapacity = encodedSize + destinationOffset;

            if (destination.Length < requiredCapacity)
            {
                throw new ArgumentException("Destination too small.");
            }

            int pos = destinationOffset;
            int off = 0;
            while (off < sourceLength)
            {
            // Removed for brevity
        }

edit: Add benchmarks

Before (continuing from PR):

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
GenerateSalt	356.0 ns	2.87 ns	2.69 ns	0.1144	-	-	360 B

After:

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
GenerateSalt	235.8 ns	3.58 ns	2.80 ns	0.0253	-	-	80 B

ChrisMcKee commented 4 years ago

The StringBuilder has some quirks.

Unless you change the data types going in, there's a limit to what you can achieve anyway (code golf is a tad addictive, though). There are a million ways to skin a cat in .NET, but it appears that the larger the string passed into AppendFormat (and that is quite a big method in .NET when you click into it), the slower it gets, regardless of there only being a single parameter. If you switch it to Append(string.Format(...)) you add allocations and time. If you reduce the AppendFormat call to just the number format, it's faster 😆

            var result = new StringBuilder(60);
            result.Append("$2")
                .Append(bcryptMinorRevision)
                .AppendFormat("${0:00}$", workFactor)
                .Append(salt)
                .Append(hash);

            result.Append("$2")
                .Append(bcryptMinorRevision)
                .Append("$")
                .AppendFormat("{0:00}", workFactor)
                .Append("$")
                .Append(salt)
                .Append(hash);
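
For anyone wanting to try the two shapes side by side, here is a self-contained sketch of both. The class name, method names and parameters are placeholders rather than the library's code. It only shows that the two variants build the same string, so any difference between them is purely cost, which would still need measuring.

```csharp
using System;
using System.Text;

public static class HashStringBuilderSketch
{
    // Variant 1: the '$' separators live inside the AppendFormat format string.
    public static string SingleFormat(char revision, int workFactor, string salt, string hash)
    {
        var result = new StringBuilder(60); // a full bcrypt hash string is 60 chars
        result.Append("$2")
            .Append(revision)
            .AppendFormat("${0:00}$", workFactor)
            .Append(salt)
            .Append(hash);
        return result.ToString();
    }

    // Variant 2: AppendFormat only formats the number; separators are plain Appends.
    public static string SplitFormat(char revision, int workFactor, string salt, string hash)
    {
        var result = new StringBuilder(60);
        result.Append("$2")
            .Append(revision)
            .Append("$")
            .AppendFormat("{0:00}", workFactor)
            .Append("$")
            .Append(salt)
            .Append(hash);
        return result.ToString();
    }
}
```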

The span types were an awesome move for .NET though; we do a lot of flat file processing and the perf difference from slowly bringing these in is great... Go can wipe the floor with it, but it's definitely better than netcore1 or framework 😉

jvandertil commented 4 years ago

Yeah, about Span: I've worked it into the main encipher routine (mostly ifdef'ing the signature to take Span instead of byte[]).

I also added stackalloc for the small lr array used in Key and EKSKey. It might make more sense to pull that out into a normal array rather than stackalloc'ing, but the results are promising.

Before:

Method	text	hash	value	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
VerifyPassword	****	$2a$1(...)QPlxO [60]	?	437,683,033.3 ns	5,562,464.16 ns	5,203,132.43 ns	-	-	-	267400 B
HashPassword	?	?	****	6,946,859.0 ns	63,305.62 ns	59,216.12 ns	-	-	-	9698 B
VerifyPassword	abcde(...)vwxyz [26]	$2a$1(...)Qk7dq [60]	?	110,177,241.4 ns	1,045,349.51 ns	926,675.11 ns	-	-	-	71166 B
HashPassword	?	?	abcde(...)vwxyz [26]	6,997,920.2 ns	39,497.80 ns	36,946.26 ns	-	-	-	9802 B

After:

Method	text	hash	value	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
VerifyPassword	****	$2a$1(...)QPlxO [60]	?	446,178,420.0 ns	4,295,215.41 ns	4,017,747.16 ns	-	-	-	5176 B
HashPassword	?	?	****	7,362,852.0 ns	72,231.02 ns	56,393.23 ns	-	-	-	5522 B
VerifyPassword	abcde(...)vwxyz [26]	$2a$1(...)Qk7dq [60]	?	110,922,204.0 ns	1,186,127.30 ns	1,109,504.21 ns	-	-	-	5554 B
HashPassword	?	?	abcde(...)vwxyz [26]	7,048,583.9 ns	66,038.58 ns	61,772.53 ns	-	-	-	5627 B

jvandertil commented 4 years ago

That does require introducing a dependency on System.Memory for most platforms. I'm not sure if HAS_SPAN is the best name for the constant, but this shows how I've done it.

  <ItemGroup Condition="'$(TargetFramework)' == 'netstandard2.0'
                         or '$(TargetFramework)' == 'net452'
                         or '$(TargetFramework)' == 'net462'
                         or '$(TargetFramework)' == 'net472'">
    <PackageReference Include="System.Memory" Version="4.5.3" />
  </ItemGroup>

  <PropertyGroup Condition="'$(TargetFramework)' == 'netstandard2.1'
                         or '$(TargetFramework)' == 'netstandard2.0'
                         or '$(TargetFramework)' == 'net452'
                         or '$(TargetFramework)' == 'net462'
                         or '$(TargetFramework)' == 'net472'">
    <DefineConstants>$(DefineConstants);HAS_SPAN</DefineConstants>
  </PropertyGroup>

ChrisMcKee commented 4 years ago

The reduction in allocations during verification is interesting, if only because on the first pass the allocation is huge: 267400 B, compared to 71166 B for the second test string.

The EKS area's probably less likely to be merged in a hurry, as I'm not sure what the implications are from a security standpoint (I'd have to dig, undoubtedly mither a few people, and crack open the DPA sln again to see if MSFT dares to use it for this stuff). Definitely interested in seeing it though.

jvandertil commented 4 years ago

Opened a PR so you can see the changes. The difference in the allocations is because the two hashes have different work factors, so you can't really compare them directly.

        [Benchmark]
        [Arguments("", "$2a$12$k42ZFHFWqBp3vWli.nIn8uYyIkbvYRvodzbfbK18SSsY.CsIQPlxO")]
        [Arguments("abcdefghijklmnopqrstuvwxyz", "$2a$10$fVH8e28OQRj9tqiDXs1e1uxpsjN0c7II7YPKXua2NAKYvM6iQk7dq")]
        public bool VerifyPassword(string text, string hash)
            => BCrypt.Verify(text, hash);
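
To make the comparison concrete: bcrypt runs 2^workFactor key-expansion rounds, so a work factor of 12 should cost roughly four times as much as 10, which lines up with the roughly 437 ms versus 110 ms timings above. A small sketch of that arithmetic follows; WorkFactorSketch and its methods are hypothetical helpers, not part of the library.

```csharp
using System;

public static class WorkFactorSketch
{
    // Reads the two-digit cost from a "$2a$NN$..." style hash (positions 4 and 5).
    public static int ReadWorkFactor(string hash)
    {
        if (hash == null || hash.Length < 7 || hash[0] != '$' || hash[3] != '$' || hash[6] != '$')
            throw new ArgumentException("Unexpected hash format.", nameof(hash));
        if (hash[4] < '0' || hash[4] > '9' || hash[5] < '0' || hash[5] > '9')
            throw new ArgumentException("Work factor is not numeric.", nameof(hash));

        return (hash[4] - '0') * 10 + (hash[5] - '0');
    }

    // Relative cost of verifying hashA compared with hashB: 2^(a - b).
    public static double RelativeCost(string hashA, string hashB)
        => Math.Pow(2, ReadWorkFactor(hashA) - ReadWorkFactor(hashB));
}
```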

ChrisMcKee commented 4 years ago

Fair enough; I hadn't looked at the actual hashes. The fact it's a 12 vs a 10 explains a lot.

stackalloc should be fine; doing the same thing old-style had issues the new style doesn't have. The use of stackalloc automatically enables buffer overrun detection features in the common language runtime (CLR). If a buffer overrun is detected, the process is terminated as quickly as possible to minimize the chance that malicious code is executed. Which is good.

Span work should in theory be fine.

ChrisMcKee commented 4 years ago


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	key	salt	hash	Mean	Error	StdDev	Median	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
TestHashValidateEnhanced	****	$2a$0(...)4cll. [29]	$2a$0(...)eX1s. [60]	10.38 ms	0.164 ms	0.154 ms	10.42 ms	1.00	0.00	1	-	-	-	20.64 KB
TestHashValidateEnhancedPerf1		$2a$0(...)4cll. [29]	$2a$0(...)eX1s. [60]	10.40 ms	0.046 ms	0.043 ms	10.40 ms	1.00	0.02	1	-	-	-	19.26 KB

TestHashValidateEnhanced	****	$2a$0(...)Lrgb. [29]	$2a$0(...)uUtye [60]	40.81 ms	0.887 ms	0.786 ms	40.44 ms	1.00	0.00	1	-	-	-	44.63 KB
TestHashValidateEnhancedPerf1		$2a$0(...)Lrgb. [29]	$2a$0(...)uUtye [60]	41.55 ms	0.760 ms	0.674 ms	41.59 ms	1.02	0.03	2	-	-	-	43.24 KB

TestHashValidateEnhanced	****	$2a$1(...)Va/ze [29]	$2a$1(...)k4TCW [60]	165.78 ms	3.175 ms	3.260 ms	165.77 ms	1.00	0.00	1	-	-	-	140.78 KB
TestHashValidateEnhancedPerf1		$2a$1(...)Va/ze [29]	$2a$1(...)k4TCW [60]	168.20 ms	3.313 ms	4.423 ms	168.89 ms	1.01	0.03	1	-	-	-	139.24 KB

TestHashValidateEnhanced	****	$2a$1(...)nIn8u [29]	$2a$1(...)QPlxO [60]	676.66 ms	13.031 ms	18.268 ms	676.95 ms	1.00	0.00	1	-	-	-	524.63 KB
TestHashValidateEnhancedPerf1		$2a$1(...)nIn8u [29]	$2a$1(...)QPlxO [60]	664.51 ms	13.285 ms	15.815 ms	667.74 ms	0.98	0.03	1	-	-	-	525.33 KB

TestHashValidateEnhanced	a	$2a$0(...)5zDGO [29]	$2a$0(...)YVfxe [60]	10.58 ms	0.207 ms	0.261 ms	10.41 ms	1.00	0.00	2	-	-	-	20.7 KB
TestHashValidateEnhancedPerf1	a	$2a$0(...)5zDGO [29]	$2a$0(...)YVfxe [60]	10.22 ms	0.054 ms	0.045 ms	10.22 ms	0.97	0.03	1	-	-	-	19.33 KB

TestHashValidateEnhanced	a	$2a$0(...)2EBfe [29]	$2a$0(...)lC/V. [60]	41.30 ms	0.852 ms	1.249 ms	40.70 ms	1.00	0.00	1	-	-	-	44.7 KB
TestHashValidateEnhancedPerf1	a	$2a$0(...)2EBfe [29]	$2a$0(...)lC/V. [60]	42.45 ms	0.829 ms	1.557 ms	42.36 ms	1.03	0.05	2	-	-	-	43.3 KB

TestHashValidateEnhanced	a	$2a$1(...)/cPi. [29]	$2a$1(...)SQu4u [60]	165.36 ms	3.284 ms	5.395 ms	163.12 ms	1.00	0.00	1	-	-	-	140.7 KB
TestHashValidateEnhancedPerf1	a	$2a$1(...)/cPi. [29]	$2a$1(...)SQu4u [60]	166.88 ms	3.296 ms	4.933 ms	167.31 ms	1.01	0.05	1	-	-	-	139.32 KB

TestHashValidateEnhanced	a	$2a$1(...)BakCe [29]	$2a$1(...)HZpeS [60]	673.68 ms	13.283 ms	18.181 ms	671.11 ms	1.00	0.00	1	-	-	-	524.7 KB
TestHashValidateEnhancedPerf1	a	$2a$1(...)BakCe [29]	$2a$1(...)HZpeS [60]	682.11 ms	13.565 ms	18.569 ms	683.43 ms	1.01	0.05	1	-	-	-	523.3 KB

TestHashValidateEnhanced	abc	$2a$0(...)uDeDu [29]	$2a$0(...)f7h0i [60]	10.59 ms	0.207 ms	0.247 ms	10.46 ms	1.00	0.00	1	-	-	-	20.7 KB
TestHashValidateEnhancedPerf1	abc	$2a$0(...)uDeDu [29]	$2a$0(...)f7h0i [60]	10.63 ms	0.207 ms	0.276 ms	10.52 ms	1.01	0.04	1	-	-	-	19.3 KB

TestHashValidateEnhanced	abc	$2a$0(...)yaM7O [29]	$2a$0(...)LxKcm [60]	41.95 ms	0.981 ms	1.130 ms	41.63 ms	1.00	0.00	1	-	-	-	44.71 KB
TestHashValidateEnhancedPerf1	abc	$2a$0(...)yaM7O [29]	$2a$0(...)LxKcm [60]	42.04 ms	0.835 ms	1.440 ms	41.77 ms	1.01	0.04	1	-	-	-	43.3 KB

TestHashValidateEnhanced	abc	$2a$1(...)7EMR. [29]	$2a$1(...)aSIUi [60]	169.22 ms	3.368 ms	5.041 ms	169.28 ms	1.00	0.00	1	-	-	-	140.71 KB
TestHashValidateEnhancedPerf1	abc	$2a$1(...)7EMR. [29]	$2a$1(...)aSIUi [60]	168.72 ms	3.346 ms	5.403 ms	167.89 ms	1.00	0.04	1	-	-	-	139.3 KB

TestHashValidateEnhanced	abc	$2a$1(...)Situ. [29]	$2a$1(...)Hg.9q [60]	675.28 ms	13.076 ms	20.357 ms	674.19 ms	1.00	0.00	1	-	-	-	524.7 KB
TestHashValidateEnhancedPerf1	abc	$2a$1(...)Situ. [29]	$2a$1(...)Hg.9q [60]	667.30 ms	13.338 ms	17.344 ms	658.82 ms	0.99	0.04	1	-	-	-	523.3 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$0(...)OxvGu [29]	$2a$0(...)QhstC [60]	10.62 ms	0.241 ms	0.322 ms	10.54 ms	1.00	0.00	1	-	-	-	20.84 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$0(...)OxvGu [29]	$2a$0(...)QhstC [60]	10.41 ms	0.072 ms	0.060 ms	10.38 ms	0.98	0.04	1	-	-	-	19.47 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$0(...)flhge [29]	$2a$0(...)Tvlz. [60]	42.64 ms	0.843 ms	1.454 ms	42.77 ms	1.00	0.00	2	-	-	-	44.84 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$0(...)flhge [29]	$2a$0(...)Tvlz. [60]	41.50 ms	0.819 ms	1.390 ms	41.02 ms	0.97	0.05	1	-	-	-	43.45 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$1(...)s1e1u [29]	$2a$1(...)Qk7dq [60]	169.47 ms	3.378 ms	4.952 ms	169.73 ms	1.00	0.00	1	-	-	-	140.84 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$1(...)s1e1u [29]	$2a$1(...)Qk7dq [60]	168.60 ms	3.358 ms	5.882 ms	166.15 ms	1.00	0.05	1	-	-	-	139.45 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$1(...)L7Gpu [29]	$2a$1(...)wJ/pG [60]	664.16 ms	13.250 ms	18.137 ms	664.71 ms	1.00	0.00	1	-	-	-	524.84 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$1(...)L7Gpu [29]	$2a$1(...)wJ/pG [60]	675.49 ms	13.499 ms	21.411 ms	673.42 ms	1.02	0.04	1	-	-	-	523.45 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$0(...)faOI. [29]	$2a$0(...)P6FfO [60]	10.47 ms	0.209 ms	0.319 ms	10.27 ms	1.00	0.00	1	-	-	-	20.88 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$0(...)faOI. [29]	$2a$0(...)P6FfO [60]	10.41 ms	0.270 ms	0.253 ms	10.37 ms	0.99	0.05	1	-	-	-	19.5 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$0(...)262hu [29]	$2a$0(...)9UxTW [60]	41.29 ms	1.199 ms	1.333 ms	40.45 ms	1.00	0.00	1	-	-	-	44.88 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$0(...)262hu [29]	$2a$0(...)9UxTW [60]	42.68 ms	0.852 ms	1.579 ms	42.33 ms	1.03	0.04	2	-	-	-	43.49 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$1(...)rOvHe [29]	$2a$1(...)JYlfS [60]	167.60 ms	3.328 ms	4.208 ms	166.77 ms	1.00	0.00	1	-	-	-	140.88 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$1(...)rOvHe [29]	$2a$1(...)JYlfS [60]	165.25 ms	3.236 ms	3.726 ms	164.37 ms	0.98	0.04	1	-	-	-	139.49 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$1(...)nkrPO [29]	$2a$1(...)eyhgC [60]	671.31 ms	13.073 ms	14.530 ms	666.76 ms	1.00	0.00	1	-	-	-	525.52 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$1(...)nkrPO [29]	$2a$1(...)eyhgC [60]	665.57 ms	13.108 ms	20.017 ms	666.63 ms	0.98	0.03	1	-	-	-	523.68 KB

ChrisMcKee commented 4 years ago


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Method	key	salt	hash	Mean	Error	StdDev	Ratio	RatioSD	Rank	Gen 0	Gen 1	Gen 2	Allocated
TestHashValidateEnhanced	****	$2a$0(...)4cll. [29]	$2a$0(...)eX1s. [60]	10.76 ms	0.212 ms	0.460 ms	1.00	0.00	1	-	-	-	19.63 KB
TestHashValidateEnhancedPerf1		$2a$0(...)4cll. [29]	$2a$0(...)eX1s. [60]	10.72 ms	0.214 ms	0.452 ms	1.00	0.06	1	-	-	-	18.24 KB

TestHashValidateEnhanced	****	$2a$0(...)Lrgb. [29]	$2a$0(...)uUtye [60]	42.63 ms	0.850 ms	1.919 ms	1.00	0.00	1	-	-	-	43.63 KB
TestHashValidateEnhancedPerf1		$2a$0(...)Lrgb. [29]	$2a$0(...)uUtye [60]	43.24 ms	0.859 ms	1.885 ms	1.02	0.06	1	-	-	-	42.24 KB

TestHashValidateEnhanced	****	$2a$1(...)Va/ze [29]	$2a$1(...)k4TCW [60]	169.35 ms	3.379 ms	6.979 ms	1.00	0.00	1	-	-	-	139.63 KB
TestHashValidateEnhancedPerf1		$2a$1(...)Va/ze [29]	$2a$1(...)k4TCW [60]	169.56 ms	3.338 ms	5.934 ms	1.01	0.06	1	-	-	-	138.24 KB

TestHashValidateEnhanced	****	$2a$1(...)nIn8u [29]	$2a$1(...)QPlxO [60]	682.67 ms	13.592 ms	25.193 ms	1.00	0.00	1	-	-	-	523.63 KB
TestHashValidateEnhancedPerf1		$2a$1(...)nIn8u [29]	$2a$1(...)QPlxO [60]	689.21 ms	13.728 ms	26.120 ms	1.01	0.05	1	-	-	-	522.24 KB

TestHashValidateEnhanced	a	$2a$0(...)5zDGO [29]	$2a$0(...)YVfxe [60]	10.91 ms	0.218 ms	0.404 ms	1.00	0.00	1	-	-	-	19.7 KB
TestHashValidateEnhancedPerf1	a	$2a$0(...)5zDGO [29]	$2a$0(...)YVfxe [60]	10.85 ms	0.214 ms	0.407 ms	0.99	0.05	1	-	-	-	18.3 KB

TestHashValidateEnhanced	a	$2a$0(...)2EBfe [29]	$2a$0(...)lC/V. [60]	42.67 ms	0.842 ms	1.952 ms	1.00	0.00	1	-	-	-	43.7 KB
TestHashValidateEnhancedPerf1	a	$2a$0(...)2EBfe [29]	$2a$0(...)lC/V. [60]	41.91 ms	0.644 ms	0.538 ms	1.01	0.04	1	-	-	-	42.3 KB

TestHashValidateEnhanced	a	$2a$1(...)/cPi. [29]	$2a$1(...)SQu4u [60]	165.28 ms	3.183 ms	4.357 ms	1.00	0.00	1	-	-	-	139.7 KB
TestHashValidateEnhancedPerf1	a	$2a$1(...)/cPi. [29]	$2a$1(...)SQu4u [60]	162.70 ms	2.860 ms	2.676 ms	0.98	0.03	1	-	-	-	138.66 KB

TestHashValidateEnhanced	a	$2a$1(...)BakCe [29]	$2a$1(...)HZpeS [60]	655.93 ms	6.004 ms	4.688 ms	1.00	0.00	1	-	-	-	523.7 KB
TestHashValidateEnhancedPerf1	a	$2a$1(...)BakCe [29]	$2a$1(...)HZpeS [60]	649.06 ms	11.633 ms	10.313 ms	0.99	0.02	1	-	-	-	522.3 KB

TestHashValidateEnhanced	abc	$2a$0(...)uDeDu [29]	$2a$0(...)f7h0i [60]	10.41 ms	0.110 ms	0.103 ms	1.00	0.00	1	-	-	-	19.71 KB
TestHashValidateEnhancedPerf1	abc	$2a$0(...)uDeDu [29]	$2a$0(...)f7h0i [60]	10.51 ms	0.197 ms	0.211 ms	1.01	0.03	1	-	-	-	18.32 KB

TestHashValidateEnhanced	abc	$2a$0(...)yaM7O [29]	$2a$0(...)LxKcm [60]	41.01 ms	0.818 ms	0.725 ms	1.00	0.00	1	-	-	-	43.7 KB
TestHashValidateEnhancedPerf1	abc	$2a$0(...)yaM7O [29]	$2a$0(...)LxKcm [60]	41.49 ms	0.789 ms	0.939 ms	1.01	0.02	1	-	-	-	42.3 KB

TestHashValidateEnhanced	abc	$2a$1(...)7EMR. [29]	$2a$1(...)aSIUi [60]	167.46 ms	1.229 ms	1.150 ms	1.00	0.00	1	-	-	-	139.7 KB
TestHashValidateEnhancedPerf1	abc	$2a$1(...)7EMR. [29]	$2a$1(...)aSIUi [60]	165.89 ms	3.187 ms	3.272 ms	0.99	0.02	1	-	-	-	140.31 KB

TestHashValidateEnhanced	abc	$2a$1(...)Situ. [29]	$2a$1(...)Hg.9q [60]	664.15 ms	8.225 ms	7.291 ms	1.00	0.00	2	-	-	-	523.7 KB
TestHashValidateEnhancedPerf1	abc	$2a$1(...)Situ. [29]	$2a$1(...)Hg.9q [60]	653.74 ms	4.374 ms	4.091 ms	0.98	0.01	1	-	-	-	522.3 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$0(...)OxvGu [29]	$2a$0(...)QhstC [60]	10.40 ms	0.145 ms	0.136 ms	1.00	0.00	1	-	-	-	19.86 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$0(...)OxvGu [29]	$2a$0(...)QhstC [60]	10.41 ms	0.090 ms	0.084 ms	1.00	0.02	1	-	-	-	18.47 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$0(...)flhge [29]	$2a$0(...)Tvlz. [60]	40.45 ms	0.567 ms	0.503 ms	1.00	0.00	1	-	-	-	43.84 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$0(...)flhge [29]	$2a$0(...)Tvlz. [60]	40.53 ms	0.220 ms	0.195 ms	1.00	0.01	1	-	-	-	42.45 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$1(...)s1e1u [29]	$2a$1(...)Qk7dq [60]	160.99 ms	1.891 ms	1.676 ms	1.00	0.00	1	-	-	-	140.04 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$1(...)s1e1u [29]	$2a$1(...)Qk7dq [60]	162.40 ms	3.208 ms	4.057 ms	1.01	0.03	1	-	-	-	138.45 KB

TestHashValidateEnhanced	abcde(...)vwxyz [26]	$2a$1(...)L7Gpu [29]	$2a$1(...)wJ/pG [60]	647.53 ms	3.741 ms	3.500 ms	1.00	0.00	1	-	-	-	525.17 KB
TestHashValidateEnhancedPerf1	abcde(...)vwxyz [26]	$2a$1(...)L7Gpu [29]	$2a$1(...)wJ/pG [60]	658.49 ms	12.966 ms	13.315 ms	1.02	0.02	2	-	-	-	524.45 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$0(...)faOI. [29]	$2a$0(...)P6FfO [60]	10.27 ms	0.047 ms	0.044 ms	1.00	0.00	1	-	-	-	19.9 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$0(...)faOI. [29]	$2a$0(...)P6FfO [60]	10.42 ms	0.047 ms	0.041 ms	1.01	0.01	2	-	-	-	18.51 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$0(...)262hu [29]	$2a$0(...)9UxTW [60]	40.57 ms	0.268 ms	0.251 ms	1.00	0.00	1	-	-	-	43.88 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$0(...)262hu [29]	$2a$0(...)9UxTW [60]	40.43 ms	0.349 ms	0.292 ms	1.00	0.01	1	-	-	-	42.49 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$1(...)rOvHe [29]	$2a$1(...)JYlfS [60]	161.34 ms	1.174 ms	1.098 ms	1.00	0.00	1	-	-	-	140.22 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$1(...)rOvHe [29]	$2a$1(...)JYlfS [60]	164.05 ms	0.746 ms	0.697 ms	1.02	0.01	2	-	-	-	138.82 KB

TestHashValidateEnhanced	~!@#$(...)NBFRD [34]	$2a$1(...)nkrPO [29]	$2a$1(...)eyhgC [60]	646.64 ms	5.192 ms	4.857 ms	1.00	0.00	1	-	-	-	525.22 KB
TestHashValidateEnhancedPerf1	~!@#$(...)NBFRD [34]	$2a$1(...)nkrPO [29]	$2a$1(...)eyhgC [60]	654.55 ms	2.720 ms	2.271 ms	1.01	0.01	1	-	-	-	523.81 KB

ChrisMcKee commented 4 years ago

Had to resort to excel; god I hate R

Mean Time (ns)

Mean Allocation (kb)

results.zip

ChrisMcKee commented 4 years ago

All the non-span bits are merged into master. I'll hopefully dig around the span bit a bit more.

Thanks for all the back and forth and the PRs; greatly appreciated 😁

jvandertil commented 4 years ago

Awesome, glad to be able to help. The span PR could be done without Span by moving the ‘_lr’ array into a private field and initializing it instead of allocating a new array each iteration. Should give roughly the same order of savings. Not sure if there are any security implications when doing that tho.

Shouldn’t really matter as you can then control when the array is cleared instead of leaving it up to the GC.
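
Roughly what I mean, as a sketch only. The routine below is a toy stand-in, not the Blowfish code, and the class and member names are made up; it just shows the reset, reuse and clear pattern for a scratch array held in a private field.

```csharp
using System;

public sealed class ScratchBufferSketch
{
    // Hypothetical stand-in for the small 'lr' scratch array: allocated once
    // per instance rather than once per loop iteration.
    private readonly uint[] _lr = new uint[2];

    // Toy routine (not Blowfish): only demonstrates reusing the field.
    public uint Mix(uint seed, int rounds)
    {
        try
        {
            for (int i = 0; i < rounds; i++)
            {
                // Re-initialise the existing array instead of 'new uint[2]'.
                _lr[0] = seed;
                _lr[1] = (uint)i;
                seed = unchecked(_lr[0] * 31u + _lr[1]);
            }

            return seed;
        }
        finally
        {
            // Wipe the scratch data on our own schedule instead of waiting for the GC.
            Array.Clear(_lr, 0, _lr.Length);
        }
    }

    // Exposed only so the example can show the buffer is wiped after use.
    public bool IsCleared => _lr[0] == 0 && _lr[1] == 0;
}
```

One caveat with this shape: a shared field makes the instance unsafe to use from several threads at once, so it only works if each operation gets its own instance.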

ChrisMcKee commented 4 years ago

Definitely going to have a poke around and add Docker to the benchmarking to see how much it varies between OSes. Between .NET Framework 4.7.2/4.8 and Core 2.1/3.1 there's nothing really noticeable, which is nice from a predictability standpoint. Alpine + Ubuntu will be the obvious choice in this container-crazy world.

Thanks again!

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ChrisMcKee commented 4 years ago

Closing as it's in master; this will go out with the next release.

penguinawesome commented 3 years ago

@ChrisMcKee is this included in the 4.0.2 release?

ChrisMcKee commented 3 years ago

Yup

BcryptNet / bcrypt.net

Reduce memory usage #54