Closed jvandertil closed 4 years ago
You piqued my interest, as I'd totally blocked out the StringBuilder usage. Of course some of the code is simply a remnant of it being written for .NET 2, and what was 'best' then isn't really best now.
As for spans
I'd purposely avoided them due to the support for this going way back and the expectation that it would be actively replaced within the framework (free wins) without having to alter the code / add ifdefs.
The diff in perf for adding the extra complexity would have to be meaningful to say the least.
I'd happily take anything around Interrogate hash and PasswordNeedsRehash (see other ticket), especially if they make it more useful / safer / better perf as they're mostly utility methods.
Base
Sized StringBuilder
String interpolation (string.Format in sheep's clothing)
Using length in EncodeB64 string builder to preallocate (wonky unix version)
Setting DecodeBase64 to use the max bytes as the StringBuilder length has no noticeable difference
I thought string.Create would have more of an impact, but tbh there's not a lot to see here; or I'm using it wrong, which is entirely likely considering my exposure to Spans has been relatively limited.
I have a couple of things lying around; should I send the PRs to the perf branch?
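For reference, a minimal sketch of how string.Create is typically used: the final length is passed up front and a callback fills the new string's buffer in place, so no intermediate StringBuilder or char[] copy is needed. The `HashFormatter` class and its `Format` method are hypothetical names for illustration, not part of the library, and the sketch assumes a two-digit work factor and a target with string.Create (.NET Core 2.1+ / netstandard2.1).

```csharp
using System;

public static class HashFormatter
{
    // Hypothetical helper: builds "$2<rev>$<wf>$<salt><hash>" with a single string allocation.
    // Assumes 0 <= workFactor <= 99 so it always fits in two digits.
    public static string Format(char revision, int workFactor, char[] salt, char[] hash)
    {
        int length = 7 + salt.Length + hash.Length;

        // The state tuple is passed explicitly so the lambda does not capture locals.
        return string.Create(length, (revision, workFactor, salt, hash), (span, state) =>
        {
            span[0] = '$';
            span[1] = '2';
            span[2] = state.revision;
            span[3] = '$';
            span[4] = (char)('0' + state.workFactor / 10);
            span[5] = (char)('0' + state.workFactor % 10);
            span[6] = '$';
            state.salt.AsSpan().CopyTo(span.Slice(7));
            state.hash.AsSpan().CopyTo(span.Slice(7 + state.salt.Length));
        });
    }
}
```

Whether this beats a pre-sized StringBuilder for strings this short would still need measuring; the benchmarks in this thread suggest the gain can be small.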
I'll definitely take a look at the InterrogateHash method, shouldn't be too hard to improve that. It's not clear to me that the API is settled in the other ticket, is that correct?
@jvandertil sure fire away. I wasn't 100% sure the api in the other ticket was necessarily something that belonged in the library. But making it easier to implement / a better method probably wouldn't go amiss either. The ultimate conflict between trying to keep the api simple and keeping people happy 😆
HashParser Regex to non-regex
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
[Host] : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
Method | hash | Mean | Error | StdDev | Ratio | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
InterrogateHashUsingRegex | $2a$1(...)Qk7dq [60] | 2,419.3 ns | 48.15 ns | 88.05 ns | 1.00 | 2 | 0.1678 | - | - | 1328 B |
InterrogateHashUsingParser | $2a$1(...)Qk7dq [60] | 296.0 ns | 6.53 ns | 11.09 ns | 0.12 | 1 | 0.0353 | - | - | 280 B |
InterrogateHashUsingRegex | $2a$1(...)QPlxO [60] | 2,337.5 ns | 46.22 ns | 73.31 ns | 1.00 | 2 | 0.1678 | - | - | 1328 B |
InterrogateHashUsingParser | $2a$1(...)QPlxO [60] | 292.9 ns | 5.74 ns | 8.94 ns | 0.13 | 1 | 0.0353 | - | - | 280 B |
🎈🎆🎇🎈🎉
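To illustrate the kind of change benchmarked above, here is a rough sketch of a non-regex parser for a bcrypt hash string. It is not the parser from the PR; `SimpleHashParser` is a hypothetical name, and the layout it assumes ("$2", an optional minor revision character, a two-digit work factor, then 22 salt characters and 31 hash characters) is stated in the comments. It does not validate the base64 alphabet.

```csharp
using System;

public static class SimpleHashParser
{
    // Assumed layout: "$2" [minor revision] "$" two-digit work factor "$" salt (22 chars) hash (31 chars).
    public static (string Version, int WorkFactor, string Salt, string Hash) Parse(string hash)
    {
        if (hash == null) throw new ArgumentNullException(nameof(hash));
        if (hash.Length < 59 || hash[0] != '$' || hash[1] != '2')
            throw new FormatException("Not a bcrypt hash.");

        int pos = 2;
        if (hash[pos] != '$') pos++; // skip the optional minor revision, e.g. 'a'
        if (hash[pos] != '$') throw new FormatException("Missing version terminator.");
        string version = hash.Substring(1, pos - 1);
        pos++;

        char tens = hash[pos], ones = hash[pos + 1];
        if (tens < '0' || tens > '9' || ones < '0' || ones > '9' || hash[pos + 2] != '$')
            throw new FormatException("Invalid work factor.");
        int workFactor = (tens - '0') * 10 + (ones - '0');
        pos += 3;

        // Whatever remains must be exactly salt + hash.
        if (hash.Length - pos != 53) throw new FormatException("Unexpected salt/hash length.");
        return (version, workFactor, hash.Substring(pos, 22), hash.Substring(pos + 22));
    }
}
```

Index checks like these avoid the regex engine and its Match/Group allocations, which is presumably where most of the difference in the table comes from.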
Base64 Decoding; unsurprisingly not allocating it to string is a winner here.
Method | salt | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|
DecodeBase64StandardUnSized | DCq7Y(...)4cll. [22] | 160.47 ns | 3.223 ns | 6.287 ns | 1.00 | 0.00 | 3 | 0.0184 | - | - | 144 B |
DecodeBase64StandardSized | DCq7Y(...)4cll. [22] | 165.65 ns | 3.333 ns | 5.476 ns | 1.03 | 0.05 | 4 | 0.0184 | - | - | 144 B |
DecodeBase64StringCreateSpan | DCq7Y(...)4cll. [22] | 115.95 ns | 2.343 ns | 3.284 ns | 0.73 | 0.03 | 2 | 0.0296 | - | - | 232 B |
DecodeBase64ToBytes | DCq7Y(...)4cll. [22] | 73.71 ns | 1.502 ns | 2.966 ns | 0.46 | 0.02 | 1 | 0.0050 | - | - | 40 B |
DecodeBase64StandardUnSized | HqWuK(...)Lrgb. [22] | 164.26 ns | 3.282 ns | 5.012 ns | 1.00 | 0.00 | 3 | 0.0184 | - | - | 144 B |
DecodeBase64StandardSized | HqWuK(...)Lrgb. [22] | 163.55 ns | 3.252 ns | 6.342 ns | 0.99 | 0.05 | 3 | 0.0184 | - | - | 144 B |
DecodeBase64StringCreateSpan | HqWuK(...)Lrgb. [22] | 114.45 ns | 2.301 ns | 5.051 ns | 0.70 | 0.04 | 2 | 0.0296 | - | - | 232 B |
DecodeBase64ToBytes | HqWuK(...)Lrgb. [22] | 74.16 ns | 1.503 ns | 2.748 ns | 0.45 | 0.02 | 1 | 0.0050 | - | - | 40 B |
Base64Encoding as bytes
Method | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
EncodeBase64Unsized | 137.70 ns | 2.704 ns | 4.444 ns | 1.00 | 0.00 | 2 | 0.0355 | - | - | 280 B |
EncodeBase64Sized | 149.75 ns | 3.023 ns | 5.967 ns | 1.09 | 0.06 | 3 | 0.0355 | - | - | 280 B |
EncodeBase64AsBytes | 51.95 ns | 1.050 ns | 1.635 ns | 0.38 | 0.01 | 1 | 0.0092 | - | - | 72 B |
Surprisingly pre-sizing had a negative effect; cutting out the extra allocations is a win perf wise.
String Allocation time
Method | Categories | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|
Original_StrBuilder_SinEncoding | StringAppend,AppendString | 348.9 ns | 6.91 ns | 10.95 ns | 1.00 | 0.00 | 3 | 0.0648 | - | - | 512 B |
Original_StrBuilder_SinEncoding_AppendChar | StringAppend,AppendChar | 347.9 ns | 6.96 ns | 11.63 ns | 1.00 | 0.04 | 3 | 0.0648 | - | - | 512 B |
Original_StrBuilder_SinEncoding_AppendChar_Sized | StringAppend,AppendChar | 173.3 ns | 3.36 ns | 4.72 ns | 0.50 | 0.02 | 2 | 0.0458 | - | - | 360 B |
Original_StrBuilder_SinEncoding_AppendChar_Sized_PRFmt | StringAppend,AppendChar | 115.4 ns | 1.97 ns | 2.75 ns | 0.33 | 0.01 | 1 | 0.0468 | - | - | 368 B |
Original_StrBuilder_SinEncoding_AppendChar_Sized_FROMSTRING_PRFmt | StringAppend,AppendString | 117.4 ns | 2.37 ns | 3.90 ns | 0.34 | 0.02 | 1 | 0.0468 | - | - | 368 B |
StringInterpolation_WithChar | StringFmt,AppendChar | 384.5 ns | 7.46 ns | 9.96 ns | 1.11 | 0.04 | 4 | 0.0210 | - | - | 168 B |
StringInterpolation_WithString | StringFmt,AppendString | 354.8 ns | 7.06 ns | 7.55 ns | 1.02 | 0.05 | 3 | 0.0281 | - | - | 224 B |
Bit of a mixed bag here. Speed-wise, the original (master) code, sized and appending char, was slightly faster than the change in the PR (the change in encoding to char being the collective winner for improving this method's allocations).
Allocation wise the string.format using char was the winner.
I am not sure if this code does what you expect it to do:
[Benchmark]
[BenchmarkCategory("StringFmt", "AppendChar")]
public void StringInterpolation_WithChar()
{
    var res = $"$2{bcryptMinorRevision}${workFactor:00}${EncodedSaltAsChars}{EncodedHashAsChars}";
}
The char[] would be interpolated as "System.Char[]".
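A small sketch of that pitfall, using only the standard library: interpolating a char[] calls ToString() on the array, which yields the type name instead of the characters. The class and method names here are made up for the demonstration.

```csharp
using System;

public static class CharArrayInterpolation
{
    // Interpolating the array formats it via Object.ToString(), i.e. the type name.
    public static string Interpolated(char[] chars) => $"{chars}";

    // Wrapping the array in a string first gives the intended characters.
    public static string Materialized(char[] chars) => $"{new string(chars)}";
}
```

With `"abc".ToCharArray()` as input, the first method returns "System.Char[]" and the second returns "abc". That is why the benchmark allocated so little: the encoded salt and hash never made it into the result.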
As an aside, the benchmark functions should return the generated value, so in the example above:
[Benchmark]
[BenchmarkCategory("StringFmt", "AppendChar")]
public string StringInterpolation_WithChar()
{
    return $"$2{bcryptMinorRevision}${workFactor:00}${EncodedSaltAsChars}{EncodedHashAsChars}";
}
It looks like all the benchmarks should return the same value. If so, you could put a [ReturnValueValidator(failOnError: true)] attribute on the class so that the benchmarks fail if they do not.
Can't allocate memory if you never use the char[] 😆
New string'd it and renamed something and copy pasted the string over the top 😆
Just RTFM'ing the BM docs; it's changed a bit in the last ~2 years. The error thing's handy.
Method | Categories | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|
Original_StrBuilder_SinEncoding | StringAppend,AppendString | 343.8 ns | 5.32 ns | 4.72 ns | 1.00 | 0.00 | 3 | 0.0648 | - | - | 512 B |
Original_StrBuilder_SinEncoding_AppendChar | StringAppend,AppendChar | 352.0 ns | 7.06 ns | 9.42 ns | 1.02 | 0.03 | 3 | 0.0648 | - | - | 512 B |
Original_StrBuilder_SinEncoding_AppendChar_Sized | StringAppend,AppendChar | 287.7 ns | 5.76 ns | 10.38 ns | 0.83 | 0.03 | 2 | 0.0458 | - | - | 360 B |
Original_StrBuilder_SinEncoding_AppendChar_Sized_PRFmt | StringAppend,AppendChar | 120.1 ns | 2.45 ns | 4.71 ns | 0.35 | 0.02 | 1 | 0.0467 | - | - | 368 B |
Original_StrBuilder_SinEncoding_AppendChar_Sized_FROMSTRING_PRFmt | StringAppend,AppendString | 117.2 ns | 2.38 ns | 3.63 ns | 0.34 | 0.01 | 1 | 0.0467 | - | - | 368 B |
StringInterpolation_WithChar | StringFmt,AppendChar | 384.3 ns | 7.50 ns | 11.45 ns | 1.11 | 0.03 | 4 | 0.0486 | - | - | 384 B |
StringInterpolation_WithString | StringFmt,AppendString | 341.9 ns | 6.48 ns | 6.65 ns | 0.99 | 0.02 | 3 | 0.0281 | - | - | 224 B |
Makes more sense; more so when you look at what .net is doing
public StringBuilder Append(char[]? value)
{
    if (value?.Length > 0)
    {
        unsafe
        {
            fixed (char* valueChars = &value[0])
            {
                Append(valueChars, value.Length);
            }
        }
    }
    return this;
}

public StringBuilder Append(ReadOnlySpan<char> value)
{
    if (value.Length > 0)
    {
        unsafe
        {
            fixed (char* valueChars = &MemoryMarshal.GetReference(value))
            {
                Append(valueChars, value.Length);
            }
        }
    }
    return this;
}
^ noted they've got Span in there as well now; the ticket took that long to make it I wasn't sure it was in 3.1
Yeah, I was experimenting a bit with the Base64Encoder, letting it write directly into a Span<char> that was passed in, which was stack allocated and then put in a string with new string(Span<char>) 😎. The only heap allocation that GenerateSalt had after that was the string it had to return. The lowest it can go is 80 bytes (29 chars * 2 bytes (padded to 64 bytes for memory alignment) + 16 bytes object header).
Looks like this:
public static string GenerateSalt(int workFactor, char bcryptMinorRevision = DefaultHashVersion)
{
    // Argument checks removed for brevity
#if NETSTANDARD2_1
    Span<byte> saltBytes = stackalloc byte[BCryptSaltLen];
    Span<char> result = stackalloc char[29];
#else
    byte[] saltBytes = new byte[BCryptSaltLen];
    char[] result = new char[29];
#endif
    RngCsp.GetBytes(saltBytes);
    result[0] = '$';
    result[1] = '2';
    result[2] = bcryptMinorRevision;
    result[3] = '$';
    result[4] = (char)((workFactor / 10) + '0');
    result[5] = (char)((workFactor % 10) + '0');
    result[6] = '$';
#if NETSTANDARD2_1
    Base64Encoder.EncodeBase64(saltBytes, result.Slice(7));
#else
    Base64Encoder.EncodeBase64(saltBytes, saltBytes.Length, result, 7);
#endif
    return new string(result);
}
With the Base64 encode and decode methods pulled into their own class. Which looks like this:
public static char[] EncodeBase64(byte[] byteArray, int length)
{
    if (length <= 0 || length > byteArray.Length)
    {
        throw new ArgumentException("Invalid length", nameof(length));
    }
    int encodedSize = GetEncodedLength(length);
    char[] encoded = new char[encodedSize];
#if NETSTANDARD2_1
    EncodeBase64(byteArray.AsSpan().Slice(0, length), encoded.AsSpan());
#else
    EncodeBase64(byteArray, length, encoded, 0);
#endif
    return encoded;
}

#if NETSTANDARD2_1
public static int EncodeBase64(ReadOnlySpan<byte> source, Span<char> destination)
#else
public static int EncodeBase64(byte[] source, int sourceLength, char[] destination, int destinationOffset)
#endif
{
#if NETSTANDARD2_1
    int sourceLength = source.Length;
    const int destinationOffset = 0;
#endif
    int encodedSize = GetEncodedLength(sourceLength);
    int requiredCapacity = encodedSize + destinationOffset;
    if (destination.Length < requiredCapacity)
    {
        throw new ArgumentException("Destination too small.");
    }
    int pos = destinationOffset;
    int off = 0;
    while (off < sourceLength)
    {
        // Removed for brevity
    }
edit: Add benchmarks
Before (continuing from PR):
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
GenerateSalt | 356.0 ns | 2.87 ns | 2.69 ns | 0.1144 | - | - | 360 B |
After:
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
GenerateSalt | 235.8 ns | 3.58 ns | 2.80 ns | 0.0253 | - | - | 80 B |
The string builder has some quirks;
Unless you change the data types going in, there's a limit to what you can achieve anyway (code golf is a tad addictive though). There are a million ways to skin a cat in .NET, but it appears that the larger the string passed into AppendFormat (and that is quite a big method in .NET when you click into it), the slower it gets, regardless of there only being a single parameter. If you switch it to Append(string.Format) you add allocations and time. If you reduce the AppendFormat to just the number format ({0:00}) it's faster 😆
var result = new StringBuilder(60);
result.Append("$2")
    .Append(bcryptMinorRevision)
    .AppendFormat("${0:00}$", workFactor)
    .Append(salt)
    .Append(hash);

result.Append("$2")
    .Append(bcryptMinorRevision)
    .Append("$")
    .AppendFormat("{0:00}", workFactor)
    .Append("$")
    .Append(salt)
    .Append(hash);
The span types were an awesome move for .NET though; we do a lot of flat file processing and the perf difference in slowly bringing these in is great... Go can wipe the floor with it, but it's definitely better than netcore1 or framework 😉
Yeah, about Span. I've worked it into the main encipher routine (mostly ifdef'ing the signature to have Span) and added stackalloc for the small lr array used in Key and EKSKey. Might make more sense to pull that out into a normal array and not stackalloc it, but results are promising.
Before:
Method | text | hash | value | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
VerifyPassword | **** | $2a$1(...)QPlxO [60] | ? | 437,683,033.3 ns | 5,562,464.16 ns | 5,203,132.43 ns | - | - | - | 267400 B |
HashPassword | ? | ? | **** | 6,946,859.0 ns | 63,305.62 ns | 59,216.12 ns | - | - | - | 9698 B |
VerifyPassword | abcde(...)vwxyz [26] | $2a$1(...)Qk7dq [60] | ? | 110,177,241.4 ns | 1,045,349.51 ns | 926,675.11 ns | - | - | - | 71166 B |
HashPassword | ? | ? | abcde(...)vwxyz [26] | 6,997,920.2 ns | 39,497.80 ns | 36,946.26 ns | - | - | - | 9802 B |
After:
Method | text | hash | value | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
VerifyPassword | **** | $2a$1(...)QPlxO [60] | ? | 446,178,420.0 ns | 4,295,215.41 ns | 4,017,747.16 ns | - | - | - | 5176 B |
HashPassword | ? | ? | **** | 7,362,852.0 ns | 72,231.02 ns | 56,393.23 ns | - | - | - | 5522 B |
VerifyPassword | abcde(...)vwxyz [26] | $2a$1(...)Qk7dq [60] | ? | 110,922,204.0 ns | 1,186,127.30 ns | 1,109,504.21 ns | - | - | - | 5554 B |
HashPassword | ? | ? | abcde(...)vwxyz [26] | 7,048,583.9 ns | 66,038.58 ns | 61,772.53 ns | - | - | - | 5627 B |
That does require introducing a dependency on System.Memory for most platforms. I'm not sure if the constant HAS_SPAN is the best name, but this shows how I've done it.
<ItemGroup Condition="'$(TargetFramework)' == 'netstandard2.0'
or '$(TargetFramework)' == 'net452'
or '$(TargetFramework)' == 'net462'
or '$(TargetFramework)' == 'net472'">
<PackageReference Include="System.Memory" Version="4.5.3" />
</ItemGroup>
<PropertyGroup Condition="'$(TargetFramework)' == 'netstandard2.1'
or '$(TargetFramework)' == 'netstandard2.0'
or '$(TargetFramework)' == 'net452'
or '$(TargetFramework)' == 'net462'
or '$(TargetFramework)' == 'net472'">
<DefineConstants>$(DefineConstants);HAS_SPAN</DefineConstants>
</PropertyGroup>
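On the code side, a constant like this would typically be consumed with `#if` around the signatures that differ. The sketch below is illustrative only: `Checksum.Sum` is a made-up method, and the `#define` at the top stands in for the project-file definition so the snippet compiles on its own.

```csharp
#define HAS_SPAN // normally supplied by the project file; defined here so the sketch stands alone
using System;

public static class Checksum
{
    // Only the signature is switched; the body compiles against either parameter type
    // because both expose Length and an indexer.
#if HAS_SPAN
    public static int Sum(ReadOnlySpan<byte> data)
#else
    public static int Sum(byte[] data)
#endif
    {
        int total = 0;
        for (int i = 0; i < data.Length; i++)
        {
            total += data[i];
        }
        return total;
    }
}
```

Callers holding a byte[] can use either variant unchanged, since arrays convert implicitly to ReadOnlySpan<byte>.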
The reduction in allocations during verification is interesting, if only because on the first pass the allocation is huge (267400) compared to the second test string (71166).
The EKS area's probably less likely to be merged in a hurry, as I'm not sure what the implications are from a security standpoint (I'd have to dig, undoubtedly mither a few people, and crack open the DPA sln again to see if MSFT is daring to use it for this stuff). Definitely interested in seeing it though.
Opened a PR so you can see the changes. The difference in the allocations is because both hashes have a different workfactor, so you can't really compare those directly.
[Benchmark]
[Arguments("", "$2a$12$k42ZFHFWqBp3vWli.nIn8uYyIkbvYRvodzbfbK18SSsY.CsIQPlxO")]
[Arguments("abcdefghijklmnopqrstuvwxyz", "$2a$10$fVH8e28OQRj9tqiDXs1e1uxpsjN0c7II7YPKXua2NAKYvM6iQk7dq")]
public bool VerifyPassword(string text, string hash)
    => BCrypt.Verify(text, hash);
Fair enough; I hadn't looked at the actual hashes. The fact it's a 12 vs 10 explains a lot.
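The arithmetic behind that: bcrypt's key setup runs 2^workFactor rounds, so each extra unit of work factor doubles the cost. A tiny sketch (the `WorkFactorMath` helper is hypothetical):

```csharp
using System;

public static class WorkFactorMath
{
    // bcrypt performs 2^workFactor rounds of its expensive key setup.
    public static long Rounds(int workFactor) => 1L << workFactor;

    // How many times more work factor 'a' costs compared with factor 'b'.
    public static double RelativeCost(int a, int b) => (double)Rounds(a) / Rounds(b);
}
```

`RelativeCost(12, 10)` is 4, which lines up with the roughly 438 ms vs 110 ms VerifyPassword timings and the proportionally larger allocation figure in the "Before" table.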
stackalloc should be fine; doing the same old style had issues the new style doesn't have. The use of stackalloc automatically enables buffer overrun detection features in the common language runtime (CLR). If a buffer overrun is detected, the process is terminated as quickly as possible to minimize the chance that malicious code is executed. Which is good.
Span work should in theory be fine.
Method | key | salt | hash | Mean | Error | StdDev | Median | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TestHashValidateEnhanced | **** | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.38 ms | 0.164 ms | 0.154 ms | 10.42 ms | 1.00 | 0.00 | 1 | - | - | - | 20.64 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.40 ms | 0.046 ms | 0.043 ms | 10.40 ms | 1.00 | 0.02 | 1 | - | - | - | 19.26 KB |
TestHashValidateEnhanced | **** | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 40.81 ms | 0.887 ms | 0.786 ms | 40.44 ms | 1.00 | 0.00 | 1 | - | - | - | 44.63 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 41.55 ms | 0.760 ms | 0.674 ms | 41.59 ms | 1.02 | 0.03 | 2 | - | - | - | 43.24 KB |
TestHashValidateEnhanced | **** | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 165.78 ms | 3.175 ms | 3.260 ms | 165.77 ms | 1.00 | 0.00 | 1 | - | - | - | 140.78 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 168.20 ms | 3.313 ms | 4.423 ms | 168.89 ms | 1.01 | 0.03 | 1 | - | - | - | 139.24 KB |
TestHashValidateEnhanced | **** | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 676.66 ms | 13.031 ms | 18.268 ms | 676.95 ms | 1.00 | 0.00 | 1 | - | - | - | 524.63 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 664.51 ms | 13.285 ms | 15.815 ms | 667.74 ms | 0.98 | 0.03 | 1 | - | - | - | 525.33 KB |
TestHashValidateEnhanced | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.58 ms | 0.207 ms | 0.261 ms | 10.41 ms | 1.00 | 0.00 | 2 | - | - | - | 20.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.22 ms | 0.054 ms | 0.045 ms | 10.22 ms | 0.97 | 0.03 | 1 | - | - | - | 19.33 KB |
TestHashValidateEnhanced | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 41.30 ms | 0.852 ms | 1.249 ms | 40.70 ms | 1.00 | 0.00 | 1 | - | - | - | 44.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 42.45 ms | 0.829 ms | 1.557 ms | 42.36 ms | 1.03 | 0.05 | 2 | - | - | - | 43.3 KB |
TestHashValidateEnhanced | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 165.36 ms | 3.284 ms | 5.395 ms | 163.12 ms | 1.00 | 0.00 | 1 | - | - | - | 140.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 166.88 ms | 3.296 ms | 4.933 ms | 167.31 ms | 1.01 | 0.05 | 1 | - | - | - | 139.32 KB |
TestHashValidateEnhanced | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 673.68 ms | 13.283 ms | 18.181 ms | 671.11 ms | 1.00 | 0.00 | 1 | - | - | - | 524.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 682.11 ms | 13.565 ms | 18.569 ms | 683.43 ms | 1.01 | 0.05 | 1 | - | - | - | 523.3 KB |
TestHashValidateEnhanced | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.59 ms | 0.207 ms | 0.247 ms | 10.46 ms | 1.00 | 0.00 | 1 | - | - | - | 20.7 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.63 ms | 0.207 ms | 0.276 ms | 10.52 ms | 1.01 | 0.04 | 1 | - | - | - | 19.3 KB |
TestHashValidateEnhanced | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 41.95 ms | 0.981 ms | 1.130 ms | 41.63 ms | 1.00 | 0.00 | 1 | - | - | - | 44.71 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 42.04 ms | 0.835 ms | 1.440 ms | 41.77 ms | 1.01 | 0.04 | 1 | - | - | - | 43.3 KB |
TestHashValidateEnhanced | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 169.22 ms | 3.368 ms | 5.041 ms | 169.28 ms | 1.00 | 0.00 | 1 | - | - | - | 140.71 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 168.72 ms | 3.346 ms | 5.403 ms | 167.89 ms | 1.00 | 0.04 | 1 | - | - | - | 139.3 KB |
TestHashValidateEnhanced | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 675.28 ms | 13.076 ms | 20.357 ms | 674.19 ms | 1.00 | 0.00 | 1 | - | - | - | 524.7 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 667.30 ms | 13.338 ms | 17.344 ms | 658.82 ms | 0.99 | 0.04 | 1 | - | - | - | 523.3 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.62 ms | 0.241 ms | 0.322 ms | 10.54 ms | 1.00 | 0.00 | 1 | - | - | - | 20.84 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.41 ms | 0.072 ms | 0.060 ms | 10.38 ms | 0.98 | 0.04 | 1 | - | - | - | 19.47 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 42.64 ms | 0.843 ms | 1.454 ms | 42.77 ms | 1.00 | 0.00 | 2 | - | - | - | 44.84 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 41.50 ms | 0.819 ms | 1.390 ms | 41.02 ms | 0.97 | 0.05 | 1 | - | - | - | 43.45 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 169.47 ms | 3.378 ms | 4.952 ms | 169.73 ms | 1.00 | 0.00 | 1 | - | - | - | 140.84 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 168.60 ms | 3.358 ms | 5.882 ms | 166.15 ms | 1.00 | 0.05 | 1 | - | - | - | 139.45 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 664.16 ms | 13.250 ms | 18.137 ms | 664.71 ms | 1.00 | 0.00 | 1 | - | - | - | 524.84 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 675.49 ms | 13.499 ms | 21.411 ms | 673.42 ms | 1.02 | 0.04 | 1 | - | - | - | 523.45 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.47 ms | 0.209 ms | 0.319 ms | 10.27 ms | 1.00 | 0.00 | 1 | - | - | - | 20.88 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.41 ms | 0.270 ms | 0.253 ms | 10.37 ms | 0.99 | 0.05 | 1 | - | - | - | 19.5 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 41.29 ms | 1.199 ms | 1.333 ms | 40.45 ms | 1.00 | 0.00 | 1 | - | - | - | 44.88 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 42.68 ms | 0.852 ms | 1.579 ms | 42.33 ms | 1.03 | 0.04 | 2 | - | - | - | 43.49 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 167.60 ms | 3.328 ms | 4.208 ms | 166.77 ms | 1.00 | 0.00 | 1 | - | - | - | 140.88 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 165.25 ms | 3.236 ms | 3.726 ms | 164.37 ms | 0.98 | 0.04 | 1 | - | - | - | 139.49 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 671.31 ms | 13.073 ms | 14.530 ms | 666.76 ms | 1.00 | 0.00 | 1 | - | - | - | 525.52 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 665.57 ms | 13.108 ms | 20.017 ms | 666.63 ms | 0.98 | 0.03 | 1 | - | - | - | 523.68 KB |
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-6800K CPU 3.40GHz (Skylake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.101
[Host] : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
Method | key | salt | hash | Mean | Error | StdDev | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TestHashValidateEnhanced | **** | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.76 ms | 0.212 ms | 0.460 ms | 1.00 | 0.00 | 1 | - | - | - | 19.63 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$0(...)4cll. [29] | $2a$0(...)eX1s. [60] | 10.72 ms | 0.214 ms | 0.452 ms | 1.00 | 0.06 | 1 | - | - | - | 18.24 KB |
TestHashValidateEnhanced | **** | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 42.63 ms | 0.850 ms | 1.919 ms | 1.00 | 0.00 | 1 | - | - | - | 43.63 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$0(...)Lrgb. [29] | $2a$0(...)uUtye [60] | 43.24 ms | 0.859 ms | 1.885 ms | 1.02 | 0.06 | 1 | - | - | - | 42.24 KB |
TestHashValidateEnhanced | **** | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 169.35 ms | 3.379 ms | 6.979 ms | 1.00 | 0.00 | 1 | - | - | - | 139.63 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$1(...)Va/ze [29] | $2a$1(...)k4TCW [60] | 169.56 ms | 3.338 ms | 5.934 ms | 1.01 | 0.06 | 1 | - | - | - | 138.24 KB |
TestHashValidateEnhanced | **** | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 682.67 ms | 13.592 ms | 25.193 ms | 1.00 | 0.00 | 1 | - | - | - | 523.63 KB |
TestHashValidateEnhancedPerf1 | **** | $2a$1(...)nIn8u [29] | $2a$1(...)QPlxO [60] | 689.21 ms | 13.728 ms | 26.120 ms | 1.01 | 0.05 | 1 | - | - | - | 522.24 KB |
TestHashValidateEnhanced | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.91 ms | 0.218 ms | 0.404 ms | 1.00 | 0.00 | 1 | - | - | - | 19.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$0(...)5zDGO [29] | $2a$0(...)YVfxe [60] | 10.85 ms | 0.214 ms | 0.407 ms | 0.99 | 0.05 | 1 | - | - | - | 18.3 KB |
TestHashValidateEnhanced | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 42.67 ms | 0.842 ms | 1.952 ms | 1.00 | 0.00 | 1 | - | - | - | 43.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$0(...)2EBfe [29] | $2a$0(...)lC/V. [60] | 41.91 ms | 0.644 ms | 0.538 ms | 1.01 | 0.04 | 1 | - | - | - | 42.3 KB |
TestHashValidateEnhanced | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 165.28 ms | 3.183 ms | 4.357 ms | 1.00 | 0.00 | 1 | - | - | - | 139.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$1(...)/cPi. [29] | $2a$1(...)SQu4u [60] | 162.70 ms | 2.860 ms | 2.676 ms | 0.98 | 0.03 | 1 | - | - | - | 138.66 KB |
TestHashValidateEnhanced | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 655.93 ms | 6.004 ms | 4.688 ms | 1.00 | 0.00 | 1 | - | - | - | 523.7 KB |
TestHashValidateEnhancedPerf1 | a | $2a$1(...)BakCe [29] | $2a$1(...)HZpeS [60] | 649.06 ms | 11.633 ms | 10.313 ms | 0.99 | 0.02 | 1 | - | - | - | 522.3 KB |
TestHashValidateEnhanced | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.41 ms | 0.110 ms | 0.103 ms | 1.00 | 0.00 | 1 | - | - | - | 19.71 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$0(...)uDeDu [29] | $2a$0(...)f7h0i [60] | 10.51 ms | 0.197 ms | 0.211 ms | 1.01 | 0.03 | 1 | - | - | - | 18.32 KB |
TestHashValidateEnhanced | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 41.01 ms | 0.818 ms | 0.725 ms | 1.00 | 0.00 | 1 | - | - | - | 43.7 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$0(...)yaM7O [29] | $2a$0(...)LxKcm [60] | 41.49 ms | 0.789 ms | 0.939 ms | 1.01 | 0.02 | 1 | - | - | - | 42.3 KB |
TestHashValidateEnhanced | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 167.46 ms | 1.229 ms | 1.150 ms | 1.00 | 0.00 | 1 | - | - | - | 139.7 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$1(...)7EMR. [29] | $2a$1(...)aSIUi [60] | 165.89 ms | 3.187 ms | 3.272 ms | 0.99 | 0.02 | 1 | - | - | - | 140.31 KB |
TestHashValidateEnhanced | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 664.15 ms | 8.225 ms | 7.291 ms | 1.00 | 0.00 | 2 | - | - | - | 523.7 KB |
TestHashValidateEnhancedPerf1 | abc | $2a$1(...)Situ. [29] | $2a$1(...)Hg.9q [60] | 653.74 ms | 4.374 ms | 4.091 ms | 0.98 | 0.01 | 1 | - | - | - | 522.3 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.40 ms | 0.145 ms | 0.136 ms | 1.00 | 0.00 | 1 | - | - | - | 19.86 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)OxvGu [29] | $2a$0(...)QhstC [60] | 10.41 ms | 0.090 ms | 0.084 ms | 1.00 | 0.02 | 1 | - | - | - | 18.47 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 40.45 ms | 0.567 ms | 0.503 ms | 1.00 | 0.00 | 1 | - | - | - | 43.84 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$0(...)flhge [29] | $2a$0(...)Tvlz. [60] | 40.53 ms | 0.220 ms | 0.195 ms | 1.00 | 0.01 | 1 | - | - | - | 42.45 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 160.99 ms | 1.891 ms | 1.676 ms | 1.00 | 0.00 | 1 | - | - | - | 140.04 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)s1e1u [29] | $2a$1(...)Qk7dq [60] | 162.40 ms | 3.208 ms | 4.057 ms | 1.01 | 0.03 | 1 | - | - | - | 138.45 KB |
TestHashValidateEnhanced | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 647.53 ms | 3.741 ms | 3.500 ms | 1.00 | 0.00 | 1 | - | - | - | 525.17 KB |
TestHashValidateEnhancedPerf1 | abcde(...)vwxyz [26] | $2a$1(...)L7Gpu [29] | $2a$1(...)wJ/pG [60] | 658.49 ms | 12.966 ms | 13.315 ms | 1.02 | 0.02 | 2 | - | - | - | 524.45 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.27 ms | 0.047 ms | 0.044 ms | 1.00 | 0.00 | 1 | - | - | - | 19.9 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)faOI. [29] | $2a$0(...)P6FfO [60] | 10.42 ms | 0.047 ms | 0.041 ms | 1.01 | 0.01 | 2 | - | - | - | 18.51 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 40.57 ms | 0.268 ms | 0.251 ms | 1.00 | 0.00 | 1 | - | - | - | 43.88 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$0(...)262hu [29] | $2a$0(...)9UxTW [60] | 40.43 ms | 0.349 ms | 0.292 ms | 1.00 | 0.01 | 1 | - | - | - | 42.49 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 161.34 ms | 1.174 ms | 1.098 ms | 1.00 | 0.00 | 1 | - | - | - | 140.22 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)rOvHe [29] | $2a$1(...)JYlfS [60] | 164.05 ms | 0.746 ms | 0.697 ms | 1.02 | 0.01 | 2 | - | - | - | 138.82 KB |
TestHashValidateEnhanced | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 646.64 ms | 5.192 ms | 4.857 ms | 1.00 | 0.00 | 1 | - | - | - | 525.22 KB |
TestHashValidateEnhancedPerf1 | ~!@#$(...)NBFRD [34] | $2a$1(...)nkrPO [29] | $2a$1(...)eyhgC [60] | 654.55 ms | 2.720 ms | 2.271 ms | 1.01 | 0.01 | 1 | - | - | - | 523.81 KB |
All the non-span bits are merged into master. I'll hopefully dig around the span bit a bit more.
Thanks for all the back and forth and the PRs; greatly appreciated 😁
Awesome, glad to be able to help. The span PR could be done without Span by moving the ‘_lr’ array into a private field and initializing it instead of allocating a new array each iteration. Should give roughly the same order of savings. Not sure if there are any security implications when doing that tho.
Shouldn’t really matter as you can then control when the array is cleared instead of leaving it up to the GC.
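A rough sketch of that idea, with the buffer held in a field and cleared explicitly after use. The `Encipherer` class and the XOR placeholder are invented for illustration; the real encipher routine is not reproduced here. Note that a shared field makes the instance unsafe to use from multiple threads at once, which stackalloc or a per-call array would not.

```csharp
using System;

public sealed class Encipherer
{
    // Allocated once per instance instead of once per call/iteration.
    private readonly uint[] _lr = new uint[2];

    // Hypothetical stand-in for the real routine: it only shows the buffer lifecycle.
    public uint Mix(uint left, uint right)
    {
        _lr[0] = left;
        _lr[1] = right;
        try
        {
            return _lr[0] ^ _lr[1]; // placeholder for the actual Blowfish rounds
        }
        finally
        {
            // Wipe the working state deterministically rather than waiting for the GC.
            Array.Clear(_lr, 0, _lr.Length);
        }
    }

    // Exposed only so the clearing behaviour can be checked.
    public bool IsCleared => _lr[0] == 0 && _lr[1] == 0;
}
```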
Definitely going to have a poke around / add Docker to the benchmarking to see how much it varies between OSes. Between 472/48 and core 2.1/3.1 there's nothing really noticeable, which is nice from a predictability standpoint. Alpine + Ubuntu will be the obvious choice in this container-crazy world.
Thanks again!
Closing as it's in master; this will go out with the next release
@ChrisMcKee is this included in the 4.0.2 release?
Yup
I was looking through the source code of the library, and noticed that there is quite some room to reduce the amount of memory allocated for certain operations.
I ran benchmarks (source code attached) for the (I think) most commonly used public methods, and came up with these results.
Obviously a large part of the runtime and memory usage is inherent to the algorithm itself, but I do believe that some optimizations might be useful to reduce unnecessary allocations.
Things I noticed at first glance:
PasswordNeedsRehash could be added to improve this scenario. A simple custom parser could improve this significantly.
Are you interested in taking PRs for this?
Update: Aligned benchmarks to all use workfactor 6 hashes