Open adamsitnik opened 5 years ago
The bulk of the cost of GetRandomFileName is presumably in getting the randomness, which is also the only part of the code that's platform-specific and differs between platforms. https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/Interop/Unix/System.Native/Interop.GetRandomBytes.cs https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/Interop/Windows/BCrypt/Interop.BCryptGenRandom.GetRandomBytes.cs I'm not sure how much that can realistically be improved. Maybe https://github.com/dotnet/corefx/pull/37906 will help.
An alternative to OS based random could be managed implementation of http://vigna.di.unimi.it/ftp/papers/xorshift.pdf, based on George Marsaglia's simple xor-shift pseudo random. e.g. https://github.com/bewaretech/PEA.NET/blob/40f851db/src/PEA/PEA/Util/FastRandom.cs#L149, which claims:
/// 2) Faster than System.Random. Up to 8x faster, depending on which methods are called.
/cc @lpereira
There's some work improving it already in #1433. It should improve performance ever so slightly (as it doesn't always XOR with the result of srand()
-- it only does so when it detects that the OS method didn't produce good enough numbers according to a simple heuristic).
Since that function doesn't necessarily needs cryptographically-secure numbers, I can use the current method (that requires a syscall on Linux) just to seed a RNG that's entirely in userspace. Probably something in the family of xorshift, as it requires very little state and is, for many non-cryptographic purposes, good enough. Will do a follow-up patch implementing this.
From the docs: https://docs.microsoft.com/en-us/dotnet/api/system.io.path.getrandomfilename?view=netcore-3.1 "The GetRandomFileName method returns a cryptographically strong, random string that can be used as either a folder name or a file name."
Ah. GetRandomBytes()
isn't supposed to generate cryptographically secure numbers, so maybe GetRandomFileName()
shouldn't be using it in the first place?
Since GetRandomBytes()
is internal, maybe it could be renamed to GetEntropyForSeeding()
, and a PRNG using something from System.Security.Cryptography
(AesCcm
as it's already implemented, or something like ChaCha20
, which is used by OpenSSH-portable to implement arc4random()
, or even RandomNumberGenerator
) could be used instead?
For places where crypto-safe numbers aren't needed, GetEntropyForSeeding()
can be used to kickstart something fast-but-not-crypto-safe.
This way, we clear up this confusion, keep in the managed code for longer, and go down to the native side of things when needed.
so maybe GetRandomFileName() shouldn't be using it in the first place?
Probably.
The output of Path.GetRandomFileName
has 8 bytes in it. I believe that it is misleading to say that it returns cryptographically strong string. It depends on how you are using this string. I think we should remove the cryptographically strong part from the docs.
GetRandomBytes()
isn't supposed to generate cryptographically secure numbers
FWIW, here is a discussion how we ended up with this: https://github.com/dotnet/corefx/pull/16652 . The output of it are cryptographically strong bytes (for some definitions of cryptographically strong) on all platforms we care about.
Triage: We should be clearer in the docs around using the terms "cryptographically strong" (probably remove it). Changing the the heuristics here should be done with caution as it will potentially cause undesirable consequences. If we can be confident in the distribution of names, and we don't change the set of characters that are used, than changing this would probably be fine.
What's wrong with reading from /dev/urandom as a seed?
Note: unix is currently (in main
) using non-crypto random and windows using crypto random, while the proposed managed implementation is non-crypto random.
Windows:
// Summary
BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4169/23H2/2023Update/SunValley3) 13th Gen Intel Core i7-1365U, 1 CPU, 12 logical and 10 physical cores .NET SDK 8.0.400 [Host] : .NET 9.0.0 (9.0.24.46307), X64 RyuJIT AVX2 Job-XTXYQT : .NET 9.0.0 (9.0.24.46307), X64 RyuJIT AVX2
Toolchain=.NET 9.0
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
GetRandomFileName | 45.42 ns | 0.265 ns | 0.235 ns | 48 B |
GetRandomFileName2 | 18.11 ns | 0.202 ns | 0.189 ns | 48 B |
Linux (same machine using WSL):
// Summary
BenchmarkDotNet v0.14.0, Ubuntu 24.04 LTS (Noble Numbat) WSL 13th Gen Intel Core i7-1365U, 1 CPU, 12 logical and 6 physical cores .NET SDK 8.0.108 [Host] : .NET 9.0.0 (9.0.24.46307), X64 RyuJIT AVX2 Job-LAVSVF : .NET 9.0.0 (9.0.24.46307), X64 RyuJIT AVX2
Toolchain=.NET 9.0
Method | Mean | Error | StdDev | Median | Allocated |
---|---|---|---|---|---|
GetRandomFileName | 581.59 ns | 7.980 ns | 18.653 ns | 574.64 ns | 48 B |
GetRandomFileName2 | 23.60 ns | 0.926 ns | 2.732 ns | 24.66 ns | 48 B |
Mac:
// Summary
BenchmarkDotNet v0.14.0, macOS Sonoma 14.6.1 (23G93) [Darwin 23.6.0] Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores .NET SDK 8.0.401 [Host] : .NET 9.0.0 (9.0.24.46307), Arm64 RyuJIT AdvSIMD Job-SESMBK : .NET 9.0.0 (9.0.24.46307), Arm64 RyuJIT AdvSIMD
Toolchain=.NET 9.0
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
GetRandomFileName | 47.82 ns | 0.210 ns | 0.164 ns | 48 B |
GetRandomFileName2 | 16.74 ns | 0.188 ns | 0.157 ns | 48 B |
It can be further improved (vectorization, light-weight or no locking etc.), but is the implementation heading in the right direction?
Path.GetRandomFileName()
is 7 times slower on Linux compared to Windows.The contributor who wants to work on this issue should: