Closed madelson closed 2 years ago
Oh, that's a dumb oversight on my part: I handled it correctly for Md5 but forgot to do the same for NameAndMd5Mix >_> But it might be simpler and safer to use hex encoding like you said. Wouldn't that produce longer hashes in the file names, though?
@Doraku yes, the solution you linked there should work and fixes the nesting problem. However, doesn't ? have special meaning in URLs (it starts the query string)? I could see this causing issues depending on where the docs are hosted.
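To illustrate the query-string concern, here is a minimal sketch using System.Uri. The URL and file name are made up for the example and are not taken from the library's real output.

```csharp
using System;

// Hypothetical URL of a generated doc page whose file name contains '?'.
var uri = new Uri("https://example.com/docs/Widget.ab?cd.md");

// Everything from '?' onward is parsed as the query, not as part of the path.
Console.WriteLine(uri.AbsolutePath);
Console.WriteLine(uri.Query);

// Percent-encoding the file name keeps the '?' inside the path segment.
var escaped = Uri.EscapeDataString("Widget.ab?cd.md");
Console.WriteLine(escaped);
```

Whether a given host or link generator applies that escaping is exactly the part that depends on where the docs are hosted.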
There is also still the case-sensitivity problem, although I suspect that the risk of an actual collision there is pretty low, comparable to knocking a couple of bytes off the hash. Unifying forward and back slashes as ? similarly increases collision odds.
Hex will lead to hashes that are a bit longer (32 chars vs. 24), so maybe that's a concern. For my use-case it would not be.
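For reference, a quick sketch of where the 32 vs. 24 comes from. The input string is a made-up member name, and ToHex is a helper written for this example rather than anything from the library.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Helper for this example: upper-case hex without separators.
static string ToHex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", "");

byte[] hash;
using (var md5 = MD5.Create())
{
    // Hypothetical input; any input yields a 16-byte MD5 digest.
    hash = md5.ComputeHash(Encoding.UTF8.GetBytes("MyNamespace.MyType.MyMethod"));
}

string base64 = Convert.ToBase64String(hash); // 16 bytes -> 24 chars including "==" padding
string hex = ToHex(hash);                     // 16 bytes -> 32 chars, 2 per byte

Console.WriteLine($"base64: {base64.Length} chars, hex: {hex.Length} chars");
```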
Another option would be to use a custom alphabet for the encoding, for example all upper-case letters and digits (36 chars):
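using System;
using System.Linq;
using System.Numerics;
using System.Text;

// md5 is the 16-byte MD5 digest.
Encode(md5, "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");

static string Encode(byte[] hash, ReadOnlySpan<char> alphabet)
{
    // Append a zero byte so the little-endian value is always non-negative.
    var bi = new BigInteger(hash.Concat(new byte[] { 0 }).ToArray());
    var result = new StringBuilder();
    // Emit digits in the alphabet's base, least significant digit first.
    while (bi != 0)
    {
        bi = BigInteger.DivRem(bi, alphabet.Length, out var remainder);
        result.Append(alphabet[(int)remainder]);
    }
    // An all-zero hash still needs one digit.
    if (result.Length == 0) { result.Append(alphabet[0]); }
    return result.ToString();
}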
This gives hash strings of 25 chars, or occasionally fewer if the hash has enough trailing zero bits (a padding solution could be added to guarantee constant length if desired). The nice thing about these hashes is that they only use very "safe" characters and are case-insensitive.
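One possible shape for the padding idea, as a sketch only: EncodeFixed is a hypothetical variant of the Encode function above that always emits as many digits as the largest possible hash of that byte length would need, filling the rest with the alphabet's zero digit.

```csharp
using System;
using System.Linq;
using System.Numerics;
using System.Text;

static string EncodeFixed(byte[] hash, string alphabet)
{
    // Trailing zero byte keeps the little-endian value non-negative, as in Encode above.
    var value = new BigInteger(hash.Concat(new byte[] { 0 }).ToArray());
    // Largest value that fits in hash.Length bytes; its digit count is the target width.
    var max = BigInteger.Pow(2, 8 * hash.Length) - 1;

    var result = new StringBuilder();
    while (max != 0)
    {
        // Once value reaches zero, every further digit is alphabet[0], which acts as padding.
        value = BigInteger.DivRem(value, alphabet.Length, out var digit);
        result.Append(alphabet[(int)digit]);
        max /= alphabet.Length;
    }
    return result.ToString();
}

const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
Console.WriteLine(EncodeFixed(new byte[16], Alphabet).Length);
```

Because digits are still written least significant first, the padding ends up at the end of the string, so short values keep the same leading characters they would have had without padding.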
That would actually be pretty cool (and safe) :)
Thanks for creating this library! I'm trying to use it to generate API documentation for my projects. I'm trying to use the NameAndMd5Mix mode to avoid long path issues I'm seeing with the default naming scheme.
The problem is that the MD5 hashes are encoded with base 64, which can contain the / character. This causes files to end up in nested folders (e.g. see this file). This in turn breaks all relative links in the nested files (e.g. see the namespace link here).
I think an easy fix would be to use hex encoding rather than base 64. This has the added advantage of being case-insensitive, which tends to be better for URLs.
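A small sketch of the nesting problem and the hex fix. The "MyType.<hash>.md" naming pattern is an assumption for illustration, not necessarily the library's real scheme, and the digest is an artificial all-0xFF value chosen because it is guaranteed to contain / in base 64.

```csharp
using System;
using System.Linq;

// Artificial 16-byte digest (not a real MD5 of anything).
byte[] digest = Enumerable.Repeat((byte)0xFF, 16).ToArray();

// Hypothetical file-name pattern: "<name>.<encoded hash>.md".
string base64Name = "MyType." + Convert.ToBase64String(digest) + ".md";
string hexName = "MyType." + BitConverter.ToString(digest).Replace("-", "") + ".md";

// Each '/' is read as a directory separator, so the base 64 name spans several path segments.
Console.WriteLine($"base64 segments: {base64Name.Split('/').Length}");
Console.WriteLine($"hex segments:    {hexName.Split('/').Length}");
```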
If you're interested, I'd be happy to submit a PR.