HalosGhost / pandabin

A self-hostable, simple and fast pastebin written in C
GNU General Public License v3.0

base64url encoded hashes #14

Open HalosGhost opened 7 years ago

HalosGhost commented 7 years ago

sha256 for the paste id means the likelihood of collision is incredibly low, but having to deal with a 64-character id is kind of a pain.

if we encoded the hash digest in base58 instead of hex, we could still guarantee a safe URL (unlike base64) but shorten the id by quite a bit (experimentally, it looks like it would be ¾ as long; roughly 48 characters). Not great, but an improvement.

This also has the benefit of dramatically improving the directory load-balancing (from 4096 to 195112 possible prefixes, i.e. 16³ vs. 58³) without changing that code at all.

buhman commented 6 years ago

How is base64 unsafe? There is literally a thing called url safe base64.

HalosGhost commented 6 years ago

That's fascinating! I was not aware of base64url; that may be a much better option than base58. Thanks!

HalosGhost commented 6 years ago

Getting base64url is quite simple: do the normal base64 encoding, then replace + with - and / with _. Converting back is just the opposite: do the reverse replacement, then decode.

For our case, since we are only ever doing base64 encoding for hashes, we have completely predictable sizes, and the replacement (at least at first) can reasonably be done as a standard in-place loop. Future optimization may push us in the direction of including libb64 internally and swapping the two characters in its alphabet so we don't need to do the replacement at all.
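A minimal sketch of the in-place replacement described above, assuming the encoder has already written standard base64 into a writable buffer. The names `b64_to_b64url` and `b64url_to_b64` are hypothetical; they are not part of pandabin or libb64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite standard base64 text into the base64url
 * alphabet in place. Only '+' and '/' differ between the two alphabets,
 * so every other character (including '=' padding) is left untouched. */
void b64_to_b64url(char *s, size_t len) {
    assert(s != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (s[i] == '+') {
            s[i] = '-';
        } else if (s[i] == '/') {
            s[i] = '_';
        }
    }
}

/* Hypothetical helper: the reverse mapping, meant to run before the text
 * is handed to a standard base64 decoder. */
void b64url_to_b64(char *s, size_t len) {
    assert(s != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (s[i] == '-') {
            s[i] = '+';
        } else if (s[i] == '_') {
            s[i] = '/';
        }
    }
}
```

Since a padded base64 encoding of a sha256 digest is always 44 characters, a caller could pass a fixed length, for example `b64_to_b64url(id, 44)`.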

With base64 instead of base58, we get a reduction closer to ⅓ rather than ¼ (44 characters rather than 48). Still pretty unruly for a human to just remember, but we're headed in the right direction.
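The 44-character figure follows from the usual base64 length formula, sketched below for an n-byte input (32 bytes for sha256). The helper names are hypothetical. Note that 44 includes one trailing = pad character; base64url is often used without padding, which would bring a sha256 id down to 43 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hex uses two characters per input byte. */
size_t hex_len(size_t n) {
    assert(n < SIZE_MAX / 4); /* guard against overflow */
    return 2 * n;
}

/* Padded base64 emits 4 characters for every (possibly partial)
 * group of 3 input bytes. */
size_t b64_padded_len(size_t n) {
    assert(n < SIZE_MAX / 4);
    return 4 * ((n + 2) / 3);
}

/* Unpadded base64 emits one character per 6 bits, rounded up. */
size_t b64_unpadded_len(size_t n) {
    assert(n < SIZE_MAX / 4);
    return (4 * n + 2) / 3;
}
```

For n = 32 these give 64, 44, and 43 respectively.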