Open Vlix opened 4 years ago
Oh? This sounds interesting! Do you have any examples of such libraries? It wouldn't be too much trouble to implement something like this - it would just come down to adding a new encoding and decoding table, and then writing the module api.
I'm working on the `password` library, to offer an easy interface for all kinds of password algorithms, and I've noticed there are some implementations (including `bcrypt`) that do irregular base64 encoding. I'm still unsure if it's exactly the same as regular base64, except having `.` instead of `+` and not having padding; there are some sources which seem to say that instead of the regular `A-Za-z0-9+/` order, it uses `./0-9A-Za-z`, or something...
The `passlib` library from Python apparently has some functions that handle this:
Oh and it's not always obvious from the documentation if any `decode...` functions can be used for unpadded base64.
In `Data.ByteString.Base64` the `decodeBase64` seems to allow unpadded input? And it refers to `decodeBase64Unpadded`, which is not in the module. In `Data.ByteString.Base64.URL`, the `decodeBase64` is more explicit in its documentation to allow it. And there are more encode functions that remove the padding. Why not also have those in regular `Data.ByteString.Base64`?
In `Data.Text.Encoding.Base64`, the `decodeBase64Lenient` is the one that seems to allow unpadded input, but `decodeBase64` has no such "Note: ..." like the `ByteString` variant has, so does it work the same or not?
... Should I maybe make a new issue for this inconsistency in documentation and availability of module functions? (I'd like a `Data.ByteString.Base64.encodeBase64Unpadded` for example)
Ah i must have missed some documentation from when I removed unpadded std alphabet base64 support; thank you for bringing that up.
So far, only the URL-safe alphabet is supported by an RFC calling for optionally padded encodings. This is kept that way because it's RFC compliant, and doing otherwise would make for a confusing API. However, you have `decodeBase64Unpadded` precisely because some consumers require unpadded exclusively. I should probably add a `decodeBase64Padded` for symmetry - I agree that the lack of that function is slightly confusing.
Because it's important to be spec compliant, I am not willing to do an unpadded version of the std alphabet, but i'm happy to do this with any nonstandard alphabets, since they are not governed by an RFC!
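To make the padded/unpadded distinction concrete, here is a minimal Python sketch of two strict decoders for the URL-safe alphabet. It is only an illustration built on Python's standard `base64` module; the names `decode_padded` and `decode_unpadded` are hypothetical and this is not how the Haskell library implements its functions.

```python
import base64

# URL-safe alphabet: '-' and '_' take the place of '+' and '/'.
URL_ALTCHARS = b"-_"


def decode_padded(text: bytes) -> bytes:
    """Hypothetical helper: accept only input whose length is a multiple of 4."""
    if len(text) % 4 != 0:
        raise ValueError("expected padded base64url input")
    return base64.b64decode(text, altchars=URL_ALTCHARS, validate=True)


def decode_unpadded(text: bytes) -> bytes:
    """Hypothetical helper: reject '=' and restore the padding before decoding."""
    if b"=" in text:
        raise ValueError("expected unpadded base64url input")
    padded = text + b"=" * (-len(text) % 4)
    return base64.b64decode(padded, altchars=URL_ALTCHARS, validate=True)
```

With these, `decode_padded(b"Pj4=")` and `decode_unpadded(b"Pj4")` both give `b">>"`, while swapping the inputs raises `ValueError`, which is the kind of symmetry a `decodeBase64Padded` would add.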
Ah, ok, that sounds reasonable. And on second thought, you're right, there's no instance I'd want to use a `Data.ByteString.Base64.encodeBase64Unpadded`. I was thinking of the "Hash64" alphabet, since that's the one that doesn't pad in my use cases.
I hope the examples that the `passlib` library gives can be used to decipher how this non-RFC style encoding works.
Hmmm, I've been scouring some more, and found it's also called `radix-64` encoding, `unix/crypt` encoding and some more things. I've found a page of a linux function that should just do what you expect? Maybe?
And this bcrypt source code also has the order of:
"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
^ this also has the `encodeBase64` function at the bottom (though it's in C)
EDIT: Oh man, the Base64 wiki shows that the `unix/crypt` encoding is slightly different from the `bcrypt` encoding >.<
It also shows there are tons of non-standard encodings... maybe for the library, it'd be best to only add the `unix/crypt` variant, as it's (probably?) the most used. Or make a module that has different tables, so the user of the API can just `rawBytes = decodeBase64Other UnixCrypt "someEncodedString"` where `UnixCrypt` is one of the non-standard tables you can give (or a data constructor for an `EncodingTable` sum-type that will choose the table for you in the `decodeBase64Other` function)? I dunno, I'm just brainstorming...
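The table-selecting idea above could look roughly like this. It is a Python sketch, not the proposed Haskell API: `Table`, `encode_other` and `decode_other` are hypothetical names, and the sketch assumes that each variant differs from standard base64 only in its alphabet and in dropping padding. Some crypt(3)-style hash formats are reported to also pack the bits in a different order, which is not modeled here.

```python
import base64
from enum import Enum

# Standard alphabet, used as the pivot for translation.
STD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


class Table(Enum):
    """Hypothetical sum type: one constructor per non-standard alphabet."""
    ADAPTED = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789./"
    UNIX_CRYPT = b"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    BCRYPT = b"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def encode_other(table: Table, raw: bytes) -> bytes:
    """Encode with the standard alphabet, drop padding, then swap alphabets."""
    std = base64.b64encode(raw).rstrip(b"=")
    return std.translate(bytes.maketrans(STD, table.value))


def decode_other(table: Table, text: bytes) -> bytes:
    """Map back to the standard alphabet, restore padding, then decode."""
    if any(byte not in table.value for byte in text):
        raise ValueError("character outside the selected alphabet")
    std = text.translate(bytes.maketrans(table.value, STD))
    std += b"=" * (-len(std) % 4)
    return base64.b64decode(std)


raw_bytes = decode_other(Table.ADAPTED, b"Pj4.")  # b'>>>'
```

Keeping the alphabet as a value of one type means a single pair of functions covers every table, at the cost of a lookup per call.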
EDIT2: Ok, the main reason I got into this is trying to parse Python's `passlib` formats, where its PBKDF2 formats have the `.` and no padding... and I tried the following literally in Repl.it
from passlib.utils.binary import ab64_encode
val = ab64_encode('>>>')
print val
And it gives "Pj4." *sigh* which is literally standard base64, but the `+` replaced with a `.`, so not even by the unix/crypt alphabet or anything... WHY!? </rant>
Thanks for hunting these down @Vlix. To address some of the issues raised here, I threw together an omnibus PR yesterday for all the things I want to get in before i do this here: https://github.com/emilypi/base64/pull/13
I'm probably going to punt on those TODO's in favor of getting yours in. I just need to sit at my screen on Monday and make sure it all makes sense to me first :)
Yeah, it requires a bit of reading up, but I think it would make sense to have this in the library, even though some of it is somewhat obscure. No rush, I've found the encoding I was looking for isn't even really a different one, so I can just `s/./+/` and add padding `=`'s and keep going 👍
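For reference, that workaround can be sketched in a few lines of Python using only the standard `base64` module. The helper names `adapted_encode` and `adapted_decode` are hypothetical and are not passlib's functions.

```python
import base64


def adapted_encode(raw: bytes) -> bytes:
    # Standard base64, then '+' -> '.' and drop the '=' padding.
    return base64.b64encode(raw).replace(b"+", b".").rstrip(b"=")


def adapted_decode(text: bytes) -> bytes:
    # Literal replacement of '.'; read as a regex, s/./+/ would match any character.
    std = text.replace(b".", b"+")
    # Pad back up to a multiple of 4 characters.
    std += b"=" * (-len(std) % 4)
    return base64.b64decode(std)


print(adapted_encode(b">>>"))  # b'Pj4.'
```

Decoding `b"Pj4."` with `adapted_decode` gives back `b">>>"`.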
Good luck with the library! And don't hesitate to ask, I'm fairly responsive.
So just to follow up on this, I still do plan on doing this, but i don't want to start until we get Backpack supported in Stack. I'm just repeating the same module over and over, and it's not scalable.
Just for future reference, I've found Passlib's charmaps which show:
- standard base64 (`A-Za-z0-9+/`)
- adapted base64 (`A-Za-z0-9./`), i.e. `s/+/./`
- hash64 (`./0-9A-Za-z`): ... `des_crypt`, but is used by `md5_crypt`, `sha256_crypt`, and others. Within Passlib, this encoding is referred to as the "hash64" encoding, to distinguish it from normal base64 and others.
- bcrypt (`./A-Za-z0-9`)
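To see why these charmaps are not interchangeable, the range shorthand can be expanded into full 64-character alphabets and the value of a single character compared across them. This is a small Python sketch; `expand` is a hypothetical helper and the labels are just the names used in this thread.

```python
def expand(spec: str) -> str:
    """Expand shorthand such as 'A-Za-z0-9./' into the full alphabet string."""
    out = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            # A range like 'A-Z': emit every character from 'A' through 'Z'.
            out.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)


CHARMAPS = {
    "standard": expand("A-Za-z0-9+/"),
    "adapted": expand("A-Za-z0-9./"),
    "hash64": expand("./0-9A-Za-z"),
    "bcrypt": expand("./A-Za-z0-9"),
}

# The 6-bit value a character stands for is its index in the alphabet.
for name, alphabet in CHARMAPS.items():
    print(name, alphabet.index("A"))
```

The character `A` stands for 0 in the standard and adapted alphabets, 12 in hash64 and 2 in bcrypt, so the same text decodes to different bytes depending on the table.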
I found that some libraries out there use a slightly different `base64` encoding, namely the "adapted base64 encoding", which is the same as regular base64 encoding, but with the `+` replaced by the `.`, and having no padding characters. Would be nice (and not too much work, I think?) to add something like `Data.ByteString.Base64.Adapted` to handle this type of base64 encoding?