emilypi / base64

RFC 4648-compliant Base64
BSD 3-Clause "New" or "Revised" License

[Request] De-/Encoding of adapted base64 #12

Open Vlix opened 4 years ago

Vlix commented 4 years ago

I found that some libraries out there use a slightly different base64 encoding, namely the "adapted base64 encoding", which is the same as regular base64 encoding, but with + replaced by . and with no padding characters. It would be nice (and not too much work, I think?) to add something like Data.ByteString.Base64.Adapted to handle this type of base64 encoding.
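For readers unfamiliar with the format, here is a minimal Python sketch of the encoding as described above: standard base64, then + swapped for . and the = padding dropped. The function names adapted_b64encode and adapted_b64decode are made up for illustration and are not part of this library or of passlib.

```python
import base64


def adapted_b64encode(data: bytes) -> str:
    """Standard base64, with '+' replaced by '.' and '=' padding removed."""
    standard = base64.b64encode(data).decode("ascii")
    return standard.rstrip("=").replace("+", ".")


def adapted_b64decode(text: str) -> bytes:
    """Reverse the substitution, restore padding, then decode as standard base64."""
    standard = text.replace(".", "+")
    # Pad back up to a multiple of four characters.
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard, validate=True)
```

Since only one character differs from the standard alphabet, any RFC 4648 decoder can be reused once the text has been normalized this way.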

emilypi commented 4 years ago

Oh? This sounds interesting! Do you have any examples of such libraries? It wouldn't be too much trouble to implement something like this - it would just come down to adding a new encoding and decoding table, and then writing the module API.
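To illustrate what "a new encoding and decoding table" could mean, here is a rough Python sketch (not how this library is implemented internally): both tables are derived from a 64-character alphabet string, and the encoder and decoder are written against the tables only. All names here (make_tables, encode_unpadded, decode_unpadded) are hypothetical. It assumes the usual RFC 4648 big-endian bit order and emits no padding.

```python
STD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def make_tables(alphabet: str):
    """Build (encode_table, decode_table) from a 64-character alphabet."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct characters")
    encode_table = list(alphabet)                             # 6-bit value -> char
    decode_table = {ch: i for i, ch in enumerate(alphabet)}   # char -> 6-bit value
    return encode_table, decode_table


def encode_unpadded(data: bytes, encode_table) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Left-align up to three bytes in a 24-bit integer.
        n = int.from_bytes(chunk, "big") << (8 * (3 - len(chunk)))
        # k input bytes produce k + 1 output characters.
        for j in range(len(chunk) + 1):
            out.append(encode_table[(n >> (18 - 6 * j)) & 0x3F])
    return "".join(out)


def decode_unpadded(text: str, decode_table) -> bytes:
    if len(text) % 4 == 1:
        raise ValueError("invalid unpadded base64 length")
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        n = 0
        for ch in group:
            if ch not in decode_table:
                raise ValueError(f"character {ch!r} is not in the alphabet")
            n = (n << 6) | decode_table[ch]
        # Left-align a short final group, then keep only the complete bytes.
        n <<= 6 * (4 - len(group))
        out += n.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)
```

With this shape, supporting the adapted variant is just make_tables(STD_ALPHABET.replace("+", ".")). Note that the decoder above does not reject non-zero trailing bits in the final group, which a strict implementation would probably want to do.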

Vlix commented 4 years ago

I'm working on the password library, which aims to offer an easy interface for all kinds of password algorithms, and I've noticed that some implementations (including bcrypt) do irregular base64 encoding. I'm still unsure whether it's exactly the same as regular base64 except for having . instead of + and no padding; some sources seem to say that instead of the regular A-Za-z0-9+/ order, it uses ./0-9A-Za-z, or something...

The passlib library from Python apparently has some functions that handle this:

Vlix commented 4 years ago

Oh, and it's not always obvious from the documentation whether any of the decode... functions can be used for unpadded base64. In Data.ByteString.Base64, decodeBase64 seems to allow unpadded input? And it refers to decodeBase64Unpadded, which is not in the module. In Data.ByteString.Base64.URL, the documentation of decodeBase64 is more explicit about allowing it. There are also more encode functions there that remove the padding. Why not have those in regular Data.ByteString.Base64 as well? In Data.Text.Encoding.Base64, decodeBase64Lenient is the one that seems to allow unpadded input, but decodeBase64 has no "Note: ..." like the ByteString variant has, so does it work the same or not?

... Should I maybe make a new issue for this inconsistency in documentation and availability of module functions? (I'd like a Data.ByteString.Base64.encodeBase64Unpadded for example)

emilypi commented 4 years ago

Ah, I must have missed some documentation when I removed unpadded std alphabet base64 support; thank you for bringing that up.

So far, only the URL-safe alphabet is supported by an RFC calling for optionally padded encodings. This is kept that way because it's RFC compliant, and doing otherwise makes for a confusing API. However, you have decodeBase64Unpadded precisely because some consumers require unpadded input exclusively. I should probably add a decodeBase64Padded for symmetry - I agree that the lack of that function is slightly confusing.

Because it's important to be spec compliant, I am not willing to do an unpadded version of the std alphabet, but I'm happy to do this with any nonstandard alphabets, since they are not governed by an RFC!
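The distinction between "optionally padded", "unpadded only" and "padded only" decoding for the URL-safe alphabet can be sketched in Python as follows. This is only an illustration of the three behaviours discussed above; the function names are hypothetical and are not the Haskell API of this library.

```python
import base64


def decode_url_padded(text: str) -> bytes:
    """Accept only input whose length is a multiple of four (padding present)."""
    if len(text) % 4 != 0:
        raise ValueError("padded input must be a multiple of 4 characters")
    return base64.b64decode(text, altchars=b"-_", validate=True)


def decode_url_unpadded(text: str) -> bytes:
    """Accept only input without any '=' padding."""
    if "=" in text:
        raise ValueError("unpadded input must not contain '='")
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded, altchars=b"-_", validate=True)


def decode_url(text: str) -> bytes:
    """Accept either form, as an optionally padded decoder would."""
    if "=" in text:
        return decode_url_padded(text)
    return decode_url_unpadded(text)
```

One caveat of this sketch: Python's altchars handling translates - and _ to + and /, so it does not by itself reject + and / in the input; a strict URL-safe decoder would need an extra check.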

Vlix commented 4 years ago

Ah, ok, that sounds reasonable. And on second thought, you're right, there's no instance I'd want to use a Data.ByteString.Base64.encodeBase64Unpadded. I was thinking of the "Hash64" alphabet, since that's the one that doesn't pad in my use cases.

I hope the examples that the passlib library gives can be used to decipher how this non-RFC style encoding works.

Vlix commented 4 years ago

Hmmm, I've been scouring some more, and found it's also called radix-64 encoding, unix/crypt encoding and some more things. I've found a page for a Linux function that should just do what you expect? Maybe?

And this bcrypt source code also has the order of: "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
^ this also has the encodeBase64 function at the bottom (though it's in C)

EDIT: Oh man, the Base64 wiki shows that the unix/crypt encoding is slightly different from the bcrypt encoding >.< It also shows there are tons of non-standard encodings... Maybe for the library, it'd be best to only add the unix/crypt variant, as it's (probably?) the most used. Or make a module that has different tables, so the user of the API can just write rawBytes = decodeBase64Other UnixCrypt "someEncodedString", where UnixCrypt is one of the non-standard tables you can give (or a data constructor of an EncodingTable sum type that will choose the table for you in the decodeBase64Other function)? I dunno, I'm just brainstorming...
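The "pick a table by name" idea could look roughly like this in Python. It is a brainstorming-level sketch, not a proposed API: Alphabet, encode_other and decode_other are hypothetical names. It only swaps the alphabet while keeping RFC 4648 bit order and dropping padding; the bcrypt order is the one quoted above, the unix/crypt order is the ./0-9A-Za-z one mentioned earlier, and real crypt-style hash formats may additionally arrange bits differently, which this does not model.

```python
import base64
from enum import Enum

_STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


class Alphabet(Enum):
    """Non-standard 64-character alphabets, selected by name."""
    ADAPTED = _STD.replace("+", ".")
    BCRYPT = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    UNIX_CRYPT = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_other(alphabet: Alphabet, data: bytes) -> str:
    """Encode as unpadded standard base64, then map into the chosen alphabet."""
    standard = base64.b64encode(data).decode("ascii").rstrip("=")
    return standard.translate(str.maketrans(_STD, alphabet.value))


def decode_other(alphabet: Alphabet, text: str) -> bytes:
    """Map back to the standard alphabet, restore padding, then decode."""
    if any(ch not in alphabet.value for ch in text):
        raise ValueError("input contains characters outside the chosen alphabet")
    standard = text.translate(str.maketrans(alphabet.value, _STD))
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard, validate=True)
```

For example, decode_other(Alphabet.ADAPTED, "Pj4.") yields the three bytes b">>>". In Haskell the same shape would be a sum type whose constructors select the table, as suggested above.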

EDIT2: Ok, the main reason I got into this is that I'm trying to parse Python's passlib formats, where the PBKDF2 formats have the . and no padding... and I tried the following literally in Repl.it

from passlib.utils.binary import ab64_encode
# On Python 3, ab64_encode takes bytes and returns bytes
val = ab64_encode(b'>>>')
print(val.decode('ascii'))

And it gives "Pj4." *sigh* which is literally standard base64 with the + replaced by a ., so it doesn't even use the unix/crypt alphabet or anything... WHY!? </rant>

emilypi commented 4 years ago

Thanks for hunting these down @Vlix. To address some of the issues raised here, I threw together an omnibus PR yesterday for all the things I want to get in before I do this here: https://github.com/emilypi/base64/pull/13

I'm probably going to punt on those TODOs in favor of getting yours in. I just need to sit at my screen on Monday and make sure it all makes sense to me first :)

Vlix commented 4 years ago

Yeah, it requires a bit of reading up, but I think it would make sense to have this in the library, even though some of it is somewhat obscure. No rush, I've found the encoding I was looking for isn't even really a different one, so I can just s/\./+/g, add the = padding, and keep going 👍 Good luck with the library! And don't hesitate to ask, I'm fairly responsive.

emilypi commented 3 years ago

So just to follow up on this, I still do plan on doing this, but I don't want to start until we get Backpack supported in Stack. I'm just repeating the same module over and over, and it's not scalable.

Vlix commented 2 years ago

Just for future reference, I've found Passlib's charmaps which show: