capitalone / fpe

A format-preserving encryption implementation in Go
Apache License 2.0
202 stars 41 forks source link

Arbitrary alphabet and radix support #2

Open anitgandhi opened 7 years ago

anitgandhi commented 7 years ago

Description of Enhancement

Expanding on #1 , in theory, this package should support arbitrary alphabets as well. It will require either major changes to math/big or a new sub-package.

This may or may not have to include non-ASCII characters.

Further, there is a chance it will require a breaking change in the interface, if this alphabet input is added to NewCipher. Alternatively, a new function on Cipher called SetAlphabet could be added.


func NewCipher(radix int, maxTLen int, key []byte, tweak []byte) (Cipher, error) {
    ...
    // defaults
    f.alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    f.radix = 62
}

func (f Cipher) SetAlphabet(alphabet string, radix int) {
    // error checking on alphabet and radix
    // ex: alphabet can't be "12345" and radix be 9
    ...

    f.alphabet = alphabet
    f.radix = radix
}
fmpwizard commented 7 years ago

Two questions:

  1. Did you already submit/propose the changes to the Go project? (have a link?)
  2. I have been looking for something like this for a while now, but the string I have to encrypt, besides having 0-9, a-z, A-Z, also could have things like / % & and a few other characters, would they fit into the idea behind this ticket of alternative alphabets ?

Thanks a lot for making this project public!

anitgandhi commented 7 years ago
  1. I haven't yet, thanks for the reminder 👍
  2. Yup the arbitrary alphabet support would be perfect for things like / % &. I haven't gotten a chance to take a deeper dive into how it would work in detail; in one sense it's easiest to convince Golang maintainers to add this support, but on the flip side that would require the newest version of Golang, rather than being self-contained.
fmpwizard commented 7 years ago

It would be great if the change wouldn't require using the latest Go version (specially using tip), but have that as a planned migration. What I mean is, add it as a subpackage, so this can be used soon, and from the start making it clear that once Go updates math/big , then the subpackage won't be needed any more

anitgandhi commented 7 years ago

Agreed - I'd prefer it to not be dependent on the Go version.

I like the migration option long term, but as it is that's dependent on upstream Go.

I tested the base62 support ( #1 ) in a branch by copying the standard math/big into fpe/big and making the changes there, and it worked fine. However, I'm a little uncomfortable importing an entire existing library as a sub-package like that and then modifying; potential license conflicts and other issues.

Further, math/big has a lot of other functions that aren't relevant to this package.

I think for now what I will do is use only the parts of math/big and potentially re-write as well. This may provide the opportunity for further performance optimizations as well :)

anitgandhi commented 7 years ago

@fmpwizard discussion is here: https://github.com/golang/go/issues/21558

fmpwizard commented 7 years ago

I see that the linked ticket was closed, but they didn't go with the change that would allow arbitrary alphabets, right?

anitgandhi commented 7 years ago

@fmpwizard that is correct, they added base62 support which basically fixes #1 , which is nice; but not full arbitrary alphabets. The Go team's comments on the linked issue explains the reasoning behind that.

I do still want to add arbitrary alphabet support, especially seeing your use case above. I may just have to add separate func (z *big.Int) TextWithCharset(charset string) and func (z *Int) SetStringWithCharset(s string, charset string) that only deal with fpe related use-cases. As the Go team mentioned, it would introduce complexities with encoding/normalization that is beyond the scope of the current need. It may end up being as simple as copying the old internal digits function from <Go1.5 and modifying it appropriately; it may not be.

If you have any thoughts on how to do this effectively, please let me know!

fmpwizard commented 7 years ago

@anitgandhi thanks for following up on this topic. I can more or less follow the ideas around implementing this, but not enough to propose an actual changeset, let alone doing it effectively.

fmpwizard commented 6 years ago

friendly ping

anitgandhi commented 6 years ago

Sadly I haven't had the bandwidth to work on this issue 😢

As a note to my future self (or anyone who wants to take this up), implementing this will have to limit to ASCII strings so the []byte <--> string conversions will hold up. UTF-8 and other aspects would complicate it.

fmpwizard commented 6 years ago

I was looking at implementing this but had a question:

[...] digits function from <Go1.5

I pulled the go repo, checked out go1.4.3 and I didn't find any digits func, I'm guessing that wasn't the name I should look for, right?

anitgandhi commented 6 years ago

Whoops, it was called string: https://github.com/golang/go/blob/release-branch.go1.4/src/math/big/nat.go#L737

That code already has a check for effectively ensuring ASCII/base256 which should get the job done.

fmpwizard commented 6 years ago

Thanks! I'll take a look and post back with results

UzairM commented 6 years ago

any update on this

fmpwizard commented 6 years ago

I think I finally figure out what I'm supposed to do here, should have some time to work on this over the weekend

fmpwizard commented 6 years ago

I went the path of copying the string unexported function from the math/big package, but that function depended on a chain of functions that pretty much meant copying/pasting a large portion of golang's math/big (including some assembly code) . @anitgandhi you went this path before:

I tested the base62 support ( #1 ) in a branch by copying the standard math/big into fpe/big and making the changes there, and it worked fine. However, I'm a little uncomfortable importing an entire existing library as a sub-package like that and then modifying; potential license conflicts and other issues.

Is the license issue something that capitalone's lawyers may bring up or are you worried about the golang side? if the golang side, I can ask on the mailing list, you'll know best from the capitalone side.

From my reading of the math/big code, it looks pretty well optimized already or at least I don't think I could come up with something better than what's there already (where I could say, this is my original code)

anitgandhi commented 6 years ago

In hindsight, the licensing aspect is probably more trivial as compared to the issue of maintaining the separate copy from the upstream. Personally I don't like it when things get out of sync like that.

What about a character substitution model? I suppose it would still be limited to base62 but a utility that takes an input and output alphabet, and just does a straight character substitution could be a first step.

fmpwizard commented 6 years ago

the out of sync is a real issue, I was worried about how to make sure we would port any bug fixes, etc, it's possible, but at times may not be the best way to use time.

The character substitution sounds interesting.

With the base62 model, we can now support

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

in my particular use case, I can limit it to either all upper or lowercase letter, plus 0-9 and then ideally these:

,./;'[]\=-<>?:"{}|+_)(*&^%$#@!`~)

if I urlencode the string I get from my source, mixed with only really using 0-9 and A-B, base62 would be enough I think

Is this what you had in mind?

anitgandhi commented 6 years ago

Also iirc in the 1.9 release, math/big started using some math/bits which I think uses internal/cpu . Might be wrong on that tho

And yes thats exactly what I was thinking. For every n characters you don't use out of base62, you get a different n "special" chars.

Which I suppose doesn't even have to be in this package since it's just a character map

martinwaite commented 5 years ago

Another take on this is not get involved in the alphabet at all.

Instead, define fpe encrypt as a function that takes an array of uint16 - so supporting radix 1..65536.

Instead of using Int.SetString() to build the binary string, use the NUMradix(X) function defined in https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-38g.pdf

Something like:

func num(s []uint16, radix uint64) (*big.Int, *big.Int, error) {
        var big_radix, max, bv, x big.Int
        if radix > 65536 {
                return nil, nil, fmt.Errorf("Radix (%d) too big: max supported radix is 65536", radix)
        }

        maxv := uint16(radix - 1)
        big_radix.SetUint64(uint64(radix))
        max.SetInt64(1)

        for i, v := range s {
                if v > maxv {
                        return nil, nil, fmt.Errorf("Value at %d out of range: got %d - expected 0..%d", i, v, maxv)
                }
                bv.SetUint64(uint64(v))
                x.Mul(&x, &big_radix)
                x.Add(&x, &bv)
                max.Mul(&max, &big_radix)
        }
        return &x, &max, nil
}

This example also calculates max which is radix to power of len(s), for use later in encrypt to compute the modulus.

A corresponding function is required to convert the encryption result back to an array of uint16. The NIST paper defines a STRradix function to do that.

With these encryption interfaces in place, it is trivial to convert strings in fairly large alphabets (65536 members) into arrays of uint16 and back again.

I'll prepare a PR when I get time for you look at.

anitgandhi commented 5 years ago

@martinwaite interesting, that looks like a good way to solve the problem. I'd be happy to review and merge a PR whenever. Since your solution still uses the math/big package we still get the advantages of any optimizations made in that standard library, so that's excellent.

Really my only conditions for the PR is there isn't a significant performance regression, but we can tackle that as issues come up :+1: