Closed nikkolasg closed 7 years ago
I took the liberty of defining the new methods as follows:
type Point interface {
    ...
    Pick(rand cipher.Stream) kyber.Point
    Embed(data []byte, rand cipher.Stream) kyber.Point
    EmbedLen() int
}
removing the return of the remainder of the data that has not been embedded. The reason is that this remainder is always a predictable suffix of the data buffer, i.e. it's always data[min(len(data), point.EmbedLen()):], so there's no point in returning it. However, I made it clear in the comments that Embed only takes EmbedLen bytes of the given data.
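Under that convention the caller can compute the unconsumed remainder itself. A minimal sketch in plain Go (the embedLen value of 29 is just an illustrative stand-in for whatever a concrete curve's EmbedLen() returns; this is not kyber's actual code):

```go
package main

import "fmt"

// embedLen stands in for point.EmbedLen(). The exact value is
// irrelevant to the slicing logic shown here.
const embedLen = 29

// remainder returns the suffix of data that Embed would NOT consume:
// data[min(len(data), embedLen):].
func remainder(data []byte) []byte {
	n := len(data)
	if n > embedLen {
		n = embedLen
	}
	return data[n:]
}

func main() {
	short := []byte("hello")           // fits entirely, remainder empty
	long := make([]byte, 40)           // 40 > embedLen
	fmt.Println(len(remainder(short))) // 0
	fmt.Println(len(remainder(long)))  // 40 - 29 = 11 bytes left over
}
```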
My original intent for the Embed API was that some embedding methods might consume a variable amount of data: due to the randomness in the embedding process, it might not always "work" to embed the largest possible amount of data, and we would then have to fall back to embedding slightly less. However, the embedding mechanism I actually implemented ended up tending toward fixed-length embedded data, with parameters tuned "conservatively enough" that the probability of failure (i.e., of not finding any random part that makes the embedding succeed) should be "negligible". I expect that handling embedding this way should always be feasible, and it certainly simplifies the API, so I'm fine with defining the embedded data length to be always fixed and not just a minimum (as EmbedLen was originally intended to convey).
Please be sure to update the godoc comments appropriately to make these semantics clear though. :)
@bford
Hi Prof. Bryan Ford,
I noticed the discussion here on the Embed method, which leaves 8 bits for randomly searching for points on the curve. It would be useful to document, or provide citations for, why 8 bits is "conservatively enough".
This is crucial for some applications, and their developers may want to double-check this implementation decision. For example, if one day we found a "message" that cannot be embedded on a specific curve (causing an infinite loop), that could enable a DoS attack on systems built over Kyber, such as the encryption algorithm in Atom (SOSP'17).
The paper "Injective Encodings to Elliptic Curves" by Pierre-Alain Fouque, Antoine Joux, and Mehdi Tibouchi (https://eprint.iacr.org/2013/373.pdf) makes two relevant points:
1. Random searching provably works in expected polynomial time if we embed HALF of one coordinate, leaving the other half of that coordinate for the randomized search. The paper says 'There is a probabilistic, "folklore" method for doing so, but it only provably works for messages of length less than half the size of the curve.', and it gives details and a proof in Section 2.4.
2. There is no proof that embedding more than HALF will always succeed. The paper only speculates that "It is conceivable that the same algorithm does, in fact, work with a larger $\ell$ still", where $\ell$ is the length of the message being embedded.
So the current implementation is (1) a little bit "folklore" (I will not argue against that) and (2) not constant-time. The second point could be an issue -- sometimes we want constant-time behavior.
A deterministic "embedding" could be built from Elligator, and there is already academic work that did so. The paper "Sieve: Cryptographically Enforced Access Control for User Data in Untrusted Clouds" (NSDI'16) uses the crypto library libdecaf's Elligator to convert a plaintext message to a point on the Ed448-Goldilocks curve. It is based on Elligator (in particular, type 2) applied to Ed448-Goldilocks, which eliminates cofactors using the decaf technique. However, the decaf Elligator seems to have more than one preimage -- I guess someone needs to devise a method to ensure we can recover the message, e.g., by reserving some space and leaving a hash.
I mentioned cofactor elimination because it seems that Embed wants to ensure that the point reveals nothing about the underlying plaintext. The current implementation seems to loop until a point in the desired subgroup is found.
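The trial-and-error structure under discussion looks roughly like this. This is a toy sketch, not kyber's code: isValidPoint is a stand-in for the real on-curve/subgroup check, modeled with a hash-based coin flip so the example is self-contained. The point is that the iteration count, and hence the running time, depends on the data:

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// isValidPoint is a STAND-IN for "this encoding is a point in the right
// subgroup": a deterministic hash-based coin flip with probability ~1/2,
// the kind of per-candidate success rate the folklore method assumes.
func isValidPoint(buf []byte) bool {
	h := sha256.Sum256(buf)
	return h[0]&1 == 0
}

// embed tries all 2^8 values of a trailing random byte until the buffer
// "lands on the curve". Note the two properties discussed above: the
// number of iterations is data-dependent (not constant-time), and the
// search can in principle fail outright.
func embed(data []byte) ([]byte, error) {
	for r := 0; r < 256; r++ {
		buf := append(append([]byte{}, data...), byte(r))
		if isValidPoint(buf) {
			return buf, nil
		}
	}
	return nil, errors.New("no valid encoding found")
}

func main() {
	enc, err := embed([]byte("hello"))
	// For essentially any input some candidate succeeds, but how fast
	// it succeeds leaks through timing.
	fmt.Println(err, len(enc))
}
```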
An engineer's thought, coming from the Java implementation of Pick: what if we simply count from 0 to 0xff, and if we reach the end without success, widen the search by another 8 bits, and so on? Since Pick doesn't make any guarantee about how many bytes it embeds, this would allow an adaptive search for the longest slice of bytes that can be embedded.
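That adaptive strategy might be sketched as follows. Again a toy, not kyber's code: isValidPoint stands in for the real check, encLen and the 4-byte cap are illustrative. It returns however many data bytes it managed to embed, which is precisely why it cannot promise a fixed embedding length up front:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const encLen = 32 // illustrative fixed encoding size for the sketch

// isValidPoint is a STAND-IN for the real on-curve/subgroup test,
// modeled as a hash-based coin flip (~1/2) so the sketch is self-contained.
func isValidPoint(buf []byte) bool {
	h := sha256.Sum256(buf)
	return h[0]&1 == 0
}

// adaptiveEmbed starts with one random byte and, whenever the search
// space is exhausted, trades one more data byte for randomness (capped
// at 4 random bytes to keep the toy bounded). It returns the encoding
// and the number of data bytes that actually made it in.
func adaptiveEmbed(data []byte) ([]byte, int) {
	for random := 1; random <= 4; random++ {
		n := encLen - random // data bytes that still fit
		if n > len(data) {
			n = len(data)
		}
		for r := 0; r < 1<<uint(8*random); r++ {
			buf := make([]byte, 0, n+random)
			buf = append(buf, data[:n]...)
			for i := 0; i < random; i++ {
				buf = append(buf, byte(r>>uint(8*i)))
			}
			if isValidPoint(buf) {
				return buf, n
			}
		}
	}
	return nil, 0 // exhausted the (capped) search
}

func main() {
	enc, n := adaptiveEmbed(make([]byte, 64))
	fmt.Println(len(enc), n) // n is whatever the search managed to fit
}
```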
Hi Linus, thanks for the information about Pick.
That seems to be a solution, but it might not satisfy all upper-level applications. For example, Atom (SOSP'17), which implements ElGamal encryption over Curve25519 (all points will be in the proper subgroup), uses Embed for storage -- saving a fixed-length string and wanting to maximize the use of the space. In that scenario, we really need an Embed that can guarantee how many bytes can be embedded.
And in Atom (SOSP'17), the eventual number of ciphertexts for any fixed-length plaintext has to be equal, because all these ciphertexts go through a shuffling network for anonymous messaging. So if we keep calling Pick on the remainder until all the data gets in, the number of ciphertexts varies too much.
See discussion https://github.com/dedis/kyber/issues/10#issuecomment-304882186
Basically, we want to have on both Scalar and Point:
- a Pick method used to create a random element
- an Embed method used to embed any given slice of bytes of a maximum size