Attack on PyCryptodome's ElGamal Encryption with Proof-of-Concept (PoC)

weikengchen commented 6 years ago

Joint work with @TElgamal.

The textbook ElGamal implementation is not secure. PyCryptodome, and its relevant implementations, PyCrypto, and libgcrypt, use the wrong algorithm.

I would give the basic idea as follows. Readers with modern algebra background can jump to @TElgamal 's explanation here.

The wrong implementation has two messages classes.

Due to technical reasons, all messages are classified into two classes. A random message belongs to one of them (with 50% 50% possibility).

two_message_classes

In some applications, there are only several possible messages. Consider we are sending a military message. There are only two outcomes: the army moved forward or retracted. With a high possibility, they belong to different classes.

Encrypt the two messages.

We now encrypt two messages with ElGamal.

encrypt_message

The encrypted result is called a ciphertext. It should reveal NO information about the original message.

Expectation: encrypted message has indistinguishability.

We expect a secure encryption scheme to provide message indistinguishability. No adversary can learn what is encrypted inside the ciphertext better than a random guess.

message_ind

Let us assume the headquarter sent the second message -- no adversary should to able to learn.

Truth: the adversary can distinguish messages in two different classes.

Due to the wrong implementation, the adversary can distinguish messages in different classes.

truth

If the outcomes differ in which classes they belong to, then an adversary can infer more information.

Proof-of-Concept (PoC).

We release our attack code in this GitHub repo: https://github.com/weikengchen/attack-on-pycryptodome-elgamal

A running trace is collected by Travis CI: https://www.travis-ci.org/weikengchen/attack-on-pycryptodome-elgamal/builds/336415556

Showing that our adversary makes 0% mistakes in guessing the messages if the two outcomes differ in the classes they belong to.

Discussion

The problem can be fixed by using ElGamal carefully on the correct algebra group. Some results are given by @TElgamal and me on issue https://github.com/Legrandin/pycryptodome/issues/90 .

This bug is prevalent. It exists in legacy PyCrypto and libgcrypt (if used directly to encrypt messages).

Legrandin commented 6 years ago

Sorry, for following up so late.

The method you use (Crypto.PublicKey.ElGamalKey._encrypt()) is private (note the underscore) and undocumented as API exactly because it is unsafe for the reason you state (i.e. not being IND-CPA secure). It is responsibility of the caller to use the appropriate encoding, which is too much to ask - hence the "hiding" of the method. The method has not been removed because PGP uses ElGamal encryption in this (unsecure) form.

In the original PyCrypto project, the encrypt() method was instead public and documented, which I think is indeed a problem.

TElgamal commented 6 years ago

@Legrandin thanks for your reply! You are making a good point about hiding the encryption function.

Yet, many larger systems require the multiplicative homomorphism of Elgamal such as for shuffling ciphertexts, mixnets, proofs of shuffle, re-encryption, etc. Today, Python-based implementations most likely use the old insecure pycrypto library for that, simply because it is the standard package installed on many Linux distributions. If, at some point, there is a transition to PyCryptoDome, I conjecture developers will continue by using the "hidden" function.

@weikengchen mentioned that one idea might be to offer an alternative, CPA-secure version of Elgamal and mark the current version more clearly as being insecure.

Legrandin commented 6 years ago

Interoperability is important, so this CPA-secure version is ideally described in some open specification, standard or RFC along with test vectors. Without that, I would not think it is attractive to implement it.

However, I am not aware of any such spec (other than what being used by PGP, which is clearly not suitable). I also doubt enough interest can be raised, as cannot think of major benefits (e.g. speed, key size) compared to RSA or ECC.

weikengchen commented 6 years ago

The benefit has something to do with the homomorphic encryption.

ElGamal is the only partial homomorphic algorithm, such that Enc(a) Enc(b)=Enc(a b), and can be raised to CPA-secure level.

Homomorphic RSA is not secure at all. ECC does not support homomorphism. We also don't like RSA these days. The prime length of RSA needs to grow sharply. If the last generation of people implements ElGamal correctly, you may never hear about RSA today.

RSA is not a preferred option, entirely.

Homomorphic encryption enables people to compute on encrypted data. In recent years, they are used for Proxy Re-encryption, Mixnets for Anonymous Networks, Secure Data Aggregations. Many research works are building powerful data analytics, storage, and decentralized systems with homomorphic encryption. I know that the industry is still kind of far beyond that. But shouldn't the world implement ElGamal correctly first?

Later, people face a problem: NONE of the real-world implementations of ElGamal on the Earth is CPA-secure.

Hoping open specification or RFC is hard. RFC is maintained by a group of networking people whose cryptography experience has been proved unreliable by history for many times (WEP is a good example). You cannot hope they have better cryptography background than @TElgamal and me.

But if we do nothing, it is possible that people misuse such algorithms to build some problematic systems. Such systems will fail to make the world great!

So I would suggest: (1) implement ElGamal; (2) remove; (3) put a warning clearly:

This ElGamal implementation is intended for compatibility with PGP only. It is not IND-CPA secure and not semantically secure.

TElgamal commented 6 years ago

Note that this does not only break chosen-plaintext security, but even semantic security under a ciphertext-only attack. The idea is the same as in our source code: the attacker just computes the Legendre symbol of the ciphertext and correlates to public key and Elgamal's random coin.

weikengchen commented 6 years ago

@Legrandin Thanks for the quick fix on the first part: choosing a QR.

However, still not sufficient. Plaintexts need to be encoded also into a QR element. @TElgamal and I have a candidate solution (which we can explain it in more details).

But that modification will break the compatibility of ciphertexts. New decryption algorithms cannot automatically decode the old ones. What is your opinion?

Legrandin commented 6 years ago

As mentioned above, ElGamal encryption is not openly supported nor encouraged and users should not use private methods. I therefore think this should not be seen as a real attack.

weikengchen commented 6 years ago

@Legrandin prefer to update the documentation:

https://www.pycryptodome.org/en/latest/src/public_key/elgamal.html

because it said:

Warning Even though ElGamal algorithms are in theory reasonably secure, in practice there are no real good reasons to prefer them to RSA instead.

Legrandin / pycryptodome