We implemented in Python a Private Set Intersection (PSI) protocol, a functionality that allows two parties to privately join their sets in order to compute their common elements. In our setup, these parties are:
Our implementation is based on this paper and its follow-up. The protocol uses homomorphic encryption, a cryptographic primitive that allows computations on encrypted data in such a way that only the secret key holder has access to the decryption of the result of these computations. For implementing PSI, we used the [FV] homomorphic encryption scheme from the TenSEAL library. You can also check out a concurrent SEAL-based C++ implementation of the protocol that has been recently published by Microsoft.
Disclaimer: Our implementation is not meant for production use. Use it at your own risk.
Suppose the client wants to check if his query x belongs to the database of the server. Consider this database having as entries the integers y_1,...,y_n. Then the server can associate to its database the following polynomial P(X) = (X-y_1) ... (X-y_n). If x belongs to the database, then P vanishes at x. This is the main idea of the protocol: the server should compute this evaluation of P at x in a secure way and send it to the client.
More precisely, the client sends his query encrypted with a homomorphic encryption scheme, Enc(x). Then the server evaluates P at the given encryption: due to homomorphic properties, the result will turn out to be Enc(P(x)). Then the client decrypts this result and checks if it is equal to 0; in case of equality, the query belongs to the database.
The protocol is split in two parts: offline phase and online phase. The online phase starts when the client performs the OPRF protocol with the server, for encoding its items with the server's secret key.
In this phase, both the server and the client preprocess their datasets, until performing the PSI protocol.
oprf_server_key
and considers sigma_max
bits out of the first coordinate of these points.oprf_client_key
and sends them to server.oprf_server_key
and sends them back to client. Now the client's points are oprf_server_key
oprf_client_key
item.oprf_client_key
and takes sigma_max
bits out of the first coordinate of these points.sigma_max
-bit integers.
number_of_bins
bins:
bin_capacity
elements.
Hence, the PSI protocol can be therefore performed per each bin. (a padding step might be required to get the bins full.)
alpha
minibins, having bin_capacity
/alpha
elements.
Hence, performing the PSI protocol on each bin is split as performing alpha
PSI protocols for each minibin.
bin_capacity
/alpha
+ 1 coefficients.
In this phase, both the server and the client perform the actual PSI protocol. The encryption scheme used, [FV], allows encrypting messages as polynomials of degree less than poly_modulus_degree
, which is a power of 2, with integer coefficients modulo plain_modulus
. This modulus is chosen so that it is a prime congruent with 1 modulo 2 poly_modulus_degree
, which helps identifying each such polynomial with a vector of poly_modulus_degree
integer entries modulo plain_modulus
. TenSEAL allows encryption of vectors of integers, by first performing the above correspondence and then performing the actual encryption. Also, in a similar way, decrypting in TenSEAL works for vectors of integers. The encryption scheme implemented in TenSEAL benefits from allowing to encrypt vectors of integers, by performing both encoding and encryption. Also, decrypting in TenSEAL leads to vectors of integers*, the encodings of the corresponding decryptions.
number_of_bins
/poly_modulus_degree
vectors.Since each minibin is represented by a polynomial of degree D =bin_capacity
/alpha
, evaluating such a polynomial can be performed by doing the scalar product between the vector of its coefficients and all the (encrypted) powers of the x, with exponent at most D. The next step, windowing, helps the client send sufficiently many powers so that the server can recover all the powers with small computational effort.
ell = 1
.
alpha
* number_of_bins
/ poly_modulus_degree
encrypted results and sends them to the client.
Check requirements.txt
before running the files. You can generate the datasets of the client and the server by running set_gen.py
. Then run server_offline.py
and client_offline.py
to preprocess them. Now go the online phase of the protocol by running server_online.py
and client_online.py
. Have fun! :smile: