This repository contains code to recreate the results from SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets, which uses transformers to recover secrets from LWE samples ($\mathbf{a}$, $b$). The code in this repo can also be used to run the attack in SALSA PICANTE: a Machine Learning Attack on LWE with Binary Secrets. The Verde attack strictly supersedes the Picante attack in terms of performance.
Installation: To get started, clone the repository to a machine that has at least one GPU. Create the necessary conda environment via `conda create --name lattice_env --file requirements.txt` and activate your shiny new environment via `conda activate lattice_env`.
Download data: For ease of use, we have provided a pre-processed dataset for you to use. It will enable you to run experiments on $n=256$, $\log_2 q=20$ data with sparse binary secrets. You can download the data from this link. The data folder contains the following files:
Your first experiment: Once you've done this, run `python3 train.py --reload_data /path/to/data --secret_seed 3 --hamming 30 --input_int_base 105348 --share_token 64 --optimizer adam_warmup,lr=0.00001,warmup_updates=1000,warmup_init_lr=0.00000001`. This will train a model on the preprocessed dataset ($n=256$, $\log_2 q=20$, $h=30$). The input encoding base and share token for this setting are specified in Table 9 in VERDE's Appendix A.1, and the model architecture is specified in Section 2 of the paper. This model runs smoothly on a single NVIDIA Quadro GV100 32GB. Each epoch should take roughly 2 hours, and, if a secret is recovered, this typically happens in early epochs. You can re-run the experiment with a different secret seed (range is 0-9) or Hamming weight (range is 3-40) if this experiment fails -- remember that not all attacks succeed on the first try!
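To make the `--input_int_base` flag concrete, the sketch below shows one way an integer modulo $q$ can be written as digits in a base such as 105348. This is a simplified illustration, not the repository's actual tokenizer; the function name and digit ordering are assumptions.

```python
# Hypothetical illustration of base-B integer encoding (not the repo's tokenizer).
# With q just under 2^20 and B = 105348, each LWE coordinate fits in two digits.

def encode_base_b(x: int, base: int) -> list[int]:
    """Write a non-negative integer as base-`base` digits, most significant first."""
    if x == 0:
        return [0]
    digits = []
    while x > 0:
        digits.append(x % base)
        x //= base
    return digits[::-1]

q = 842779   # modulus used in the data-generation example later in this README
B = 105348   # --input_int_base from the training command above
print(encode_base_b(q - 1, B))  # [7, 105342] -> two tokens per coordinate
```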
Parameters you can play with:
Although you can vary the parameters as you see fit, the default training parameters are specified as defaults in `train.py` and in the `params.pkl` file provided with the dataset. Note that this codebase currently only supports the seq2seq model, not the encoder-only model tested in Section 7 of the paper.
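If you want to see which defaults shipped with the downloaded dataset, a minimal sketch like the one below can inspect `params.pkl`. It assumes the file is a standard pickle of a dict-like object or namespace, which may not match the repository's exact serialization.

```python
import pickle
from pathlib import Path

# Hypothetical inspection of the dataset's params.pkl (assumes a plain pickled
# dict or namespace; adjust if the repo stores a custom object instead).
params_path = Path("/path/to/data/params.pkl")
with params_path.open("rb") as f:
    params = pickle.load(f)

# Print whatever keys/attributes are present.
items = params.items() if hasattr(params, "items") else vars(params).items()
for key, value in sorted(items):
    print(f"{key}: {value}")
```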
The following parameters are defined in `src/train.py` (the LWE-specific ones are also illustrated in a short sketch after this list):
- `enc_emb_dim`: encoder's embedding dimension
- `dec_emb_dim`: decoder's embedding dimension
- `n_enc_layers`: number of layers in the encoder
- `n_dec_layers`: number of layers in the decoder
- `n_enc_heads`: number of attention heads in the encoder
- `n_dec_heads`: number of attention heads in the decoder
- `enc_loops`: number of loops through the encoder (Universal Transformer parameter)
- `dec_loops`: number of loops through the decoder (Universal Transformer parameter)
- `epoch_size`: number of LWE samples per epoch
- `batch_size`: number of LWE samples per batch
- `N`: lattice dimension
- `Q`: prime modulus for the LWE problem
- `sigma`: standard deviation of the error distribution used in LWE
- `secret_type`: the distribution from which secret bits are chosen (the code supports binary, ternary, Gaussian, and binomial -- note that full secret recovery is implemented only for binary and ternary secrets)
- `hamming`: number of nonzero bits in the LWE secret
- `input_int_base`: integer encoding base for transformer inputs
- `output_int_base`: integer encoding base for transformer outputs
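To make the LWE parameters above concrete, here is a minimal sketch of how samples ($\mathbf{a}$, $b$) with a sparse binary secret might be drawn. It follows the standard LWE definition rather than the repository's data pipeline, and the specific values and shapes are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of LWE sample generation with a sparse binary secret
# (illustrates N, Q, sigma, secret_type=binary, and hamming; the repo's own
# pipeline applies lattice reduction and other preprocessing on top of this).
N, Q, sigma, hamming = 256, 842779, 3.0, 30

rng = np.random.default_rng(0)

# Sparse binary secret: exactly `hamming` coordinates set to 1.
s = np.zeros(N, dtype=np.int64)
s[rng.choice(N, size=hamming, replace=False)] = 1

# One LWE sample: a uniform vector a, and b = <a, s> + e mod Q.
a = rng.integers(0, Q, size=N)
e = int(np.round(rng.normal(0, sigma)))
b = (int(a @ s) + e) % Q
print(a[:5], b)
```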
Running sweeps with slurm: To run sweeps on our cluster, we use slurm to parse the json files and farm out experiments to machines. If you add additional elements to the lists in the json files (e.g. `hamming: [30, 35]` instead of just `hamming: [30]`) and use an appropriate parser, you too can run sweeps locally.
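If you want to run such a sweep without slurm, a small script along these lines can expand a json file of parameter lists into individual `train.py` invocations. The json layout, file name, and flag mapping here are assumptions based on the `hamming: [30, 35]` example above.

```python
import itertools
import json
import subprocess

# Hypothetical local sweep runner: expands every list-valued entry in a json
# file (e.g. {"hamming": [30, 35], "secret_seed": [0, 1]}) into a grid of runs.
with open("sweep_params.json") as f:   # hypothetical sweep file
    grid = json.load(f)

keys = list(grid)
for combo in itertools.product(*(grid[k] for k in keys)):
    cmd = ["python3", "train.py", "--reload_data", "/path/to/data"]
    for key, value in zip(keys, combo):
        cmd += [f"--{key}", str(value)]
    print("Launching:", " ".join(cmd))
    subprocess.run(cmd, check=True)   # runs sequentially; parallelize as needed
```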
Analyzing results: If you have a large set of experiments you want to analyze, you can use `./notebooks/LatticeMLReader.ipynb`. This notebook parses the log file(s) from a given experiment (or set of experiments) and provides other helpful information.
Generating your own data: If you are interested in generating your own reduced data to run a different attack, proceed as follows.

1. Run `python generate.py --timeout 432000 --N 256 --Q 842779 --lll_delta 0.99 --float_type dd --bkz_block_size 35 --threshold 0.435 --threshold2 0.5 --use_polish true --step RA_tiny2 --reload_data <path of orig_A.npy>` (you can change N, Q, etc., depending on the attack you want to run; a sketch for producing `orig_A.npy` follows these steps).
2. Run `python generate.py --min_hamming 16 --max_hamming 25 --num_workers 1 --num_secret_seeds 5 --step Ab --secret_type binary --epoch_size 1000000 --reload_size 1000000 --reload_data <path of directory for the reduced matrices>`. This creates a dataset with 50 binary secrets with Hamming weights ranging from 16 to 25, 5 secrets at each Hamming weight.

Now you have a set of reduced matrices on which you can run attacks! The command provided above for training models on the provided data should also work on this dataset, as long as you change the path to point at your own reduced data.
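Step 1 expects a file of original LWE vectors (`orig_A.npy`). Below is a minimal sketch for producing such a file; the array shape, dtype, and sample count are assumptions, so check the repository's data-loading code for the exact format it expects.

```python
import numpy as np

# Hypothetical creation of orig_A.npy: a matrix of uniformly random LWE
# vectors a mod Q. The row count and dtype here are illustrative guesses.
N, Q = 256, 842779
num_samples = 4 * N   # assumed; the real pipeline may expect a different count

rng = np.random.default_rng(0)
orig_A = rng.integers(0, Q, size=(num_samples, N), dtype=np.int64)
np.save("orig_A.npy", orig_A)
```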
If you want to run the two preprocessing steps above using slurm, we have provided two .json files in the `slurm_params` folder: `create_n256_data_step1.json` and `create_n256_data_step2.json`. These files provide helpful examples for setting up sbatch (or similar slurm scheduling tool) runs.
Please use the following citation for this repository.
@inproceedings{li2023salsa,
title={SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets},
author={Li, Cathy and Wenger, Emily and Allen-Zhu, Zeyuan and Charton, Francois and Lauter, Kristin},
booktitle={Advances in Neural Information Processing Systems},
volume={37},
year={2023}
}
SALSA VERDE is licensed under the license found in the LICENSE file. See the CONTRIBUTING file for how to help out.