Website-Fingerprinting-Library (WFlib)

WFlib is a PyTorch-based open-source library for website fingerprinting attacks, intended for research purposes only.

Website fingerprinting is a type of network attack in which an adversary attempts to deduce which website a user is visiting based on encrypted traffic patterns, even without directly seeing the content of the traffic.

WFlib provides a clean code base for evaluating 11 advanced deep learning (DL)-based website fingerprinting (WF) attacks on multiple datasets. The library is derived from our ACM CCS 2024 paper. If you find this repo useful, please cite our paper.

@inproceedings{deng2024wflib,
  title={Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis},
  author={Deng, Xinhao and Li, Qi and Xu, Ke},
  booktitle={Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security},
  year={2024}
}

Contributions via pull requests are welcome and appreciated.

WFlib Overview

The code library includes 11 DL-based website fingerprinting attacks.

Attacks	Conference	Paper	Code
AWF	NDSS 2018	Automated Website Fingerprinting through Deep Learning	DLWF
DF	CCS 2018	Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning	df
Tik-Tok	PETS 2019	Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks	Tik_Tok
Var-CNN	PETS 2019	Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning	Var-CNN
TF	CCS 2019	Triplet Fingerprinting: More Practical and Portable Website Fingerprinting with N-shot Learning	tf
BAPM	ACSAC 2021	BAPM: Block Attention Profiling Model for Multi-tab Website Fingerprinting Attacks on Tor	None
ARES	S&P 2023	Robust Multi-tab Website Fingerprinting Attacks in the Wild	Multitab-WF-Datasets
RF	Security 2023	Subverting Website Fingerprinting Defenses with Robust Traffic Representation	RF
NetCLR	CCS 2023	Realistic Website Fingerprinting By Augmenting Network Traces	Realistic-Website-Fingerprinting-By-Augmenting-Network-Traces
TMWF	CCS 2023	Transformer-based Model for Multi-tab Website Fingerprinting Attack	TMWF
Holmes	CCS 2024	Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis	WFlib

We implemented all attacks in the same framework (PyTorch) with a consistent coding style, so researchers can easily evaluate and compare existing attacks.

Usage

Install

git clone git@github.com:Xinhao-Deng/Website-Fingerprinting-Library.git
cd Website-Fingerprinting-Library
pip install --user .

Note

Python 3.8 is required.

Datasets

mkdir datasets

Download the datasets (link) and place them in the folder ./datasets

Datasets	# of monitored websites	# of instances	Intro
CW.npz	95	105730	Closed-world dataset.
OW.npz	95	146446	Open-world dataset.
WTF-PAD.npz	95	105730	Dataset with WTF-PAD defense.
Front.npz	95	95000	Dataset with Front defense.
Walkie-Talkie.npz	100	90000	Dataset with Walkie-Talkie defense.
TrafficSliver.npz	95	95000	Dataset with TrafficSliver defense.
NCDrift_sup.npz	93	21430	Network condition drift dataset, including superior traces.
NCDrift_inf.npz	93	6882	Network condition drift dataset, including inferior traces.
Closed_2tab.npz	100	58000	2-tab dataset in the closed-world scenario.
Closed_3tab.npz	100	58000	3-tab dataset in the closed-world scenario.
Closed_4tab.npz	100	58000	4-tab dataset in the closed-world scenario.
Closed_5tab.npz	100	58000	5-tab dataset in the closed-world scenario.
Open_2tab.npz	100	64000	2-tab dataset in the open-world scenario.
Open_3tab.npz	100	64000	3-tab dataset in the open-world scenario.
Open_4tab.npz	100	64000	4-tab dataset in the open-world scenario.
Open_5tab.npz	100	64000	5-tab dataset in the open-world scenario.

Each extracted dataset is an npz file containing two arrays: X and y. X holds the cell sequences, where each value is the cell direction (1 or -1) multiplied by its timestamp. y holds the corresponding labels. Note that for some datasets, X consists only of direction sequences.
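As a rough illustration of this encoding, the sketch below loads an npz file, separates the signed values into direction and timestamp arrays, and pads or truncates each trace to a fixed length. It runs on a tiny synthetic file rather than a real dataset. The helper names split_direction_time and pad_or_truncate are hypothetical and not part of WFlib, and zero padding is an assumption about how short traces are stored.

```python
import os
import tempfile

import numpy as np


def split_direction_time(X):
    """Separate signed-timestamp cells into direction and timestamp arrays."""
    direction = np.sign(X)   # +1 / -1 for real cells, 0 where the value is 0
    timestamp = np.abs(X)    # non-negative timestamps
    return direction, timestamp


def pad_or_truncate(X, seq_len):
    """Zero-pad or cut every row of X to exactly seq_len columns."""
    out = np.zeros((X.shape[0], seq_len), dtype=X.dtype)
    n = min(seq_len, X.shape[1])
    out[:, :n] = X[:, :n]
    return out


# Tiny synthetic stand-in for a file such as datasets/CW.npz (values are made up).
X_demo = np.array([[0.1, -0.2, -0.3, 0.0],
                   [-0.5, 0.6, 0.0, 0.0]], dtype=np.float32)
y_demo = np.array([3, 7])
path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(path, X=X_demo, y=y_demo)

data = np.load(path)
X, y = data["X"], data["y"]

direction, timestamp = split_direction_time(X)
direction_fixed = pad_or_truncate(direction, seq_len=6)
print(direction_fixed.shape, y.shape)
```

One caveat of this sketch: a cell whose timestamp is exactly 0 cannot be told apart from padding, so real preprocessing may need to handle that case separately.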
Divide the dataset into training, validation, and test sets.

# For single-tab datasets
python exp/dataset_process/dataset_split.py --dataset CW
# For multi-tab datasets
python exp/dataset_process/dataset_split.py --dataset Closed_2tab --use_stratify False
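The difference between the two commands can be sketched as follows. A stratified split divides each class separately so that every subset keeps similar class proportions, which only makes sense when each trace has a single label. The sketch assumes that multi-tab labels are stored as a 2-D multi-hot array, which would explain why stratification is turned off for them. The function split_indices and the 8:1:1 ratio are illustrative assumptions, not the actual behavior of dataset_split.py.

```python
import numpy as np


def split_indices(y, ratios=(0.8, 0.1, 0.1), stratify=True, seed=0):
    """Return (train, valid, test) index arrays.

    With stratify=True, each class in the 1-D label array y is split
    separately, so every subset keeps roughly the same class proportions.
    With stratify=False, all samples are shuffled and split as one group.
    """
    rng = np.random.default_rng(seed)
    if stratify:
        groups = [np.flatnonzero(y == c) for c in np.unique(y)]
    else:
        groups = [np.arange(len(y))]

    parts = ([], [], [])
    for idx in groups:
        idx = rng.permutation(idx)
        n_train = int(round(ratios[0] * len(idx)))
        n_valid = int(round(ratios[1] * len(idx)))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_valid])
        parts[2].append(idx[n_train + n_valid:])
    return tuple(np.concatenate(p) for p in parts)


# Single-tab case: one integer label per trace (5 classes x 20 traces).
y_single = np.repeat(np.arange(5), 20)
train, valid, test = split_indices(y_single)

# Multi-tab case (assumed multi-hot labels): plain random split.
y_multi = np.random.default_rng(1).integers(0, 2, size=(100, 5))
train_m, valid_m, test_m = split_indices(y_multi, stratify=False)

print(len(train), len(valid), len(test))
print(len(train_m), len(valid_m), len(test_m))
```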

Training & Evaluation

We provide all experiment scripts for WF attacks in the folder ./scripts/. For example, you can reproduce the DF attack on the CW dataset by executing the following command.

bash scripts/DF.sh

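The ./scripts/DF.sh file contains the commands for model training and evaluation. It selects the GPU with --device cuda:1, so adjust this value to match your hardware (for example, cuda:0 on a single-GPU machine).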

dataset=CW

python -u exp/train.py \
  --dataset ${dataset} \
  --model DF \
  --device cuda:1 \
  --feature DIR \
  --seq_len 5000 \
  --train_epochs 30 \
  --batch_size 128 \
  --learning_rate 2e-3 \
  --optimizer Adamax \
  --eval_metrics Accuracy Precision Recall F1-score \
  --save_metric F1-score \
  --save_name max_f1

python -u exp/test.py \
  --dataset ${dataset} \
  --model DF \
  --device cuda:1 \
  --feature DIR \
  --seq_len 5000 \
  --batch_size 256 \
  --eval_metrics Accuracy Precision Recall F1-score \
  --load_name max_f1

All parameters are documented in the exp/train.py and exp/test.py files. Changing these parameters lets you run different attacks, combine components of different attacks, or perform ablation analysis.
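For readers unfamiliar with the metrics passed to --eval_metrics, the sketch below shows one common way to compute Accuracy, Precision, Recall, and F1-score for a multi-class classifier from a confusion matrix. Macro averaging (an unweighted mean over classes) is an assumption here, and WFlib may average differently. The function macro_metrics is hypothetical and not part of the library.

```python
import numpy as np


def macro_metrics(y_true, y_pred, num_classes):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # how often each class was predicted
    actual = cm.sum(axis=1)      # how often each class actually occurs

    # Classes with an empty denominator get a score of 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "Accuracy": float(tp.sum() / cm.sum()),
        "Precision": float(precision.mean()),
        "Recall": float(recall.mean()),
        "F1-score": float(f1.mean()),
    }


# Toy predictions for three classes (made-up values).
metrics = macro_metrics(
    y_true=[0, 0, 1, 1, 2, 2],
    y_pred=[0, 1, 1, 1, 2, 0],
    num_classes=3,
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In the DF.sh script, --save_metric F1-score together with --save_name max_f1 suggests that the checkpoint is chosen by this metric, and --load_name max_f1 in the test command then reloads it.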

Contact

If you have any questions or suggestions, feel free to contact:

Xinhao Deng (dengxh23@mails.tsinghua.edu.cn)

Acknowledgements

We would like to thank all the authors of the referenced papers. Special thanks to Yixiang Zhang for his support.
