TalwalkarLab / leaf

Leaf: A Benchmark for Federated Settings
BSD 2-Clause "Simplified" License

Resources

Datasets

  1. FEMNIST

    • Overview: Image Dataset
    • Details: 62 different classes (10 digits, 26 lowercase, 26 uppercase), images are 28 by 28 pixels (with an option to make them all 128 by 128 pixels), 3500 users
    • Task: Image Classification
  2. Sentiment140

    • Overview: Text Dataset of Tweets
    • Details: 660,120 users
    • Task: Sentiment Analysis
  3. Shakespeare

    • Overview: Text Dataset of Shakespeare Dialogues
    • Details: 1129 users (reduced to 660 by our choice of sequence length; see bug)
    • Task: Next-Character Prediction
  4. Celeba

  5. Synthetic Dataset

    • Overview: We propose a process to generate synthetic, challenging federated datasets. The high-level goal is to create devices whose true models are device-dependent. For a full description of the generative process, please refer to the paper.
    • Details: The user can customize the number of devices, the number of classes and the number of dimensions, among others
    • Task: Classification
  6. Reddit

    • Overview: We preprocess the Reddit data from December 2017 released by pushshift.io.
    • Details: 1,660,820 users with a total of 56,587,343 comments.
    • Task: Next-word Prediction.
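
For FEMNIST, the 62 classes cover 10 digits plus 26 letters in each case. The sketch below maps an integer label back to its character and flattens a 28 by 28 image into a 784-value feature vector. The label order used here (digits, then uppercase, then lowercase, following the EMNIST ByClass convention) and the helper names `label_to_char` and `flatten_image` are assumptions for illustration, not confirmed details of LEAF's preprocessing.

```python
import string
from typing import List

# ASSUMPTION: label order follows the EMNIST ByClass convention
# (0-9 digits, 10-35 uppercase, 36-61 lowercase); LEAF's own order may differ.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase
IMAGE_SIDE = 28  # FEMNIST images are 28 by 28 pixels by default


def label_to_char(label: int) -> str:
    """Map an integer class label in [0, 62) to its character (hypothetical helper)."""
    if not 0 <= label < len(CLASSES):
        raise ValueError(f"label must be in [0, {len(CLASSES)}), got {label}")
    return CLASSES[label]


def flatten_image(image: List[List[float]]) -> List[float]:
    """Flatten a 28x28 grayscale image, row by row, into 784 values (hypothetical helper)."""
    if len(image) != IMAGE_SIDE or any(len(row) != IMAGE_SIDE for row in image):
        raise ValueError("expected a 28 by 28 image")
    return [pixel for row in image for pixel in row]


assert len(CLASSES) == 62
print(label_to_char(0), label_to_char(10), label_to_char(61))  # prints: 0 A z
```

Keeping the class table as a single string makes the label-to-character lookup a plain index; if LEAF's actual ordering differs, only `CLASSES` needs to change.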
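
In a federated setting such as Sentiment140, each user's data forms one client, so the user count above is also the number of clients. The sketch below partitions tweets into per-user clients. It assumes each Twitter account is one client and that records arrive as `(user_id, tweet_text, label)` tuples with binary labels (0 = negative, 1 = positive); the record layout and the `group_by_user` helper are hypothetical, not LEAF's file format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: each record is (user_id, tweet_text, label) with label 0 = negative,
# 1 = positive; this layout is illustrative only.
Record = Tuple[str, str, int]


def group_by_user(records: Iterable[Record]) -> Dict[str, Tuple[List[str], List[int]]]:
    """Partition tweets into per-user clients: user -> (texts, labels)."""
    clients: Dict[str, Tuple[List[str], List[int]]] = defaultdict(lambda: ([], []))
    for user, text, label in records:
        texts, labels = clients[user]
        texts.append(text)
        labels.append(label)
    return dict(clients)


tweets = [
    ("alice", "great day!", 1),
    ("bob", "stuck in traffic", 0),
    ("alice", "loving this weather", 1),
]
clients = group_by_user(tweets)
print(len(clients), clients["alice"][1])  # prints: 2 [1, 1]
```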
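
For Shakespeare, the sequence length matters because a user whose dialogue is too short yields no complete (input, next character) pair. The minimal sketch below assumes a sliding window of `seq_len` characters and shows one plausible way users drop out under a given sequence length. The default of 80 and the helper names `make_examples` and `build_clients` are illustrative assumptions, not LEAF's exact preprocessing.

```python
from typing import Dict, List, Tuple

SEQ_LEN = 80  # ASSUMPTION: illustrative sequence length, not necessarily LEAF's setting


def make_examples(text: str, seq_len: int = SEQ_LEN) -> Tuple[List[str], List[str]]:
    """Slide a window over the text: each input is seq_len chars, the target is the next char."""
    inputs, targets = [], []
    for i in range(len(text) - seq_len):
        inputs.append(text[i:i + seq_len])
        targets.append(text[i + seq_len])
    return inputs, targets


def build_clients(
    user_texts: Dict[str, str], seq_len: int = SEQ_LEN
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Keep only users whose text yields at least one training example."""
    clients = {}
    for user, text in user_texts.items():
        inputs, targets = make_examples(text, seq_len)
        if inputs:  # text shorter than seq_len + 1 chars yields nothing, so the user is dropped
            clients[user] = (inputs, targets)
    return clients


print(make_examples("to be or", seq_len=5))
# prints: (['to be', 'o be ', ' be o'], [' ', 'o', 'r'])
```

A longer `seq_len` gives the model more context per prediction but raises the minimum amount of text a user needs in order to remain in the dataset.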
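
The key property of the Synthetic dataset is that each device has its own true model. LEAF's exact generative process is described in the paper and is not reproduced here. As a stand-in, the sketch below uses a simplified per-device softmax-regression generator, loosely modeled on the Synthetic(α, β) data from the FedProx paper: `alpha` controls how much device models differ, and `beta` controls how much device inputs differ. The function `generate_synthetic`, its parameters, and its defaults are illustrative assumptions.

```python
import random
from typing import List, Tuple

Device = Tuple[List[List[float]], List[int]]


def generate_synthetic(num_devices: int = 5, num_classes: int = 10, dim: int = 60,
                       alpha: float = 1.0, beta: float = 1.0,
                       samples_per_device: int = 50, seed: int = 0) -> List[Device]:
    """Stand-in generator (not LEAF's process): every device gets its own (W, b) true model."""
    rng = random.Random(seed)
    # Per-feature standard deviations sqrt(j^-1.2) = j^-0.6, so later features vary less.
    std = [j ** -0.6 for j in range(1, dim + 1)]
    devices = []
    for _ in range(num_devices):
        u = rng.gauss(0.0, alpha ** 0.5)  # device-level shift of the model parameters
        shift = rng.gauss(0.0, beta ** 0.5)  # device-level shift of the input mean
        W = [[rng.gauss(u, 1.0) for _ in range(dim)] for _ in range(num_classes)]
        b = [rng.gauss(u, 1.0) for _ in range(num_classes)]
        v = [rng.gauss(shift, 1.0) for _ in range(dim)]
        xs, ys = [], []
        for _ in range(samples_per_device):
            x = [rng.gauss(v[j], std[j]) for j in range(dim)]
            logits = [sum(w * xj for w, xj in zip(W[c], x)) + b[c] for c in range(num_classes)]
            # argmax of the logits equals argmax of the softmax probabilities
            ys.append(max(range(num_classes), key=logits.__getitem__))
            xs.append(x)
        devices.append((xs, ys))
    return devices


devices = generate_synthetic(num_devices=3, num_classes=4, dim=5, samples_per_device=7)
print([len(xs) for xs, _ in devices])  # prints: [7, 7, 7]
```

Because `W` and `b` are drawn separately for each device, two devices can label the same input differently. This is the device-dependent behavior the dataset is meant to stress.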
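
The Reddit figures work out to about 34 comments per user on average (56,587,343 / 1,660,820 ≈ 34.07). For next-word prediction, each comment can be turned into (context, next word) training pairs. The sketch below assumes lowercase whitespace tokenization, hypothetical `<BOS>`/`<EOS>` boundary tokens, and a fixed context length. None of these are confirmed as LEAF's preprocessing, and `next_word_pairs` is a hypothetical helper.

```python
from typing import List, Tuple

BOS, EOS = "<BOS>", "<EOS>"  # ASSUMPTION: illustrative comment-boundary tokens


def next_word_pairs(comment: str, context_len: int = 3) -> List[Tuple[List[str], str]]:
    """Turn one comment into (previous words, next word) training pairs."""
    tokens = [BOS] + comment.lower().split() + [EOS]
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_len):i]  # at most context_len preceding tokens
        pairs.append((context, tokens[i]))
    return pairs


for context, target in next_word_pairs("Federated learning is fun", context_len=2):
    print(context, "->", target)
```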

Notes