The labels for data availability were inspired by the work of Harrigian et al. (2021), and are explained below:
FREE - The dataset is publicly available and hosted online for anyone to access.
AUTH - The data can be accessed by contacting the paper's authors.
API - The dataset can be reproduced from the details provided in the article using dedicated APIs for different social media platforms with a reasonable degree of effort.
DUA - The data is available only after a data usage agreement is signed. Sometimes, authorization from an Institutional Review Board (IRB) may be needed.
UNK - The dataset availability is unknown; the authors do not mention if the data is available to the research community.
N/AV - The dataset is no longer available or cannot be shared due to ethical considerations.
For the datasets that are publicly available for download or can be accessed through user agreements, we provide the links to the data.
Denotes that the dataset contains data collected during the COVID-19 pandemic.
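The availability labels defined above can be encoded for programmatic filtering of the table. The following is a minimal sketch, not part of the original resource: the names `Availability`, `parse_availability`, and `link_expected` are hypothetical, and the rule that a link is expected only for FREE and DUA entries is an assumption drawn from the sentence about links above. Some AUTH rows in the table also carry a link, and free-text availability cells are not handled.

```python
from enum import Enum


class Availability(Enum):
    """Availability labels as described in the legend (names are illustrative)."""
    FREE = "publicly available and hosted online"
    AUTH = "available by contacting the paper's authors"
    API = "reproducible through social media platform APIs"
    DUA = "available after a data usage agreement is signed"
    UNK = "availability not stated by the authors"
    NAV = "no longer available or cannot be shared"  # written N/AV in the table


def parse_availability(cell: str) -> Availability:
    """Map a table cell such as ' N/AV ' to an Availability member."""
    # Assumption: the cell holds exactly one label; anything else raises KeyError.
    name = cell.strip().upper().replace("/", "")
    return Availability[name]


def link_expected(label: Availability) -> bool:
    """Assumed rule: links are provided for FREE and DUA datasets."""
    return label in (Availability.FREE, Availability.DUA)
```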
Dataset | Platform | Language | Level | Annotation Procedure | Label | Dataset Size | Availability | Link |
---|---|---|---|---|---|---|---|---|
Multitask (Benton et al., 2017) | | English | USER | Self-disclosure | Labels for multiple disorders | 9.5K users | UNK | |
RSDD (Yates et al., 2017) | | English | USER | Self-disclosure | Binary | 116K users | N/AV | |
Aldarwish and Ahmad (2017) | Twitter, Facebook, LiveJournal | English | POST | Manual annotation | Binary, DSM-IV symptoms | 6.7K posts | API | |
Reece and Danforth (2017) | | English | USER | CES-D | Binary | 166 users | UNK | |
Shen et al. (2017) | | English | USER | Self-disclosure | Binary | 2.8K users | FREE | https://github.com/sunlightsgy/MDDL |
160Users (Jamil et al., 2017) | | English | USER, POST | Self-disclosure | Binary | 160 users, 8K posts | AUTH | |
SAD corpus (Mowery et al., 2017) | | English | POST | Manual annotation | Symptoms, psychological stressors | 9.3K posts | API | |
Vedula and Parthasarathy (2017) | | English | USER | Depression-related keywords | Binary | 150 users | API | |
Hiraga (2017) | Japanese blogging websites | Japanese | USER | Self-disclosure | Binary | 101 users | UNK | |
eRisk2017 (Losada et al., 2017) | | English | USER | Self-disclosure | Binary | 887 users | DUA | https://erisk.irlab.org/2017/index.html |
Yazdavar et al. (2017) | | English | USER | Self-disclosure | Binary | 47K users | UNK | |
Rojas-Barahona et al. (2018) | Koko Platform | English | POST | Manual annotation | CBT Concepts | 4035 posts | AUTH | https://github.com/YinpeiDai/NAUM |
Pirina and Çöltekin (2018) | | English | POST | Subreddit participation | Binary | 3.6K posts | FREE | https://github.com/Inusette/Identifying-depression/tree/master/Data_Collector |
Eichstaedt et al. (2018) | | English | USER | Medical records diagnosis | Binary | 683 users | UNK | |
Seabrook et al. (2018) | Twitter, Facebook | English | USER | PHQ-9 | Depression severity | 78 users | N/AV | |
Ricard et al. (2018) | | English | USER | PHQ-8 | Binary | 749 users | UNK | |
Shen et al. (2018) | Sina Weibo | Chinese | USER | Self-disclosure | Binary | 1.1K users | UNK | |
TRT (Wolohan et al., 2018) | | English | USER | Self-disclosure | Binary | 12K users | UNK | |
eRisk2018 (Losada et al., 2018) | | English | USER | Self-disclosure | Binary | 1.1K users | DUA | https://erisk.irlab.org/2018/index.html |
Loveys et al. (2018) | 7 Cups of Tea | English | USER | Self-disclosure | Binary | 1.9K users | UNK | |
Chen et al. (2018a) | | English | USER | Self-disclosure | Labels for multiple disorders | 7.9K users | API | |
Chen et al. (2018b) | | English | USER | Self-disclosure | Binary | 7K users | API | |
RSDD-Time (MacAvaney et al., 2018) | | English | USER | Self-disclosure | Labels for multiple disorders | 598 users | N/AV | |
Islam et al. (2018) | | English | POST | - | Binary | 7K posts | FREE | https://github.com/ranju12345/Depression-Anxiety-Facebook-page-Comments-Text |
SMHD (Cohan et al., 2018) | | English | USER | Self-disclosure | Labels for multiple disorders | 350K users | N/AV | |
Wu et al. (2018) | | Chinese | USER | CES-D | Binary | 1.4K users | UNK | |
Hemtanon and Kittiphattanabawon (2019) | | Thai | POST | Manual annotation | Binary | 1.5K posts | UNK | |
Wang et al. (2019) | Sina Weibo | Chinese | POST | Manual annotation | Depression severity | 13.9K users | UNK | |
Gui et al. (2019) | | English | USER | Self-disclosure | Binary | 2.8K users | API | |
Chandra Guntuku et al. (2019) | | English | USER | BDI | Binary | 887 users | UNK | |
Almouzini et al. (2019) | | English | USER, POST | Manual annotation | Binary | 89 users | UNK | |
Leis et al. (2019) | | Spanish | USER, POST | Self-disclosure, manual annotation | Binary | 540 users, 1K posts | FREE | https://www.kaggle.com/datasets/francescoronzano/spanish-tweets-suggesting-depression |
Coello-Guilarte et al. (2019) | | Spanish | USER | Self-disclosure | Binary | 316 users | FREE | https://ccc.inaoep.mx/~mmontesg/resources/CrossLingualDepression.zip |
Peng et al. (2019) | Sina Weibo | Chinese | USER | Manual annotation | Binary | 387 users | UNK | |
eRisk2019 (Losada et al., 2019) | | English | USER | BDI-II | BDI filled-in | 20 users | DUA | https://erisk.irlab.org/2019/index.html |
Uddin et al. (2019) | | Bengali | POST | Manual annotation | Binary | 3.8K posts | UNK | |
Yao et al. (2020) | Sina Weibo | Chinese | USER | Manual, automatic annotation | Binary | 2.7K users | UNK | |
Owen et al. (2020) | | English | POST | Manual annotation | Binary | 1K posts | FREE | https://bitbucket.org/nlpcardiff/preemptive-depression-anxiety-twitter/src/master/ |
Bathina et al. (2021) | | English | USER | Self-disclosure | Binary | 1.2K users | AUTH | https://github.com/mctenthij/CDS_paper |
Ríssola et al. (2020) | | English | POST | Self-disclosure, heuristics | Binary | 14K posts | DUA | |
Birnbaum et al. (2020) | | English | USER | Medical records diagnosis | Binary | 223 users | AUTH | |
Mann et al. (2020) | | Portuguese | USER | BDI | Binary | 221 users | UNK | |
Santos et al. (2020) | | Portuguese | USER | Self-disclosure | Binary | 224 users | UNK | |
Alghamdi et al. (2020) | Online forums | Arabic | POST | Manual annotation | Binary | 20K posts | UNK | |
Li et al. (2020) | Sina Weibo | Chinese | USER | Self-disclosure | Binary | 1.8K users | FREE | https://github.com/omfoggynight/Chinese-Depression-domain-Lexicon |
D2S (Yadav et al., 2020) | | English | POST | PHQ-9 | PHQ-9 symptoms | 12K posts | AUTH | |
Wang et al. (2020) | Sina Weibo | Chinese | USER | Depression-related keywords | Binary | 32K users | FREE | https://github.com/aidenwang9867/Weibo-User-Depression-Detection-Dataset |
eRisk2020 (Losada et al., 2020) | | English | USER | BDI-II | BDI filled-in | 90 users | DUA | https://erisk.irlab.org/2020/index.html |
Stankevich et al. (2020) | VKontakte | Russian | USER | BDI | BDI score | 1.3K users | UNK | |
Tabak and Purver (2020) | | English, French, German, Italian, Spanish | USER | Self-disclosure | Binary | 5K users | API | |
Yazdavar et al. (2020) | | English | USER | Manual annotation | Binary | 8.7K users | DUA | |
Wołk et al. (2021) | Facebook, Reddit | Polish | POST | Self-disclosure, clinical interview | Binary | 262 users | UNK | |
Haque et al. (2021) | | English | POST | Subreddit participation | Depression vs. suicide | 1.8K posts | FREE | https://github.com/ayaanzhaque/SDCNL |
Chiu et al. (2021) | | English, Chinese | USER | Depression-related keywords | Binary | 520 users | UNK | |
Nanomi Arachchige et al. (2021) | Online forums | English | POST | Manual annotation | Depression severity | 2.1K posts | UNK | |
Hämäläinen et al. (2021) | Online blogs | Thai | POST | Manual annotation | Binary | 900 posts | FREE | https://zenodo.org/record/4734552 |
Sherman et al. (2021) | | English | USER | Self-disclosure | Binary | 31K users | DUA | |
Yang et al. (2021) | Sina Weibo | Chinese | POST | Manual annotation | Depression severity | 6.1K posts | AUTH | |
eRisk2021 (Parapar et al., 2021) | | English | USER | BDI-II | BDI filled-in | 170 users | DUA | https://erisk.irlab.org/2021/index.html |
Pirayesh et al. (2021) | | English | USER | Self-disclosure | Binary | 817 users | AUTH | |
Niimi (2021) | TOBYO | Japanese | USER | Blog theme | Binary | 901 users | UNK | |
Musleh et al. (2021) | | Arabic | USER, POST | CES-D and self-disclosure | Binary, DSM-5 symptoms | 4.5K posts | UNK | |
Guo et al. (2021) | | English | USER | Self-disclosure | Labels for multiple disorders | 7.9K users | API | |
Zhang et al. (2021) | | English | USER | Self-disclosure | Binary | 5K users | API | |
Cohrdes et al. (2021) | | German | POST | Automatic annotation for PHQ-8 symptoms | Binary | 88K posts | AUTH | |
Zhou et al. (2021) | | English | USER | Self-disclosure | Binary | 1.8M posts | API | |
Safa et al. (2022) | | English | USER | Self-disclosure | Binary | 1.1K users | AUTH | |
Maghraby and Ali (2022) | | Arabic | POST | PHQ-9 | PHQ-9 symptoms | 1.2K posts | FREE | https://data.mendeley.com/datasets/myrb2gky8w/1 |
Naseem et al. (2022) | | English | POST | Manual annotation | Depression severity | 3.5K posts | FREE | https://github.com/usmaann/Depression_Severity_Dataset |
PsySym (Zhang et al., 2022) | | English | USER, POST | Automatic and manual annotation | DSM-5 symptoms for multiple disorders | 26K users, 8.5K posts | AUTH | https://github.com/blmoistawinde/EMNLP22-PsySym |
MHB (Boinepelli et al., 2022) | Mental health forums | English | USER | Forum participation | Only depression | 9.3K users | FREE | https://www.dropbox.com/sh/66nousl8j0j5ull/AACwRnzJjszl3Eys8ZjQnMVya?dl=0 |
CAMS (Garg et al., 2022) | | English | POST | Manual annotation | Causes for depression | 3.1K posts | FREE | https://github.com/drmuskangarg/CAMS |
Sotudeh et al. (2022) | | English | POST | Subreddit participation | Summarization | 24K posts | DUA | https://ir.cs.georgetown.edu/resources/mentsum.html |
Kayalvizhi and Thenmozhi (2022) | | English | POST | Manual annotation | Depression severity | 16K posts | FREE | https://github.com/Kayal-Sampath/detecting-signs-of-depression-from-social-media-postings/tree/main |
eRisk2022 (Crestani et al., 2022) | | English | USER | Self-disclosure | Binary | 3.1K users | DUA | https://erisk.irlab.org/2022/index.html |
Monreale et al. (2022) | | English | POST | Subreddit participation | Labels for multiple disorders | 16K posts | API | |
Kabir et al. (2022) | | Bengali | POST | Manual annotation | Depression severity | 5K posts | FREE | https://github.com/omanwhatiscomputer/depression-severity/ |
PRIMATE (Gupta et al., 2022) | | English | POST | Manual annotation | PHQ-9 symptoms | 2K posts | DUA | https://github.com/primate-mh/Primate2022 |
PsycheNet-G (Mihov et al., 2022) | | English | USER | Self-disclosure | Binary | 591 users | UNK | |
Twitter-STMHD (Singh et al., 2022) | | English | USER | Self-disclosure, manual annotation | Labels for multiple disorders | 33K users | FREE | https://zenodo.org/record/5854911 |
multiRedditDep (Uban et al., 2022) | | English | USER | Self-disclosure | Binary | 3.7K users | AUTH | |
Davis et al. (2022) | | English | USER | Subreddit participation | Binary | 81K users | API | |
Fernández-Barrera et al. (2022) | Flickr | English | POST | Depression tags | Only depression | 14.5K posts | UNK | |
Cha et al. (2022) | Twitter, Everytime | Korean, English, Japanese | POST | Lexicon-based automatic annotation | Binary | 26M posts, 22K posts | AUTH | |
DEPTWEET (Kabir et al., 2023) | | English | POST | Manual annotation | Depression severity | 40K posts | FREE | https://github.com/mohsinulkabir14/DEPTWEET |
SetembroBR (Ramos dos Santos et al., 2023) | | Portuguese | USER | Self-disclosure | Binary | 18.8K users | FREE | https://drive.google.com/drive/folders/1MXFRs0u8iF1RNUWABTA0Oz8_Ix1skqZT |
Alavijeh et al. (2023) | | English | USER | Self-disclosure | Labels for multiple disorders | 1.5K users | FREE | https://github.com/szamani20/Twitter-Mental-Disorder-Dataset |
Adarsh et al. (2023) | | English | POST | Subreddit participation | Binary | 60K posts | UNK | |
Cai et al. (2023) | Sina Weibo | Chinese | USER | Self-disclosure and manual annotation | Binary | 23K users | FREE | https://github.com/cyc21csri/SWDD |
Liu et al. (2023) | | English | POST | Subreddit participation | Symptoms | 1.3M posts | FREE | https://github.com/devanshrj/depression-symptoms-reddit |
BDI-Sen (Pérez et al., 2023) | | English | POST | Manual annotation | BDI-II symptoms | 4.9K posts | DUA | https://erisk.irlab.org/BDISen.html |
SMHD-GER (Zanwar et al., 2023) | | German | POST | Manual annotation | Labels for multiple disorders | 28K posts | DUA | |
Song et al. (2023) | | English | POST | Subreddit participation | Labels for multiple disorders | 85K posts | API | |
RedditCE (Liang et al., 2023) | | English | POST | Manual annotation | Emotion-cause labels | 35K posts | FREE | https://github.com/Liulei-nwpu/N2NCause |
Ghosh et al. (2023) | Facebook, Twitter, YouTube | Bengali | POST | Manual annotation | Binary | 15K posts | AUTH | |
Li et al. (2023) | Sina Weibo | Chinese | USER | Self-disclosure, manual annotation | Binary | 4.8K users | UNK | |
Guo et al. (2023) | Sina Weibo | Chinese | USER | Manual annotation | Binary | 3.1K users | UNK | |
Liu et al. (2023) | Reddit, Twitter | English | USER | Self-disclosure | Binary | 205K users, 255 users | UNK | |
RESTORE (Yadav et al., 2023) | Reddit, Twitter, Pinterest | English | POST | Manual and automatic annotation | PHQ-9 symptoms | 9.8K images | AUTH | |
Zogan et al. (2023) | | English | USER | Self-disclosure | Binary | 1.4K users | API | |
Wu et al. (2023) | | English | USER | Self-disclosure, manual annotation | Binary | 10K users | DUA | https://github.com/dragon-wu/depcov-www2023 |
DepreSym (Pérez et al., 2023) | | English | POST | Manual annotation | BDI-II symptoms | 21K posts | DUA | https://erisk.irlab.org/depresym_dataset.html |
Villa-Pérez et al. (2023) | | English, Spanish | USER | Self-disclosure | Labels for multiple disorders | 6K users | DUA | https://ieee-dataport.org/documents/twitter-dataset-mental-disorders-detection |
HelaDepDet (Priyadarshana et al., 2023) | Twitter, Reddit | English | POST | Manual annotation | Depression severity | 40K posts | FREE | https://github.com/KUAS-ubicomp-lab/Depression_Severity_Levels_Dataset |
MentalRiskES (Mármol Romero et al., 2024) | Telegram | Spanish | USER | Manual annotation | BIN + against/in-favour | 449 users | AUTH | https://github.com/sinai-uja/corpusMentalRiskES |
Alhamed et al. (2024) | | English | USER | Manual annotation | Before/After diagnosis | 120 users | FREE | https://github.com/falwah-alhamed/Depression_Tweets/ |
Milintsevich et al. (2024) | | English | POST | Manual annotation | Anhedonia | 167 posts | DUA, based on PRIMATE (Gupta et al., 2022) | https://github.com/501Good/primate-anhedonia https://huggingface.co/datasets/tartuNLP/reddit-anhedonia/ |
MentalHelp (Raihan et al., 2024) | | English | POST | Automatic annotation | Binary | 14M posts | FREE | https://github.com/mraihan-gmu/MentalHelp |
Lee et al. (2024) | | English | USER | Manual annotation | Binary | 1K users | DUA | https://github.com/DSAIL-SKKU/Detecting-BD-from-Misdiagnosed-MDD_NAACL_2024 |
Beniwal et al. (2024) | | English | POST | Manual annotation | Binary | 10K posts | AUTH | |
RMHD (Rani et al., 2024) | | English | POST | Manual annotation | Mental health causes | 800 posts | FREE | https://www.kaggle.com/datasets/entenam/reddit-mental-health-dataset |
Tumaliuan et al. (2024) | | English, Filipino | USER | PHQ-9 | Binary | 72 users | AUTH | |
Anshul et al. (2024) | | English | USER | Self-mention, manual annotation | Binary | 1.5K users | FREE | only features: https://github.com/AshutoshAnshul/Depression-Detection |
DepressionEmo (Rahman et al., 2024) | | English | POST | Manual and automatic annotation | 8 emotions | 6K posts | FREE | https://github.com/abuBakarSiddiqurRahman/DepressionEmo |
For datasets published before 2017, please refer to https://github.com/kharrigian/mental-health-datasets.