All Scam Spam - Githubissues

Dataset Details: This dataset contains 42,619 preprocessed text messages and emails written in 43 different languages. Each row carries a label: "is_spam=1" marks spam, and "is_spam=0" marks non-spam (ham).

I gathered and annotated a balanced set of 1,040 rows, covering casual conversations and fraudulent emails in approximately 10 languages, with some help from ChatGPT.

Dataset URL: https://huggingface.co/datasets/FredZhang7/all-scam-spam
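
The label convention above can be checked with a short script. This is a minimal sketch, not part of the dataset itself: it assumes each row is a dict with the "is_spam" field described above, and the "text" field name, the sample rows, and the "train" split name in the comment are assumptions made for illustration.

```python
from collections import Counter


def label_counts(rows):
    """Count spam (is_spam=1) and ham (is_spam=0) rows."""
    counts = Counter(int(row["is_spam"]) for row in rows)
    return {"spam": counts[1], "ham": counts[0]}


# Hypothetical sample rows; the "text" field name is an assumption.
sample = [
    {"text": "You won a prize, send your bank details", "is_spam": 1},
    {"text": "See you at lunch tomorrow?", "is_spam": 0},
    {"text": "Urgent: verify your account now", "is_spam": 1},
    {"text": "Thanks, got the notes", "is_spam": 0},
]

print(label_counts(sample))

# With the Hugging Face `datasets` library installed, the real rows could
# likely be loaded like this (the split name "train" is an assumption):
#   from datasets import load_dataset
#   rows = load_dataset("FredZhang7/all-scam-spam", split="train")
#   print(label_counts(rows))
```

Running the same function on the full dataset would show how the spam and ham labels are distributed across the 42,619 rows.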

DagsHub / open-source-ml-datasets

All Scam Spam #55