DagsHub / open-source-ml-datasets

This repository holds open source datasets for various machine learning domains with a link to download and use them
https://dagshub.com/DagsHub/open-source-ml-datasets

All Scam Spam #55

Closed syedzubeen closed 10 months ago

syedzubeen commented 11 months ago

Dataset Details: This dataset contains 42,619 preprocessed text messages and emails written in 43 different languages. Each row is labeled with "is_spam": "is_spam=1" marks spam, and "is_spam=0" marks non-spam (ham).

A balanced set of 1,040 rows, covering casual conversations and fraudulent emails in approximately 10 languages, was gathered and annotated by me, with some help from ChatGPT.

Dataset URL: https://huggingface.co/datasets/FredZhang7/all-scam-spam
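
To make the labeling concrete, here is a minimal sketch of how the spam/ham balance could be checked from the "is_spam" column. The sample rows are made up for illustration and are not taken from the dataset, the "text" column name is an assumption, and label_counts is a hypothetical helper.

```python
from collections import Counter


def label_counts(rows):
    """Count spam (is_spam=1) and ham (is_spam=0) rows.

    `rows` is any iterable of dict-like records with an "is_spam" field.
    """
    counts = Counter(int(row["is_spam"]) for row in rows)
    return {"spam": counts[1], "ham": counts[0]}


# Hypothetical sample rows, invented for this example.
# The "text" column name is an assumption; "is_spam" comes from the description.
sample_rows = [
    {"text": "Congratulations, you won a prize! Click here.", "is_spam": 1},
    {"text": "Are we still meeting for lunch tomorrow?", "is_spam": 0},
    {"text": "Your account is locked. Reply with your password.", "is_spam": 1},
    {"text": "Thanks, see you then.", "is_spam": 0},
]

print(label_counts(sample_rows))
```

The same helper should work on the real data if it is loaded with the Hugging Face datasets library, for example load_dataset("FredZhang7/all-scam-spam"), assuming the default configuration loads and each record exposes "is_spam" as an integer.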

dagshub[bot] commented 11 months ago

Join the discussion on DagsHub!