EleutherAI / pilev2

MIT License

Multiple Twitter datasets #16

Open upintheairsheep opened 1 year ago

upintheairsheep commented 1 year ago

Twitter is undoubtedly one of the most popular social media sites for dataset creation, if not THE most popular, due to its nature; however, Elon Musk is probably going to ruin it. See https://imerit.net/blog/top-25-twitter-datasets-for-natural-language-processing-and-machine-learning-all-pbm/ for the top 25 Twitter datasets. You could probably incorporate them all.

upintheairsheep commented 1 year ago

There's also a fair-sized (5 megabytes) Facebook dataset at https://www.kaggle.com/datasets/sheenabatra/facebook-data