Open StephanAkkerman opened 8 months ago
Not used:
https://sobigdata.d4science.org/catalogue-sobigdata?path=/dataset/crypto_related_tweets_from_10_10_2020_to_3_3_2021 -> very big (March alone is 23GB)
https://zenodo.org/records/3895021 -> only contains Tweet IDs
Combined pre-training dataset available on: https://huggingface.co/datasets/StephanAkkerman/crypto-stock-tweets
Search https://hf.co/datasets for more useful datasets
Unlabeled:
https://www.kaggle.com/datasets/johnyleebrown/twitter-parsed-cryptocurrencies-data/data -> crypto tweets (not used: truncated tweets + inconsistent format)
Twitter sentiment datasets (similar to tweet-eval):
Labeled: