bigscience-workshop / data_tooling

Tools for managing datasets for governance and training.
Apache License 2.0

Create dataset india_news_headlines_dataset #144

Closed: albertvillanova closed this issue 2 years ago

albertvillanova commented 2 years ago

Please note that although this dataset is already available at https://huggingface.co/datasets/times_of_india_news_headlines, it requires MANUAL download.

albertvillanova commented 2 years ago

DONE: https://huggingface.co/datasets/bigscience-catalogue-data/india_news_headlines_dataset

Example:

{'publish_date': 20010102,
 'headline_category': 'unknown',
 'headline_text': 'Status quo will not be disturbed at Ayodhya; says Vajpayee'}
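
As a rough illustration of working with a record like the one above, the sketch below parses `publish_date` into a calendar date. It assumes the integer encodes the date as `YYYYMMDD`, which the sample value `20010102` suggests but the thread does not state; `parse_publish_date` is a hypothetical helper, not part of the dataset or of any library.

```python
from datetime import date, datetime


def parse_publish_date(value: int) -> date:
    """Convert an integer such as 20010102 into a date.

    Assumption: the integer is formatted as YYYYMMDD.
    """
    return datetime.strptime(str(value), "%Y%m%d").date()


# Sample record copied from the example above.
record = {
    "publish_date": 20010102,
    "headline_category": "unknown",
    "headline_text": "Status quo will not be disturbed at Ayodhya; says Vajpayee",
}

print(parse_publish_date(record["publish_date"]))  # 2001-01-02
```

If the assumption about the encoding is wrong for some rows, `strptime` raises a `ValueError`, which makes malformed values easy to spot.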

albertvillanova commented 2 years ago

DONE: https://huggingface.co/datasets/bigscience-catalogue-lm-data/lm_en_india_news_headlines_dataset

Example:

{'text': 'Status quo will not be disturbed at Ayodhya; says Vajpayee'}
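
Comparing the two examples in this thread, the LM-ready record appears to keep only the headline under a `text` key. The sketch below shows one way such a mapping could look; it is an assumption based on the two sample records, not the script actually used to build the dataset, and `to_lm_example` is a hypothetical name.

```python
from typing import Any, Dict


def to_lm_example(record: Dict[str, Any]) -> Dict[str, str]:
    """Keep only the headline, renamed to 'text'.

    Assumption: the other fields (publish_date, headline_category)
    are dropped, as the sample LM record suggests.
    """
    return {"text": record["headline_text"]}


# Raw record as shown in the catalogue dataset example.
raw = {
    "publish_date": 20010102,
    "headline_category": "unknown",
    "headline_text": "Status quo will not be disturbed at Ayodhya; says Vajpayee",
}

print(to_lm_example(raw))
```

With the Hugging Face `datasets` library, a function like this could be passed to `Dataset.map` together with `remove_columns` to drop the original fields, though whether the published dataset was produced that way is not stated here.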