Description
Add a new IMDB() dataset, built on the classic 2011 dataset commonly used to train sentiment analysis models.
Re-implement the Wikipedia() dataset as a child of a general-purpose Wikimedia() base class and a sibling of a new Wikinews() dataset.
The new implementation uses a different kind of database dump provided by the Wikimedia Foundation. It offers cleaner data, needs significantly less processing and code, and works for every language in which the data is available.
Update the dataset docs and tests, and change some dataset default values and attributes in ways that most users probably won't notice.
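The class layout described above can be illustrated with a minimal sketch. A Wikimedia base class holds the shared dump-parsing logic, and Wikipedia and Wikinews differ only in which Wikimedia project they point at. The `project` attribute, the `db_name` property, the `records()` method, and the assumption that a dump arrives as JSON lines with a `text` field are all hypothetical. They are not the actual API introduced in this PR. The `{lang}{project}` naming ("enwiki", "dewikinews") follows Wikimedia's database-name convention.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


class Wikimedia:
    """Hypothetical base class: shared logic for Wikimedia dump datasets."""

    # Hypothetical attribute; each concrete subclass names its project.
    project = ""

    def __init__(self, lang: str = "en") -> None:
        if not self.project:
            raise TypeError("instantiate a concrete subclass such as Wikipedia")
        self.lang = lang

    @property
    def db_name(self) -> str:
        # Wikimedia database names join language code and project, e.g. "enwiki".
        return f"{self.lang}{self.project}"

    def records(self, lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
        # Assumes one JSON object per line; yields (text, metadata) pairs
        # and skips blank lines and records with empty text.
        for line in lines:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            text = record.pop("text", "")
            if text:
                yield text, record


class Wikipedia(Wikimedia):
    project = "wiki"


class Wikinews(Wikimedia):
    project = "wikinews"


wp = Wikipedia(lang="en")
lines = ['{"title": "Foo", "text": "Foo is a bar."}', "", '{"title": "Empty", "text": ""}']
print(wp.db_name)            # enwiki
print(list(wp.records(lines)))  # [('Foo is a bar.', {'title': 'Foo'})]
```

In this sketch the parsing lives entirely in the base class, so adding another project such as Wikinews takes only a subclass that sets `project`.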
Motivation and Context
Continuing efforts to refine the datasets subpackage and make it more useful.
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[x] My change requires a change to the documentation, and I have updated it accordingly.