Tidy up names, tests, docs, API

Description

Rename certain functions and variables for clarity and consistency with existing conventions
- textacy.load_spacy() => textacy.load_spacy_lang()
- textacy.extract.named_entities() => textacy.extract.entities(), with ne renamed to ent internally
- textacy.data_dir => textacy.DEFAULT_DATA_DIR
- filename => filepath and dirname => dirpath when specifying full paths to files/dirs on disk, and textacy.io.utils.get_filenames() => textacy.io.utils.get_filepaths()
- compiled regular expressions are now named with an RE_ prefix, instead of an _RE suffix, REGEX, etc.
- SpacyDoc to Doc, SpacySpan to Span, SpacyToken to Token, SpacyLang to Language as variables and in docs
Remove some deprecated functionality, as planned
- top-level spacy_utils.py and spacy_pipelines.py are gone; use spacier subpackage instead
- textacy.compat.bytes_to_unicode() and textacy.compat.unicode_to_bytes() are gone; use textacy.compat.to_unicode() and textacy.compat.to_bytes() instead
- ftfy dependency is dropped, and a NotImplementedError is raised in textacy's wrapper function, textacy.preprocess.fix_bad_unicode(). (Note: There wasn't any deprecation warning, but since the solution is to replace the call with an equivalent but more powerful call to ftfy.fix_text(), I opted to bundle this in with all these other changes. Sorry, folks!)
Move and rename textacy.text_utils.detect_language() => textacy.lang_utils.detect_lang(), where additional lang-related functionality can get added in the future
Add functionality to finish up recently implemented features
- add textacy.spacier.doc_extensions.get_extensions() function to go with set_extensions() and remove_extensions(); it provides a slightly nicer interface over spaCy's current functionality.
- add newer datasets (textacy.datasets.IMDB and textacy.datasets.Wikinews) into textacy's CLI so users can download and inspect them, too
- add textacy.Corpus.word_counts() and textacy.Corpus.word_doc_counts(), which were punted on during the recent overhaul of the Corpus class (Note: The names have changed, from *_freqs() to *_counts().)
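To make the difference between the two counting methods concrete, here is a minimal sketch in plain Python. It assumes that word_counts() tallies total occurrences across the corpus while word_doc_counts() tallies the number of documents containing each word; this description doesn't spell that out. The helper names total_word_counts() and doc_word_counts() are hypothetical stand-ins, not textacy's implementation or signatures.

```python
from collections import Counter


def total_word_counts(docs):
    """Count every occurrence of each word across all docs (hypothetical helper)."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def doc_word_counts(docs):
    """Count the number of docs in which each word appears (hypothetical helper)."""
    counts = Counter()
    for tokens in docs:
        # A set collapses repeats, so each doc contributes at most 1 per word.
        counts.update(set(tokens))
    return counts


# Each "doc" here is just a list of already-tokenized words.
docs = [["spring", "cleaning", "spring"], ["cleaning", "tests"]]
print(total_word_counts(docs))
print(doc_word_counts(docs))
```

In this toy corpus, "spring" occurs twice but in only one doc, so the two tallies differ for it.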
Add and refactor many tests, for both new and old functionality, significantly increasing test coverage
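
For anyone upgrading, the renames and removals listed above could be checked for with something like the following. This is a rough sketch, not part of this PR: the mapping is copied from the list above, while find_old_names() and RE_OLD_NAME are hypothetical names made up for illustration. It only catches fully qualified references (e.g. textacy.load_spacy), not names imported via "from textacy import ...".

```python
import re

# Old name -> replacement, copied from the rename/removal list in this PR.
RENAMES = {
    "textacy.load_spacy": "textacy.load_spacy_lang",
    "textacy.extract.named_entities": "textacy.extract.entities",
    "textacy.data_dir": "textacy.DEFAULT_DATA_DIR",
    "textacy.io.utils.get_filenames": "textacy.io.utils.get_filepaths",
    "textacy.compat.bytes_to_unicode": "textacy.compat.to_unicode",
    "textacy.compat.unicode_to_bytes": "textacy.compat.to_bytes",
    "textacy.preprocess.fix_bad_unicode": "ftfy.fix_text",
    "textacy.text_utils.detect_language": "textacy.lang_utils.detect_lang",
}

# Match an old name only when it is not part of a longer identifier, so that
# "textacy.load_spacy" does not fire on "textacy.load_spacy_lang".
RE_OLD_NAME = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(name) for name in RENAMES) + r")(?!\w)"
)


def find_old_names(source):
    """Return (line_number, old_name, new_name) for each outdated reference."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in RE_OLD_NAME.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits


example = "nlp = textacy.load_spacy('en')\nok = textacy.load_spacy_lang('en')\n"
for lineno, old, new in find_old_names(example):
    print(f"line {lineno}: {old} => {new}")
```

Note that the compiled pattern follows the RE_ prefix convention described above.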

Motivation and Context

This is some much-needed spring cleaning for textacy! Consistently following both internal and external naming conventions should reduce user confusion; improving the test suite means that errors are more likely to be caught; better factoring functionality makes the code more maintainable.

How Has This Been Tested?

Passes all the tests, and then some.

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.
[x] My change requires a change to the documentation, and I have updated it accordingly.

chartbeat-labs / textacy

Tidy up names, tests, docs, API #240
