Create an example notebook for the English Web Treebank (EWT) data set

Create an example notebook that shows how to analyze the EWT data set.

Major steps:

Download the data set from https://github.com/UniversalDependencies/UD_English-EWT
Read the data set into DataFrames
Write the entire data set to a Feather file and read it back in
Display a parse tree
Retokenize with a BERT subword tokenizer
Show how to reconstruct a sentence's span using group by and aggregation
Run the document text through the Stanza EWT dependency parser (https://stanfordnlp.github.io/stanza/available_models.html) and compare the output against the gold standard. Alternatively, use spaCy's parser, with the caveat that it is trained on OntoNotes, which has a slightly different schema.
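
As a starting point, here is a rough sketch of two of the steps above: reading CoNLL-U tokens into a DataFrame and reconstructing each sentence's span with group by and aggregation. It uses plain pandas rather than this library's own reader, and the two-sentence sample, the `read_conllu` and `add_char_offsets` helpers, and the `sentence`, `begin`, and `end` column names are all made up for illustration. Character offsets are derived from the `SpaceAfter=No` annotation in the MISC column, assuming a single space between all other tokens. The Feather, parse tree, BERT, and Stanza steps are not covered here.

```python
import io

import pandas as pd

# The ten standard CoNLL-U columns, lowercased.
CONLLU_COLUMNS = ["id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc"]

# Made-up sample in CoNLL-U layout (NOT taken from the EWT corpus).
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "# text = Cats sleep.\n"
    "1\tCats\tcat\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(stream):
    """Hypothetical helper: parse CoNLL-U lines into one row per token."""
    rows = []
    sentence_num = 0
    in_sentence = False
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if in_sentence:
                sentence_num += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # Skip comment lines such as "# text = ...".
        fields = line.split("\t")
        if not fields[0].isdigit():
            continue  # Skip multiword-token ranges (1-2) and empty nodes (1.1).
        rows.append([sentence_num] + fields)
        in_sentence = True
    df = pd.DataFrame(rows, columns=["sentence"] + CONLLU_COLUMNS)
    for col in ("sentence", "id", "head"):
        df[col] = df[col].astype(int)
    return df


def add_char_offsets(tokens):
    """Hypothetical helper: add begin/end offsets and rebuild the text."""
    begins, ends, parts = [], [], []
    pos = 0
    for form, misc in zip(tokens["form"], tokens["misc"]):
        begins.append(pos)
        pos += len(form)
        ends.append(pos)
        parts.append(form)
        if "SpaceAfter=No" not in misc:
            parts.append(" ")
            pos += 1
    return tokens.assign(begin=begins, end=ends), "".join(parts).rstrip()


tokens = read_conllu(io.StringIO(SAMPLE))
tokens, doc_text = add_char_offsets(tokens)

# A sentence's span runs from its first token's begin to its last token's end.
sentence_spans = tokens.groupby("sentence").agg(
    begin=("begin", "min"),
    end=("end", "max"),
    num_tokens=("id", "count"),
)
sentence_spans["text"] = [
    doc_text[b:e] for b, e in zip(sentence_spans["begin"], sentence_spans["end"])
]
print(sentence_spans)
```

For the real data set, the same functions could be pointed at an open file handle for one of the `.conllu` files from the repository linked above instead of the inline sample.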

CODAIT / text-extensions-for-pandas