CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Update CoNLL-U support to fully cover the EWT dataset #194

Status: closed by ZachEichen 3 years ago

ZachEichen commented 3 years ago

Addresses issue #191. Adds support for importing CoNLL-U data files, especially those in the EWT, Global Dependencies, and conll_2009 formats, as well as OntoNotes.
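Some background on the format may help. A CoNLL-U file, as used by the Universal Dependencies treebanks (EWT among them), stores one token per line in ten tab-separated columns. Comment lines start with "#", and a blank line separates sentences. The following is a minimal sketch of how one token line maps to named fields. `parse_conllu_token_line` is a hypothetical helper written only for illustration and is not part of text-extensions-for-pandas. For simplicity, it skips multiword-token ranges (IDs such as "3-4") and empty nodes (IDs such as "5.1") instead of representing them.

```python
from typing import Dict, Optional

# The ten CoNLL-U columns, in the order given by the Universal Dependencies spec.
CONLLU_COLUMNS = ["id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_token_line(line: str) -> Optional[Dict[str, object]]:
    """Hypothetical helper: parse one CoNLL-U token line into a dict.

    Returns None for blank lines, "#" comments, multiword-token ranges,
    and empty nodes; a real reader would likely handle the last two.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(CONLLU_COLUMNS):
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    if "-" in fields[0] or "." in fields[0]:
        return None  # multiword-token range or empty node
    record: Dict[str, object] = dict(zip(CONLLU_COLUMNS, fields))
    # "_" means "unspecified". FORM and LEMMA are left as-is because an
    # underscore can also be a literal token there.
    for name in ("upos", "xpos", "deprel", "deps", "misc"):
        if record[name] == "_":
            record[name] = None
    record["id"] = int(fields[0])
    record["head"] = None if fields[6] == "_" else int(fields[6])
    # FEATS is a "|"-separated list of Name=Value pairs, or "_" if empty.
    feats = fields[5]
    record["feats"] = (
        {} if feats == "_" else dict(pair.split("=", 1) for pair in feats.split("|"))
    )
    return record
```

Collecting these dicts per sentence, and tagging each with a sentence index, gives one row per token. That is the shape a dataframe-oriented reader would produce.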

Created a new method, separate from conll_to_df, as the entry point for these data formats. It supports options similar to those offered by other packages that read .conllu files (such as spaCy). Also refactored code shared between the readers in the io/conll module into separate methods.
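The refactoring described above puts a format-specific entry point on top of shared parsing code. That pattern could be sketched roughly as follows. Every name here (iter_sentences, sentences_to_columns, conll_2003_to_columns, conllu_to_columns) is hypothetical and does not reflect the module's actual functions. The sketch returns plain column lists rather than a DataFrame so that it depends only on the standard library. It also ignores CoNLL-U multiword tokens and treats CoNLL-2003 -DOCSTART- markers as ordinary lines.

```python
from typing import Dict, Iterable, Iterator, List, Optional

CONLLU_COLUMNS = ["id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc"]
CONLL_2003_COLUMNS = ["token", "pos", "chunk", "ent"]


def iter_sentences(lines: Iterable[str],
                   comment_prefix: Optional[str] = None) -> Iterator[List[str]]:
    """Shared helper: group token lines into sentences split by blank lines."""
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence, if there is one.
            if current:
                yield current
                current = []
        elif comment_prefix is not None and line.startswith(comment_prefix):
            continue  # skip metadata comments such as "# sent_id = ..."
        else:
            current.append(line)
    if current:  # the input may not end with a blank line
        yield current


def sentences_to_columns(sentences: Iterable[List[str]], column_names: List[str],
                         sep: Optional[str]) -> Dict[str, list]:
    """Shared helper: split each token line into fields, column-oriented."""
    columns: Dict[str, list] = {"sentence_num": [], **{n: [] for n in column_names}}
    for sentence_num, sentence in enumerate(sentences):
        for line in sentence:
            fields = line.split(sep)
            if len(fields) != len(column_names):
                raise ValueError(f"expected {len(column_names)} fields: {line!r}")
            columns["sentence_num"].append(sentence_num)
            for name, value in zip(column_names, fields):
                columns[name].append(value)
    return columns


def conll_2003_to_columns(lines: Iterable[str]) -> Dict[str, list]:
    """Entry point for CoNLL-2003: whitespace-separated, no comment lines."""
    return sentences_to_columns(iter_sentences(lines), CONLL_2003_COLUMNS, sep=None)


def conllu_to_columns(lines: Iterable[str]) -> Dict[str, list]:
    """Entry point for CoNLL-U: tab-separated, with "#" comment lines."""
    return sentences_to_columns(iter_sentences(lines, comment_prefix="#"),
                                CONLLU_COLUMNS, sep="\t")
```

Passing either result to pandas.DataFrame(...) would give one row per token plus a sentence_num column. Keeping the comment handling as a parameter matters in practice. In CoNLL-2003 a line starting with "#" can be a real token, whereas in CoNLL-U every token line starts with a numeric ID.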