CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Add support for Watson Knowledge Studio document collection format #210

Open frreiss opened 3 years ago

frreiss commented 3 years ago

Watson Knowledge Studio (https://cloud.ibm.com/docs/knowledge-studio?topic=knowledge-studio-wks_overview_full) can export annotated document collections (also known as "ground truth") as a collection of JSON files. We should add functions under text_extensions_for_pandas.io.watson to import collections in this format into DataFrames.

To make this import functionality robust, we'll need to track down technical documentation on the export file format.
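
Until that documentation is found, here is a rough sketch of what such an import function might look like. The JSON layout it assumes (one file per document with "id", "text", and a "mentions" list of "begin"/"end"/"type" entries) is hypothetical and is not the documented Watson Knowledge Studio format. The function names read_collection_records and records_to_dataframe are also placeholders, not existing functions in text_extensions_for_pandas.

```python
import json
from pathlib import Path

# Column order for the resulting DataFrame (hypothetical schema).
COLUMNS = ["doc_id", "begin", "end", "type", "covered_text"]


def read_collection_records(collection_dir):
    """Read every *.json file in collection_dir into flat mention records.

    ASSUMPTION: each file holds one document shaped like
      {"id": ..., "text": ..., "mentions": [{"begin": ..., "end": ..., "type": ...}]}
    This layout is a guess for illustration, not the real export format.
    """
    records = []
    for path in sorted(Path(collection_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            doc = json.load(f)
        text = doc.get("text", "")
        for mention in doc.get("mentions", []):
            begin, end = mention["begin"], mention["end"]
            records.append({
                # Fall back to the file name if the document has no id.
                "doc_id": doc.get("id", path.stem),
                "begin": begin,
                "end": end,
                "type": mention.get("type"),
                # Text covered by the character span [begin, end).
                "covered_text": text[begin:end],
            })
    return records


def records_to_dataframe(records):
    """Turn the flat records into a DataFrame with one row per mention."""
    # Imported here so the parsing step above works without pandas installed.
    import pandas as pd
    return pd.DataFrame(records, columns=COLUMNS)
```

Keeping the file parsing separate from the DataFrame construction means only read_collection_records would need to change once the actual file format is known.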