allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0
404 stars, 132 forks

Wrote WhoDidWhat (WDW) reader to get the dataset into the pipeline #137

Closed: nelson-liu closed this 7 years ago

nelson-liu commented 7 years ago

This PR adds a WDWDatasetReader and supporting classes to the pipeline, as well as unit tests.

I had never used Scala before Tuesday, so let me know if any code I've written isn't idiomatic or if there's a better way to do it.

The WhoDidWhat dataset is a cloze dataset: the task is to predict an entity missing from a sentence, given an article / passage on the same subject as the sentence and several answer choices. The sentence is the "question", formed by concatenating the "left context" and the "right context", i.e. the text on either side of the blank to fill in.
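
To make the shape of one example concrete, here is a minimal sketch of how an instance could be represented. It is written in Python for brevity and is not the Scala code in this PR. The class name, the field names, the "XXX" blank token, and the toy sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder for the blank; the actual reader may use a different token.
BLANK = "XXX"


@dataclass(frozen=True)
class WhoDidWhatInstance:
    """Illustrative container for one cloze example (all names are assumptions)."""
    passage: str               # article / passage on the same subject as the question
    left_context: str          # question text to the left of the blank
    right_context: str         # question text to the right of the blank
    answer_options: List[str]  # candidate entities
    label: int                 # index of the correct entity in answer_options

    def question(self, blank: str = BLANK) -> str:
        """Concatenate left context, a blank marker, and right context."""
        parts = [self.left_context.strip(), blank, self.right_context.strip()]
        # Skip empty pieces so a blank at the very start or end adds no stray spaces.
        return " ".join(p for p in parts if p)

    def filled_question(self) -> str:
        """Return the question with the correct entity substituted for the blank."""
        return self.question(self.answer_options[self.label])


# Toy data invented for this sketch; it is not taken from the dataset.
example = WhoDidWhatInstance(
    passage="Alice Smith accepted the award at a ceremony on Monday.",
    left_context="Yesterday",
    right_context="accepted the award in person.",
    answer_options=["Bob Jones", "Alice Smith"],
    label=1,
)
print(example.question())         # Yesterday XXX accepted the award in person.
print(example.filled_question())  # Yesterday Alice Smith accepted the award in person.
```

Keeping the two contexts as separate fields, rather than storing only the joined question, preserves the position of the blank, so a model or a later pipeline step can choose how to mark it.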