ADAH-EviDENce / evidence

doc2vec-based assisted close reading with support for abstract concept-based search and context-based search
GNU General Public License v3.0

English demo dataset #47

Open sverhoeven opened 4 years ago

sverhoeven commented 4 years ago

As a non-Dutch-speaking researcher, I would like to evaluate the tool, but the supplied demonstration dataset is in Dutch. Can an English demonstration dataset be added?

meiertgrootes commented 4 years ago

I guess so, although I don't have one. Equally problematic in this regard is that the UI dialogue is in Dutch ...

sverhoeven commented 4 years ago

You are right. I created https://github.com/ADAH-EviDENce/WorkingTitleCloseReader/issues/48 for the UI.

sverhoeven commented 4 years ago

For training, framework/code/tokenizer.ipynb currently uses Dutch stopwords. These should be replaced with an English stop word list.
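One way to make the stop word list switchable is sketched below. It is not the notebook's actual code: the `STOPWORDS` table, the `tokenize` function, and its `language` parameter are hypothetical names, and the word lists are tiny placeholders. A real run would presumably load complete lists, for example from NLTK's stopwords corpus.

```python
import re

# Placeholder stop word lists for illustration only (assumption).
# Full lists could come from e.g. nltk.corpus.stopwords.words("english").
STOPWORDS = {
    "dutch": {"de", "het", "een", "en", "van", "in", "is", "dat"},
    "english": {"the", "a", "an", "and", "of", "in", "is", "that"},
}


def tokenize(text, language="english"):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    stop = STOPWORDS[language]
    # [^\W\d_]+ matches one or more Unicode letters (no digits or underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stop]


if __name__ == "__main__":
    print(tokenize("The history of the war in Europe"))
    print(tokenize("De geschiedenis van de oorlog", language="dutch"))
```

With the language passed as a parameter, the same tokenizer could serve both the existing Dutch dataset and an English demo dataset without editing the notebook each time.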