Eumaeus closed this 7 years ago
This needs a more serious design review to consider how best to organize a Corpus DSL.
Moving this to a new milestone for the DSL redesign.
Implemented, including two new short-hand functions: findWSTokens, which matches on "white-space delimited" tokenization, and findWordTokens, which matches on "word" tokens (white-space delimited, ignoring punctuation).
Testing on Vector("Gyges","Ardys") fails because the text has "Ardys the son of Gyges," so the white-space delimited token is "Gyges," (with the trailing comma), not "Gyges".
This is absolutely correct according to the documentation, which specifies white space as a delimiter, but it will confuse people.
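
To make the difference concrete, here is a minimal sketch of the two tokenizations. It is not the library's implementation: the object name `TokenSketch` and the helpers `wsTokens`, `wordTokens`, and `containsAll` are hypothetical, and it assumes that "ignoring punctuation" means dropping ASCII punctuation characters and that a match means every requested token occurs somewhere in the passage.

```scala
object TokenSketch {
  // Hypothetical helper: split on runs of white space only,
  // so punctuation stays attached to its token ("Gyges,").
  def wsTokens(text: String): Vector[String] =
    text.trim.split("\\s+").toVector.filter(_.nonEmpty)

  // Hypothetical helper: white-space split, then drop ASCII punctuation
  // (assumed reading of "ignoring punctuation").
  def wordTokens(text: String): Vector[String] =
    wsTokens(text).map(_.replaceAll("\\p{Punct}", "")).filter(_.nonEmpty)

  // Assumed matching rule: every wanted token appears among the tokens.
  def containsAll(tokens: Vector[String], wanted: Vector[String]): Boolean =
    wanted.forall(w => tokens.contains(w))
}

val text = "Ardys the son of Gyges,"
val wanted = Vector("Gyges", "Ardys")

// White-space tokens keep the comma, so "Gyges" is not found.
val wsMatch = TokenSketch.containsAll(TokenSketch.wsTokens(text), wanted)

// Word tokens drop the comma, so both names are found.
val wordMatch = TokenSketch.containsAll(TokenSketch.wordTokens(text), wanted)
```

Under these assumptions the white-space version misses the passage and the word version finds it, which is the behavior likely to confuse people who reach for the white-space function first.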