Open HedvigS opened 8 years ago
You'll notice that in EnglishFeatureRegexps3.py, Mark has defined ((af|pre|suf|in|circum)fix|(en|pro|circum|)?clitic|tone|(by)?(reduplication|suppletion)) as Morphology; maybe we should use such variables in the spreadsheet.
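To make the idea of reusable variables concrete, here is a minimal sketch of how a named sub-pattern could be composed into a feature regex. Only the Morphology pattern is taken from this thread; the constant name `MORPHOLOGY`, the `PLURAL_BY_MORPHOLOGY` pattern and the `mentions_feature` helper are assumptions made up for illustration, and none of this is meant to reflect what EnglishFeatureRegexps3.py actually contains.

```python
import re

# Sub-pattern quoted in this thread (called "Morphology" there), kept verbatim.
MORPHOLOGY = (
    r"((af|pre|suf|in|circum)fix|(en|pro|circum|)?clitic|tone"
    r"|(by)?(reduplication|suppletion))"
)

# Hypothetical feature pattern, for illustration only (not a Grambank feature
# definition): "plural" followed within 40 characters by a morphology term,
# optionally pluralised.
PLURAL_BY_MORPHOLOGY = re.compile(
    r"\bplural\b.{0,40}\b" + MORPHOLOGY + r"(es|s)?\b",
    re.IGNORECASE,
)


def mentions_feature(sentence: str) -> bool:
    """Return True if the sentence matches the hypothetical feature pattern."""
    return PLURAL_BY_MORPHOLOGY.search(sentence) is not None


if __name__ == "__main__":
    for s in (
        "The plural is marked by a suffix on the noun.",
        "Plural is expressed by reduplication.",
        "Word order is SVO.",
    ):
        print(mentions_feature(s), "-", s)
```

If the spreadsheet stored a placeholder for the sub-pattern rather than the expanded alternation, a change to the shared definition would carry over to every feature regex that uses it.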
What updating is needed, specifically (apart from filling in Dutch and Indonesian, for which we may need some help)? Or do we add more features (and if so, which)?
(I see you've discovered one can only have one assignee per issue :). Try the labels instead.)
I don't think we need to revise the already existing ones, we need to add more. Just give me a little time and I'll add them in.
@hjhaynie, who is also a patron, is willing to work on adding information to her features for these kinds of searches.
There's a patron meeting for Grambank next week; the intention is that a finalised set exists after that. It would actually be interesting to include features that are no longer active in Grambank when comparing humans and computers (for example, old Sahul data), but for aiding coders the active ones are clearly more important.
Would you like a deadline of sorts, @tyrannomark and @skalyan91, by which there are updated regexes for a certain number of Grambank features? Or are we exploring distributional semantics first?
I think we should focus on regexps first, do the more complex stuff later.
M
Agreed (in part because of the inertia in starting something new).
OK, in that case we can start doing that for the new sets soon, because the deadline for every patron to send in their finalised sets to @d97hah is the last of March.
starting with Hedvig’s patron set https://docs.google.com/spreadsheets/d/1k_6BuQbOYOTURIfcS5WGk4YeppyjzPbfHXtrXqZ-O5k/edit?usp=sharing