Open ebergelson opened 7 years ago
They're the stemmed/lemmatized versions of the words. That's the form of the training data in the CHILDES corpus. Using regular words results in lookup errors.
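For illustration, here is a rough sketch of why the unstemmed forms fail. It assumes the vectors live in a plain dict keyed by the stemmed forms; the names `vectors`, `cosine`, and `similarity` and the numbers themselves are made up for this example and are not taken from the actual model or code.

```python
import math

# Toy table keyed by the stemmed forms mentioned in this thread.
# The vector values are invented for illustration only.
vectors = {
    "juic": [0.9, 0.1, 0.3],
    "cooki": [0.8, 0.2, 0.4],
    "bottl": [0.1, 0.9, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(word_a, word_b, table=vectors):
    """Look up both words and compare them.

    Raises KeyError if either word is not a key in the table,
    which is what happens with the unstemmed spelling.
    """
    return cosine(table[word_a], table[word_b])


if __name__ == "__main__":
    print(similarity("juic", "cooki"))  # works: both keys exist
    try:
        similarity("juice", "cookie")   # unstemmed forms are not keys
    except KeyError as err:
        print("lookup error for", err)
```

If the real lookup works like this, the fix on our side is to map each word to whatever form the training data uses before the cosine step, rather than hand-editing spellings.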
I don't really follow. Why would only some of them have the final e dropped and not others? This may require digging into any documentation on the training data to clarify, and/or writing to them. By the way, we should probably reference their code, no?
I'll look into their documentation on their stemming procedure and send them an email. I'll also add references to where everything comes from.
hey, can you clarify why you dropped the final 'e' in juice, cookie, and bottle for the cosine calculations?