SeedlingsBabylab / w2v_cosines

dropped es #1

Open ebergelson opened 7 years ago

ebergelson commented 7 years ago

hey, can you clarify why you dropped the final 'e' in juice, cookie, and bottle for the cosine calculations?

andreiamatuni commented 7 years ago

They're the stemmed/lemmatized versions of the words, which is the form the training data from the CHILDES corpus is in. Looking up the regular (unstemmed) words results in lookup errors.
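
For illustration only (this is not the repository's code): a minimal sketch of why the unstemmed words fail to look up, assuming the embeddings are keyed by stemmed forms such as "juic", "cooki", and "bottl". The vectors, the `SURFACE_TO_STEM` mapping, and the `lookup` and `cosine` helpers are all hypothetical; the actual stemming procedure used for the training data is not documented in this thread.

```python
import math

# Hypothetical toy embeddings keyed by stemmed forms; the numbers are made up.
STEMMED_VECTORS = {
    "juic": [1.0, 0.0],
    "cooki": [0.0, 1.0],
    "bottl": [1.0, 1.0],
}

# Hypothetical hand-written mapping from surface word to stemmed key.
# A real pipeline would need the same stemmer that produced the training data.
SURFACE_TO_STEM = {"juice": "juic", "cookie": "cooki", "bottle": "bottl"}


def lookup(word):
    """Return the vector for a word, trying the given form first, then its stem."""
    if word in STEMMED_VECTORS:
        return STEMMED_VECTORS[word]
    stem = SURFACE_TO_STEM.get(word)
    if stem is None or stem not in STEMMED_VECTORS:
        raise KeyError(f"{word!r} not found as a surface or stemmed form")
    return STEMMED_VECTORS[stem]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# The surface form is not a key, so a direct dictionary lookup would fail.
assert "juice" not in STEMMED_VECTORS

# Going through the stem mapping makes the cosine calculation possible.
similarity = cosine(lookup("juice"), lookup("bottle"))
```

Keeping the surface-to-stem mapping explicit in one place also makes it easy to see which words were altered before lookup and which were not.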

ebergelson commented 7 years ago

i don't really follow--why would only some of them have the final e dropped and not others? this may require digging into any documentation on the training data to clarify (and/or writing to them. by the way we should probably reference their code, no?)

andreiamatuni commented 7 years ago

I'll look into their documentation on their stemming procedure and send them an email. I'll also add references to where everything comes from.