LxMLS / lxmls-toolkit

Machine Learning applied to Natural Language Processing Toolkit used in the Lisbon Machine Learning Summer School

Dictionary in pos_corpus.py #161

Closed tsvm closed 2 months ago

tsvm commented 5 years ago

The read_conll_instances method in pos_corpus.py adds more words to the dictionary than are actually used: it appears to add each word before checking whether the sentence length is within the limit. This is not a big problem, but students trying to understand the word features were confused to find words that do not belong to any of the train, dev, or test sets.
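The ordering problem described above can be illustrated with a minimal sketch. This is not the toolkit's actual code: `build_vocab`, its parameters (`max_sent_len`, `check_first`), the in-memory sentence format, and the `>` comparison against the limit are all hypothetical stand-ins chosen for the example.

```python
def build_vocab(sentences, max_sent_len, check_first):
    """Hypothetical stand-in for the reading loop; not the toolkit's code.

    sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (word_dict, kept), where word_dict maps each word to an integer id
    and kept holds the sentences that pass the length limit.
    """
    word_dict = {}
    kept = []
    for sent in sentences:
        too_long = len(sent) > max_sent_len  # assumed form of the limit check
        if check_first and too_long:
            continue  # possible fix: skip before touching the dictionary
        for word, _tag in sent:
            # Assign the next free id to words not seen before.
            word_dict.setdefault(word, len(word_dict))
        if too_long:
            continue  # reported order: words already added, sentence dropped
        kept.append(sent)
    return word_dict, kept


sentences = [
    [("the", "DT"), ("cat", "NN")],
    [("a", "DT"), ("very", "RB"), ("long", "JJ"), ("sentence", "NN")],
]

buggy_dict, kept = build_vocab(sentences, max_sent_len=3, check_first=False)
fixed_dict, _ = build_vocab(sentences, max_sent_len=3, check_first=True)

# Words that ended up in the dictionary without appearing in any kept sentence.
used = {word for sent in kept for word, _tag in sent}
print(sorted(set(buggy_dict) - used))  # ['a', 'long', 'sentence', 'very']
print(sorted(set(fixed_dict) - used))  # []
```

With the check moved ahead of the dictionary update, the dictionary contains only words from sentences that are kept, which is the behavior the students expected.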

ramon-astudillo commented 5 years ago

Yes, I recall this. Do you see an easy fix?

In reply to Tsvetomila Mihaylova's comment of Tue, Jul 16, 2019, 4:04 PM.

bpopeters commented 2 months ago

As far as I can tell, this has never been fixed. However, it doesn't seem critical to anything going on now.