tsvm closed this 2 months ago
Yes, I recall this. Do you see an easy fix?
On Tue, Jul 16, 2019, 4:04 PM Tsvetomila Mihaylova notifications@github.com wrote:
The read_conll_instances method in pos_corpus.py is adding more words than are actually used, as it seems to be adding the words before checking whether the length of the sentence is within the limit. This is not a big problem, but students were trying to understand the word features and it was a little confusing why there were words that do not belong to either train, dev or test.
As far as I can tell, this has never been fixed. However, it doesn't seem critical to anything going on now.
The read_conll_instances method in pos_corpus.py is adding more words than are actually used, as it seems to be adding the words before checking whether the length of the sentence is within the limit. This is not a big problem, but students were trying to understand the word features and it was a little confusing why there were words that do not belong to either train, dev or test.
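On the question of an easy fix: one option is to buffer each sentence and only register its words once the length check has passed. The sketch below is not the toolkit's actual code. It assumes a simplified two-column input (word, then tag, with blank lines between sentences) and an inclusive length limit, and the names `read_sentences`, `build_vocab` and `max_sent_len` are made up for illustration.

```python
def read_sentences(lines, max_sent_len):
    """Yield (words, tags) for each sentence with at most max_sent_len tokens.

    Assumes one "word tag" pair per line and a blank line between sentences
    (a simplification of the real CoNLL layout).
    """
    words, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # End of sentence: keep it only if it is within the limit.
            if 0 < len(words) <= max_sent_len:
                yield words, tags
            words, tags = [], []
            continue
        fields = line.split()
        words.append(fields[0])
        tags.append(fields[1])
    # Handle a final sentence that is not followed by a blank line.
    if 0 < len(words) <= max_sent_len:
        yield words, tags


def build_vocab(lines, max_sent_len):
    """Map each word to an id, using only sentences that passed the length check."""
    word_ids = {}
    for words, _tags in read_sentences(lines, max_sent_len):
        for word in words:
            # Words are registered here, after filtering, so words from
            # discarded sentences never reach the dictionary.
            word_ids.setdefault(word, len(word_ids))
    return word_ids
```

With this ordering, every entry in the word dictionary comes from a sentence that was actually kept, so no word features are created for words that appear only in discarded sentences.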