bmeaut / python_nlp_2018_spring

Homework 2 Viterbi Task, Bad Assert #6

Open pcsiszar opened 6 years ago

pcsiszar commented 6 years ago

When checking the POS tags of the sentence "The cat walks.", the assert statement expects "The" to be tagged as NNP.

Quoted from Homework 2:

    tags = run_viterbi("The cat runs .")
    print(tags)
    assert tags == ['NNP', 'NN', 'VBZ', '.']

In actuality, "The" is supposed to be a DT (determiner), not an NNP (proper noun). I also checked this with spaCy, which gave the same result. Code:

    sent = "The cat walks."
    s = nlp(sent)
    for token in s:
        print(token.pos_)

Results:

DET NOUN VERB PUNCT

Thank you for taking a look at this ticket.

juditacs commented 6 years ago

No system is perfect and the Viterbi you train on this data will produce some errors, especially on small datasets like this one.

The tagset we showed you during class is a simplified one and in practice more fine-grained tagsets are used (such as the one in the training data):

pcsiszar commented 6 years ago

I'm not sure I understand...

Does that mean that if we code the Viterbi correctly, it will POS-tag the asserted sentence incorrectly?

Mine does tag "The" at the start as a determiner, based on the 1M-word dataset, and because of that it won't pass the test.

Judit Acs notifications@github.com wrote (Mon, Apr 2, 2018, 10:17):

No system is perfect and the Viterbi you train on this data will produce some errors, especially on small datasets like this one.

The tagset we showed you during class is a simplified one and in practice more fine-grained tagsets are used (such as the one in the training data):


juditacs commented 6 years ago

That does sound suspicious; there might be a bug in my solution. I'll look into it tomorrow.

Thank you for the notification.

juditacs commented 6 years ago

BTW your output is not actually DET and NOUN, right? What is your exact output for all of the sentences in the example?
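For context, DET and NOUN are coarse universal-style tags, while the assert in the homework uses fine-grained Penn Treebank-style tags such as DT and NN. A rough sketch of how the two relate, assuming only the handful of tags mentioned in this thread (the mapping table and the `to_coarse` helper are hypothetical, not part of the homework):

```python
# Hypothetical, partial mapping from fine-grained Penn Treebank tags
# to coarse universal POS tags; only the tags from this thread are covered.
PTB_TO_UNIVERSAL = {
    "DT": "DET",
    "NN": "NOUN",
    "NNP": "PROPN",
    "VBZ": "VERB",
    ".": "PUNCT",
}


def to_coarse(tags):
    """Map a list of fine-grained tags to their coarse counterparts."""
    return [PTB_TO_UNIVERSAL[tag] for tag in tags]


print(to_coarse(["DT", "NN", "VBZ", "."]))  # ['DET', 'NOUN', 'VERB', 'PUNCT']
```

So a tagger that prints DT NN VBZ . and a tool that prints DET NOUN VERB PUNCT agree on this sentence; they only differ in tagset granularity.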

juditacs commented 6 years ago

OK, I think I found the error. I was using int16 and my count matrices overflowed. This only affects the very last test.
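A minimal sketch of how such an overflow could flip a tag, not the actual reference solution. It assumes counts are stored as signed 16-bit integers and imitates that with `ctypes.c_int16` from the standard library; the counts and the helper name `as_int16` are made up for illustration.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest value a signed 16-bit integer can hold


def as_int16(count):
    """Return the value that ends up stored when `count` is written to an int16 cell."""
    # ctypes integer types do no overflow checking, so the value wraps around.
    return ctypes.c_int16(count).value


# Hypothetical tag counts for the word "the"; the numbers are invented.
true_counts = {"DT": 40000, "NNP": 120}
stored_counts = {tag: as_int16(c) for tag, c in true_counts.items()}

best_true = max(true_counts, key=true_counts.get)
best_stored = max(stored_counts, key=stored_counts.get)

print(stored_counts)           # {'DT': -25536, 'NNP': 120}
print(best_true, best_stored)  # DT NNP
```

Any count above 32767 wraps to a negative number, so a very frequent tag can lose to a rare one. A wider integer type for the count matrices would avoid this under the same assumptions.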

I will think of a solution to replace the tests. It might involve you (and the other early workers) having to copy-paste your current solutions into a new notebook.

juditacs commented 6 years ago

I corrected the test, uploaded the new version. Please let me know if you still experience problems with it.

pcsiszar commented 6 years ago

My output was: DT NN VBZ .

On Tue, Apr 3, 2018 at 4:09 PM, Judit Acs notifications@github.com wrote:

I corrected the test, uploaded the new version. Please let me know if you still experience problems with it.
