bmeaut / python_nlp_2018_spring

Problem 3.2 #2

Open vashakidze opened 6 years ago

vashakidze commented 6 years ago

Please clarify a few details about the third problem.

1) Are we supposed to obtain freq(ab) from the frequency dictionary of trigrams? (Should we sum the occurrences of all trigrams that contain the needed bigram?)

2) "If the generated text ends with a N−1-gram that does not occur in the training data, generate the next character from the full character or ngram distribution." Does this mean that if, for example, there is no freq(abcd), and therefore no way to compute freq(abcde)/freq(abcd), we should go one step lower and try to find freq(abcd)/freq(abc), and so on, until we get down to the unigram distribution?

I have read the previous issue about this topic, but still need more clarification (if possible).

juditacs commented 6 years ago
  1. Yes, you have to derive freq(ab) from the frequency dictionary.
  2. There are various solutions; the one you listed is a correct one as well.
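
Both points can be sketched in a few lines of Python. This is only one possible reading, not the course's reference solution. The function names (`count_ngrams`, `prefix_counts`, `next_char`) and the `freq_by_order` dictionary are hypothetical and do not come from the assignment. Two assumptions are made: freq(ab) is obtained by summing only the trigrams that start with "ab" (summing every trigram that merely contains "ab" would count most occurrences twice), and going one step lower means dropping the oldest character of the context.

```python
import random
from collections import Counter


def count_ngrams(text, n):
    """Count every character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def prefix_counts(ngram_freq):
    """Derive (N-1)-gram counts by summing the N-grams that share a prefix.

    Assumption: only N-grams *starting* with the prefix are summed, so the
    very last (N-1)-gram of the training text is not counted.
    """
    prefixes = Counter()
    for ngram, count in ngram_freq.items():
        prefixes[ngram[:-1]] += count
    return prefixes


def next_char(context, freq_by_order, rng=random):
    """Sample the next character, backing off to shorter contexts.

    freq_by_order maps each order n (1..N) to a Counter of n-grams.
    Weighting by freq(context + c) is equivalent to sampling from
    freq(context + c) / freq(context).
    """
    keep = max(freq_by_order) - 1
    context = context[-keep:] if keep else ""
    for start in range(len(context) + 1):
        shorter = context[start:]  # drop the oldest characters first
        order = len(shorter) + 1
        candidates = {
            ngram[-1]: count
            for ngram, count in freq_by_order[order].items()
            if ngram.startswith(shorter)
        }
        if candidates:
            chars = list(candidates)
            weights = [candidates[ch] for ch in chars]
            return rng.choices(chars, weights=weights)[0]
    raise ValueError("no n-grams available to sample from")


text = "abcabd"
trigrams = count_ngrams(text, 3)
print(prefix_counts(trigrams)["ab"])  # 2

# One Counter per order, so that back-off can reach the unigram distribution.
freq_by_order = {n: count_ngrams(text, n) for n in range(1, 4)}
print(next_char("ab", freq_by_order))  # "c" or "d", chosen at random
print(next_char("zz", freq_by_order))  # unseen context: falls back to unigrams
```

Scanning the whole dictionary on every call is slow for large corpora; precomputing a mapping from each context to its continuation counts would avoid that, at the cost of a little more code.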

I hope that clears it up.

vashakidze commented 6 years ago

It surely does. Thank you very much.