HackerSpace-PESU / deep-frost

A poem generator based on text generation using GRU
https://deep-frost.herokuapp.com/

Crashes when last word is not in vocabulary #12

Closed Samyak2 closed 3 years ago

Samyak2 commented 3 years ago

The Issue

I tested the app a bit and noticed that I get status 500: Internal Server Error on certain inputs.

Some of the inputs (rhyme scheme: AABBAABB):

First line: i walk a lonely askdh (crashes)

First line: i walk a askldhsal road (does NOT crash)

First line: I have much stonks (crashes)

Details

The error is: KeyError: "word 'askdh' not in vocabulary"

And occurs at line 61 here: https://github.com/HackerSpace-PESU/deep-frost/blob/dea2405b127c370831f6a846e4a2f54b3460a9d6/src/app.py#L60-L62

Probable Solution

One solution would be to wrap that line in a try-except (or use if word in word_vec.vocab:) and return an error to the front-end stating that the last word of the sentence could not be found in the vocabulary.
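As a rough illustration of the membership-check variant, here is a minimal sketch. It assumes the embeddings can be treated as a mapping from word to vector; the `word_vec` dict below is only a stand-in for the real model, `check_last_word` is a hypothetical helper rather than the code in app.py, and lowercasing the input is an assumption too.

```python
# Stand-in for the Word2Vec vocabulary (assumption: the real app loads
# trained embeddings; a plain dict is enough to show the check).
word_vec = {"i": [0.1], "walk": [0.2], "a": [0.3], "lonely": [0.4], "road": [0.5]}


def check_last_word(first_line, vocab):
    """Return (last_word, None) if the last word is usable, else (None, message)."""
    words = first_line.lower().split()  # lowercasing is an assumption
    if not words:
        return None, "The first line is empty."
    last = words[-1]
    if last not in vocab:
        # Message intended for the front end, instead of an unhandled KeyError.
        return None, f"The last word '{last}' is not in the vocabulary."
    return last, None


print(check_last_word("i walk a lonely askdh", word_vec))
print(check_last_word("i walk a askldhsal road", word_vec))
```

The route handler could then return the message with a client-error status instead of letting the KeyError surface as a 500.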

Another solution would be to skip that last word and consider the second last word, and if that fails, the third last word... and so on.
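The fallback idea could look roughly like the sketch below. `last_known_word` is a hypothetical helper and `vocab` is a small stand-in set, not the app's real vocabulary; lowercasing the input is also an assumption.

```python
# Stand-in vocabulary (assumption: any container supporting `in` would do).
vocab = {"i", "walk", "a", "lonely", "road", "have", "much"}


def last_known_word(first_line, vocab):
    """Return the rightmost word of the line that is in vocab, or None."""
    # Walk backwards: last word, then second last, and so on.
    for word in reversed(first_line.lower().split()):
        if word in vocab:
            return word
    return None  # no word of the line is known


print(last_known_word("i walk a lonely askdh", vocab))
print(last_known_word("askdh qwerty", vocab))
```

Note that the word returned is no longer the actual ending of the line, so anything that relies on the line's last word (such as rhyming) would see a different word than the user typed.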

Of course, the best solution would be to use a character-based or token-based model (using a tokenizer such as the PTB tokenizer), but that would need quite a bit of refactoring and more resources for training.

Chakita commented 3 years ago

Will add a try-except block there. About skipping the last word: the last words in every sentence of the poem must rhyme according to the scheme provided, so skipping it might not work. Yeah, a character-level model would not have the out-of-vocabulary issue, but it does require training a new model and using character-level embeddings instead of the Word2Vec used here. And with a character-level model, finding rhyming words might be trickier.

Samyak2 commented 3 years ago

About skipping the last word: the last words in every sentence of the poem must rhyme according to the scheme provided, so skipping it might not work.

Right!

Yeah, a character-level model would not have the out-of-vocabulary issue, but it does require training a new model and using character-level embeddings instead of the Word2Vec used here. And with a character-level model, finding rhyming words might be trickier.

That's true, it would be a lot more difficult.

Chakita commented 3 years ago

Fixed: the try-except has been added now.

Samyak2 commented 3 years ago

That's great!