Closed Samyak2 closed 3 years ago
I'll add a try-except block there. As for skipping the last word: the last words in every sentence of the poem must rhyme according to the scheme provided, so skipping it might not work. Yes, a character-level model would not have the out-of-vocabulary issue, but it would require training a new model and using character-level embeddings instead of the Word2Vec embeddings used here. And with a character-level model, finding rhyming words might be trickier.
As for skipping the last word: the last words in every sentence of the poem must rhyme according to the scheme provided, so skipping it might not work.
Right!
Yes, a character-level model would not have the out-of-vocabulary issue, but it would require training a new model and using character-level embeddings instead of the Word2Vec embeddings used here. And with a character-level model, finding rhyming words might be trickier.
That's true, it would be a lot more difficult.
The try-except has been added now.
That's great!
The Issue
I tested the app a bit and noticed that I get status 500: Internal Server Error on certain inputs.
Some of the inputs:
Rhyme scheme: AABBAABB
First line: i walk a lonely askdh (crashes)
First line: i walk a askldhsal road (does NOT crash)
First line: I have much stonks (crashes)
Details
The error is:
KeyError: "word 'askdh' not in vocabulary"
And occurs at line 61 here: https://github.com/HackerSpace-PESU/deep-frost/blob/dea2405b127c370831f6a846e4a2f54b3460a9d6/src/app.py#L60-L62
Probable Solution
One solution would be to wrap that line in a try-except (or use if word in word_vec.vocab:) and return an error to the front-end stating that the last word of the sentence could not be found in the vocabulary.
Another solution would be to skip that last word and consider the second-last word, and if that fails, the third-last word, and so on.
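For illustration, here is a rough sketch of both ideas in one helper. It is not the code in app.py: the function name pick_seed_word and the toy_vocab set are made up for this example, and the vocabulary is modelled as a plain set so the snippet runs on its own. With real word vectors, the same membership check would be done against the model's vocabulary (if the vectors come from gensim, note that the vocab attribute was replaced by key_to_index in gensim 4, so the exact attribute depends on the installed version).

```python
from typing import Container, Optional


def pick_seed_word(line: str, vocab: Container[str], fallback: bool = True) -> Optional[str]:
    """Return a word from `line` that exists in `vocab`, or None.

    With fallback=False only the last word is checked (the caller can then
    report an error to the front-end). With fallback=True the words are
    tried from last to first until one is found in the vocabulary.
    """
    words = line.split()
    # Either walk backwards over every word, or look at the last word only.
    candidates = reversed(words) if fallback else words[-1:]
    for word in candidates:
        if word in vocab:
            return word
    return None


# Hypothetical stand-in for the Word2Vec vocabulary.
toy_vocab = {"i", "walk", "a", "lonely", "road", "have", "much"}

print(pick_seed_word("i walk a lonely askdh", toy_vocab))                  # lonely
print(pick_seed_word("i walk a lonely askdh", toy_vocab, fallback=False))  # None
print(pick_seed_word("i walk a askldhsal road", toy_vocab))                # road
```

A None result is the case where the server could return a readable error instead of a 500. Keep in mind that the fallback changes which word the rhyme is based on, so the strict variant (fallback=False) may be the safer choice when the rhyme scheme has to hold.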
Of course, the best solution would be to use a character-based or token-based model (using a tokenizer such as the PTB tokenizer), but that would need quite a bit of refactoring and more resources for training.