Maluuba / qgen-workshop

ImplementAI Workshop on Deep NLP for Question Generation
MIT License

Bizarre questions generated. #15

Closed: indrajithi closed this issue 6 years ago

indrajithi commented 6 years ago

After training the model, the generated questions are very bizarre. The model was trained and the (test|train|dev).csv files were generated as per the instructions. Can you tell me what might be the reason for this?

Question: what did obama obama about ?
Answer: conscience wo n’t

Question: who was the the ? ?
Answer: lionel messi

Question: how how many the the the ? ?
Answer: third

Question: who is the the ? ?
Answer: cristiano

Question: what what is the to to ?
Answer: rail

Question: what is the the to to to ?
Answer: millimeter wave

Question: what is the the to ?
Answer: deny

Question: who is is the to ? ?
Answer: qaeda

Question: what is the the ? ?
Answer: safe haven

Question: who what is the to to ? ? ? ?
Answer: taliban

Question: what what the the the to to ? ?
Answer: momentum

Question: how many many the the the ? ? ?
Answer: six

Question: who is the the ? ?
Answer: maurice jarre

Question: who was the the ? ?
Answer: trade center

Question: what did the the ? ?
Answer: little eichmanns

Question: what was the the ? ?
Answer: 30-year-old dolphin at sea world

Question: what was the the ? ?
Answer: colliding

Question: what was the the the ? ?
Answer: dolphin

Question: where was the the ? ?
Answer: discovery cove area of

Question: what what was the the ? ? ?
Answer: 83

Question: what is the the to ?
Answer: innocent

Question: who is the the to ? ?
Answer: renters

Question: how many many the the ? ? ?
Answer: foreclosed

Question: who is the the ? ?
Answer: criticized mortgage

Question: what is the the ? ?
Answer: mortgage meltdown

Question: what does the say say
Answer: many renters are being evicted for landlords '

Question: who was the the ? ?
Answer: ronnie

tavianator commented 6 years ago

Well, the main reason is that this is a toy model used to accompany an introductory presentation, not a production question generation model. However, I'm curious: if you apply the fix suggested in https://github.com/Maluuba/qgen-workshop/issues/3#issuecomment-383826450, do the questions get any better?

tavianator commented 6 years ago

I ran it for two epochs with that typo fixed and got:

Question: what country is the most ?
Answer: international

Question: what is the most ?
Answer: oil supply

Question: what is the name ?
Answer: terror

which seems more coherent than before.
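
One rough way to put a number on "more coherent" is to count how often a generated question repeats the same token twice in a row ("the the", "? ?"). The snippet below is only a sketch and is not part of the workshop code: `adjacent_repeat_rate` and `mean_rate` are hypothetical helper names, and the sample questions are copied from the outputs quoted in this thread.

```python
def adjacent_repeat_rate(question: str) -> float:
    """Fraction of adjacent token pairs in which the same token occurs twice in a row."""
    tokens = question.split()
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return repeats / (len(tokens) - 1)


def mean_rate(questions):
    """Average adjacent-repeat rate over a list of questions."""
    return sum(adjacent_repeat_rate(q) for q in questions) / len(questions)


# Questions quoted earlier in this thread (before the fix).
before = [
    "what did obama obama about ?",
    "who was the the ? ?",
    "how how many the the the ? ?",
]

# Questions quoted after two epochs with the typo fixed.
after = [
    "what country is the most ?",
    "what is the most ?",
    "what is the name ?",
]

print("before:", mean_rate(before))
print("after: ", mean_rate(after))
```

A lower rate only shows that the degenerate repetition is gone. It says nothing about whether a question actually matches its answer, so it would complement reading the output rather than replace it.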