TheGadflyProject / TheGadflyProject

The core NLP library for automatic question generation

levels of hierarchy in MCQ answer choices #43

Open danielsgriffin opened 8 years ago

danielsgriffin commented 8 years ago

Currently, dead giveaways occur:

All of this is of little comfort to Fitri Amailia, a 40-year-old agent with a building contractor whose normal commute to Jakarta from her home in Bogor, a bedroom community in ___________ , by commuter train and bus takes her two and a half hours each way.

Answer choices: West Java Province, South Jakarta, Bandung, Indonesia

^ There are plausibly two correct answers (West Java Province and Indonesia).

When the city of ___________ signed on with SunEdison last year, city leaders announced they had taken the final step toward a power supply entirely independent of fossil fuels.

Answer choices: Georgetown, Utah, Hawaii, Texas

^ Georgetown is the only city among the choices; the other three are US states.
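One possible direction, sketched here only as an illustration: tag each candidate with its level in the geographic hierarchy and keep only distractors at the same level as the answer. The `LEVELS` table and the `same_level_distractors` function are hypothetical and hand-written for this example; nothing like them is confirmed to exist in TheGadflyProject, and a real version would need a proper gazetteer.

```python
# Toy lookup of entity -> hierarchy level. Hypothetical data for this
# sketch only; a real system would need a gazetteer covering cities.
LEVELS = {
    "Indonesia": "country",
    "West Java Province": "province",
    "Central Java Province": "province",
    "Banten Province": "province",
    "Bandung": "city",
    "South Jakarta": "city",
}


def same_level_distractors(answer, candidates, levels=LEVELS):
    """Keep candidates whose hierarchy level matches the answer's level.

    Candidates missing from the lookup are dropped, since their level
    cannot be checked.
    """
    answer_level = levels[answer]
    return [
        c for c in candidates
        if c != answer and levels.get(c) == answer_level
    ]


choices = same_level_distractors(
    "West Java Province",
    ["South Jakarta", "Bandung", "Indonesia",
     "Banten Province", "Central Java Province"],
)
print(choices)
```

With this filter, "Indonesia" (a second correct answer) and the two cities are dropped for the first example, leaving only other provinces as distractors.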

This is particularly difficult. Solutions currently exist in MCQ for GPEs such as US states, countries, and continents, but not for cities. I am not sure what the best way forward is. See data_pickler.py, where I create a pickle from lists of US states and countries, using spaCy to find similar entities.
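For readers who have not looked at data_pickler.py, the general idea of precomputing "most similar" entities and pickling the result might look roughly like the sketch below. This is not the actual contents of data_pickler.py: the function names are hypothetical, and `char_overlap` is a toy stand-in for the similarity score, which the issue says comes from spaCy.

```python
import pickle


def char_overlap(a, b):
    """Toy similarity: Jaccard overlap of the characters in two names.

    Stand-in only; the issue describes using spaCy for similarity.
    """
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb)


def build_similar_table(entities, similarity=char_overlap, n=9):
    """Map each entity to its n most similar other entities."""
    table = {}
    for entity in entities:
        others = [e for e in entities if e != entity]
        # Highest similarity first; ties keep their input order.
        others.sort(key=lambda e: similarity(entity, e), reverse=True)
        table[entity] = others[:n]
    return table


def save_table(table, path):
    """Write the lookup table to disk as a pickle."""
    with open(path, "wb") as f:
        pickle.dump(table, f)


def load_table(path):
    """Read a lookup table written by save_table."""
    with open(path, "rb") as f:
        return pickle.load(f)


states = ["Texas", "Utah", "Hawaii", "Ohio"]
table = build_similar_table(states, n=2)
print(table["Utah"])
```

Because every entity in the input list is at the same level (all states, or all countries), the resulting distractors stay at that level. The open problem in this issue is that no comparable list exists for cities.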

Update to do: return a longer list (perhaps ten) and select three at random with random.shuffle, instead of always using the same three "most similar".

danielsgriffin commented 8 years ago

The update above has been implemented. Previously the value for each key was the three most similar entities, used via alt_choices = alts[:] in heuristic_evaluator.py; now the value is the nine most similar, and three are drawn from it at random with:

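# requires "import random" at the top of the module
alt_choices = alts[:]          # copy, so the stored list is not shuffled in place
random.shuffle(alt_choices)
alt_choices = alt_choices[:3]  # keep three of the nine at random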
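As a self-contained illustration of the same selection step, the snippet can be wrapped in a small function. The name `pick_distractors` and the `rng` parameter are assumptions made for this sketch, not part of heuristic_evaluator.py; passing a seeded `random.Random` makes the choice reproducible in tests.

```python
import random


def pick_distractors(alts, k=3, rng=random):
    """Return k entries chosen at random from alts without modifying alts."""
    alt_choices = alts[:]       # copy so the caller's list keeps its order
    rng.shuffle(alt_choices)    # in-place shuffle of the copy
    return alt_choices[:k]      # fewer than k entries if alts is short


nine_most_similar = [
    "Utah", "Hawaii", "Texas", "Ohio", "Iowa",
    "Maine", "Idaho", "Nevada", "Oregon",
]
print(pick_distractors(nine_most_similar, rng=random.Random(0)))
```

`random.sample(alts, k)` from the standard library gives an equivalent result in one call, though it raises `ValueError` when `k` exceeds the list length, whereas the slice above simply returns a shorter list.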