alexgreene / WikiQuiz

Generates a quiz for a Wikipedia page using parts of speech and text chunking.
MIT License
803 stars, 58 forks

Improve logic to redact question texts #13

Closed: c-w closed 7 years ago

c-w commented 7 years ago

To give enough context for the answer, the redaction algorithm does not redact context words. However, this sometimes leads to nothing being redacted at all. For example, for the question "an Earth year is about 365.26 years long", the correct answer "365.26" is shortened to "36526", and we then fail to redact that token in the question text because it no longer matches.
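The mismatch can be reproduced with a minimal sketch. This is not the project's actual code: `normalize`, `naive_redact`, and `redact` are hypothetical names, and the assumption is that the answer is shortened by stripping punctuation. The naive version compares the shortened answer against raw question tokens, while the second version shortens both sides before comparing.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(token):
    # Hypothetical normalization: "365.26" becomes "36526".
    return token.translate(_STRIP_PUNCT)


def naive_redact(question, answer, blank="_____"):
    # Compares the normalized answer against the raw question tokens,
    # so "36526" never matches the token "365.26".
    target = normalize(answer)
    return " ".join(blank if tok == target else tok for tok in question.split())


def redact(question, answer, blank="_____"):
    # Normalizes each question token as well, so both sides agree.
    target = normalize(answer)
    return " ".join(
        blank if normalize(tok) == target else tok for tok in question.split()
    )


question = "an Earth year is about 365.26 years long"
unredacted = naive_redact(question, "365.26")  # question comes back unchanged
redacted = redact(question, "365.26")          # the number is blanked out
```

Here `unredacted` equals the original question, which is the failure described above, while `redacted` has the number replaced by the blank.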

This problem was reported, for example, in #9. After applying this patch, the situation described in that issue is fixed:

[image]

This patch also fixes another failure mode related to context words. We may have a question like "the atmosphere is made up of ?% CO2" where one of the answers is "52%". That's too easy, because the "%" in the question text gives the matching answer away. After this change, the correct answer in this example is shortened to "52", which makes the question harder to answer.
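One way to picture this second fix is the sketch below. It is an illustration under assumptions, not the patch itself: `trim_context` is a hypothetical helper, and it assumes the question marks the redacted span with a `?` placeholder. It drops any part of the answer that the question already shows directly before or after the placeholder.

```python
def trim_context(question, answer, placeholder="?"):
    """Drop answer text that the question already shows next to the placeholder."""
    before, found, after = question.partition(placeholder)
    if not found:
        # No placeholder in the question: nothing to compare against.
        return answer

    # Remove the longest answer suffix that directly follows the placeholder.
    # At least one character of the answer is always kept.
    for size in range(len(answer) - 1, 0, -1):
        if after.startswith(answer[-size:]):
            answer = answer[:-size]
            break

    # Remove the longest answer prefix that directly precedes the placeholder.
    for size in range(len(answer) - 1, 0, -1):
        if before.endswith(answer[:size]):
            answer = answer[size:]
            break

    return answer


shortened = trim_context("the atmosphere is made up of ?% CO2", "52%")
```

With the example from the description, `shortened` is `"52"`, so the "%" in the question no longer singles out the correct answer.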