Nucleo-Estudantes-Informatica-ISEP / antirecurso

Wanna pass your exams? We gotcha! 😉
https://antirecurso.nei-isep.org
GNU General Public License v3.0

Repeated questions in same quiz #63

Open dimaguy opened 6 months ago

dimaguy commented 6 months ago

During quizzes, it's easy to find a collision (two questions being the same) between questions randomly selected from the question bank.

Even if the answers are ordered differently, this probably isn't expected behavior.

tomasflopes commented 5 months ago

The backend effectively prevents duplicated questions from appearing in the same quiz. In the scenario you are describing (and in similar cases), the question crawler retrieved the same question from different exams and stored it in the database. To address this, our proposed solution is to eliminate questions with identical content, including both the question and options. It's worth noting that this may not completely resolve the issue, as some variations (string matching issues) may still exist.
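As a rough illustration of the exact-match cleanup described above, the sketch below hashes each question together with its options after light normalization, ignoring option order. It is only a sketch under assumptions: the `(id, question, options)` row shape and the function names `normalize`, `fingerprint`, and `group_exact_duplicates` are hypothetical and not taken from the antirecurso codebase.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFKC, casefold, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split())


def fingerprint(question: str, options: list[str]) -> str:
    """Hash the question plus its options, ignoring the order of the options."""
    parts = [normalize(question)] + sorted(normalize(o) for o in options)
    joined = "\x1f".join(parts)  # unit separator keeps the fields distinct
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def group_exact_duplicates(rows):
    """Return lists of question ids that share a fingerprint (groups of 2+)."""
    groups = {}
    for qid, question, options in rows:
        groups.setdefault(fingerprint(question, options), []).append(qid)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical rows: (id, question text, options). Not real project data.
rows = [
    (1, "What is 2 + 2?", ["3", "4", "5"]),
    (2, "what is  2 + 2?", ["5", "4", "3"]),
    (3, "What is 3 + 3?", ["5", "6", "7"]),
]
duplicates = group_exact_duplicates(rows)
```

Questions 1 and 2 end up in the same group even though their options are ordered differently. Anything that differs by more than case, whitespace, or Unicode form would still slip through, which is the string matching limitation mentioned above.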

While we acknowledge this issue, we currently do not plan on fixing it in the near future. The reason is that addressing this problem would involve removing existing exam answers from old exams. Perhaps, in the future, as we undergo a database reset, we could consider addressing this issue.

We appreciate your feedback and encourage further suggestions or insights!

dimaguy commented 5 months ago

Instead of exact string matching, this case could probably benefit from string similarity algorithms to make some of these choices. This could be done as a cleanup task in the DB after adding entries, or on the fly during quiz creation. Mind you, asking the same question in different ways is not inherently bad, as it tests whether the student is choosing at random or actually knows what they are doing, but the same question repeated with a high similarity coefficient is not as useful in the quiz.
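To make the similarity idea concrete, here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library. The `0.9` threshold, the sample questions, and the helper names `similarity` and `near_duplicates` are assumptions chosen for illustration; a real cleanup task would need to tune the threshold against the actual question bank.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], computed case-insensitively."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def near_duplicates(questions: dict[int, str], threshold: float = 0.9):
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(questions.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    # Most similar pairs first, so a reviewer sees the likeliest duplicates first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical sample data, not taken from the real question bank.
questions = {
    1: "Which layer of the OSI model handles routing?",
    2: "Which layer of the OSI model handles routing ?",
    3: "What does TCP stand for?",
}
candidates = near_duplicates(questions)
```

The function only reports candidate pairs and deletes nothing. Comparing every pair is quadratic in the number of questions, which is probably acceptable when run per subject on a small bank but would need rethinking at a larger scale.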

And instead of removing existing entries from old exams, I propose soft deletions: an extra column in the database that marks whether a question is eligible to be shown to the student (only one instance of each duplicated question would be marked eligible). That way you'll be able to keep historical accuracy while providing a functional quiz.
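The eligibility column could look roughly like the following, shown with an in-memory SQLite database so the example is self-contained. The table layout, the `eligible` column name, and `pick_quiz_questions` are hypothetical and do not reflect the project's real schema or backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        eligible INTEGER NOT NULL DEFAULT 1  -- hypothetical flag: 0 hides the row from new quizzes
    );
    INSERT INTO questions (id, content) VALUES
        (1, 'What is 2 + 2?'),
        (2, 'What is 2 + 2?'),
        (3, 'What is 3 + 3?');
    -- Mark the duplicate as ineligible instead of deleting it.
    UPDATE questions SET eligible = 0 WHERE id = 2;
""")


def pick_quiz_questions(conn, count):
    """Randomly pick up to `count` question ids among the eligible rows only."""
    rows = conn.execute(
        "SELECT id FROM questions WHERE eligible = 1 ORDER BY RANDOM() LIMIT ?",
        (count,),
    ).fetchall()
    return [row[0] for row in rows]


quiz = pick_quiz_questions(conn, 10)
total_rows = conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]
```

The duplicate row stays in the table, so anything that references it (old exams, history) keeps working, while new quizzes can only draw questions 1 and 3.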

tomasflopes commented 5 months ago

Yes, using string similarity algorithms could potentially address the issue, but it wouldn't entirely eliminate the need for manual verification of each match. We are cautious about introducing a feature that might inadvertently remove valuable questions in the process. This is the primary reason for the prolonged delay in implementing this feature, as we are currently prioritizing improvements in other areas.

Regarding soft deletion, we already have the system in place. I mentioned the issue because the problem doesn't stem from database constraints; rather, it relates to maintaining data coherence in history, stats, etc.

While we don't currently prioritize this issue, we have plans to address it eventually in order to prevent duplicated questions from being stored under the same subject.

dimaguy commented 5 months ago

> Yes, using string similarity algorithms could potentially address the issue, but it wouldn't entirely eliminate the need for manual verification of each match. We are cautious about introducing a feature that might inadvertently remove valuable questions in the process. This is the primary reason for the prolonged delay in implementing this feature, as we are currently prioritizing improvements in other areas.

At this scale, there's not really a need to rule out manual verification of the output of whatever algorithm is put in place. The point of having something in place would be to flag likely duplicates to someone with some free time so they can be addressed preemptively, instead of waiting for reports to act on.

> Regarding soft deletion we have the system in place, I mentioned the issue because the problem doesn't stem from database constraint issues, rather, it relates more to maintaining data coherence in history, stats, etc.

When I mentioned soft deletion, I didn't mean making them vanish entirely or retroactively, but rather a mechanism to prevent them from showing up in future quizzes (some boolean flag in the row that the function in charge of building quizzes checks). If the exact questions someone has answered still matter after their "deletion", there could be another column holding the ID of the original question, so every hit on the duplicate is redirected to the original one.
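A sketch of the redirect idea, again with in-memory SQLite: answers keep pointing at the row that was actually shown, while stats are aggregated under the original question. The `original_id` column, the `answers` table, and `answer_stats` are assumptions made for this example, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        eligible INTEGER NOT NULL DEFAULT 1,
        original_id INTEGER REFERENCES questions(id)  -- hypothetical: NULL for originals
    );
    CREATE TABLE answers (
        user_id INTEGER NOT NULL,
        question_id INTEGER NOT NULL REFERENCES questions(id),
        correct INTEGER NOT NULL
    );
    INSERT INTO questions (id, content, eligible, original_id) VALUES
        (1, 'What is 2 + 2?', 1, NULL),
        (2, 'What is 2 + 2?', 0, 1);
    -- Historical answers still reference the row that was shown at the time.
    INSERT INTO answers (user_id, question_id, correct) VALUES
        (10, 1, 1),
        (11, 2, 0),
        (12, 2, 1);
""")


def answer_stats(conn):
    """Aggregate attempts and correct answers under the original question id."""
    rows = conn.execute("""
        SELECT COALESCE(q.original_id, q.id) AS canonical_id,
               COUNT(*) AS attempts,
               SUM(a.correct) AS correct
        FROM answers AS a
        JOIN questions AS q ON q.id = a.question_id
        GROUP BY canonical_id
        ORDER BY canonical_id
    """).fetchall()
    return {qid: (attempts, correct) for qid, attempts, correct in rows}


stats = answer_stats(conn)
```

Here all three attempts are counted under question 1, and the `answers` rows themselves are never rewritten, so the history of what each student actually saw is preserved.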

> While we don't currently prioritize this issue, we have plans to address it eventually in order to prevent duplicated questions from being stored under the same subject.

That's completely reasonable. If you need help with anything, I should have some time available this next semester to lend a hand.