brando90 / MathNet-large-scale-Mathematics-Dataset-for-Machine-Learning


Duplicate Q,A pair detection #34

Open brando90 opened 7 years ago


Once we start generating datasets, it might be important to remove duplicate question-answer pairs.

For example, if the user requests 100 examples per Q,A block and one question doesn't have enough variation encoded in it, it would be nice if it didn't generate the same question 100 times.
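One way this could look is a generation loop that keeps a set of already-seen (question, answer) pairs and stops after a fixed number of attempts, so a low-variation question returns fewer than the requested examples instead of repeating itself. This is only a sketch: `generate_unique_pairs`, `max_attempts`, and the `name_question` generator are hypothetical names made up for illustration, not part of the MathNet code, and the duplicate check is an exact match on the whole pair.

```python
import random
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]


def generate_unique_pairs(
    generate: Callable[[], QAPair], n: int, max_attempts: int = 10_000
) -> List[QAPair]:
    """Call `generate` until n distinct (question, answer) pairs are collected
    or max_attempts calls have been made, whichever comes first."""
    seen = set()
    pairs: List[QAPair] = []
    attempts = 0
    while len(pairs) < n and attempts < max_attempts:
        attempts += 1
        pair = generate()
        # Exact-match duplicate check on the whole (Q, A) tuple.
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Hypothetical low-variation question: only 3 names are available.
NAMES = ["Alice", "Bob", "Carol"]


def name_question() -> QAPair:
    name = random.choice(NAMES)
    return (f"How many letters are in the name {name}?", str(len(name)))


pairs = generate_unique_pairs(name_question, 100)
print(len(pairs))  # at most 3, since only 3 distinct questions exist
```

The attempt cap is what keeps the loop from spinning forever when a question simply can't produce 100 distinct variants; the caller can compare `len(pairs)` with the requested count and warn.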

This might only apply to questions whose variations come from a small set of names, or that only use permg, etc.

If a question uses a numeric value that isn't drawn from just a finite set, then this doesn't really apply to that specific question.
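A rough way to decide whether dedup is even relevant for a given question: if every parameter comes from a finite set, the number of distinct questions is at most the product of the set sizes; if any parameter is not restricted to a finite set (e.g. a random real), treat the count as unbounded and skip dedup. The sketch below assumes each question template can describe its parameters as a list of finite domains, with `None` for unbounded ones; `variation_count` and `needs_dedup` are hypothetical helpers, not existing MathNet functions.

```python
import math
from typing import Optional, Sequence


def variation_count(param_domains: Sequence[Optional[Sequence[object]]]) -> float:
    """Upper bound on the number of distinct questions a template can produce.

    Each entry is the finite collection of values for one parameter, or None
    if that parameter is not restricted to a finite set (assumption: the
    template can report its domains this way)."""
    if any(domain is None for domain in param_domains):
        return math.inf
    # Product of distinct-value counts; an upper bound, because different
    # parameter combinations could still render the same question text.
    return math.prod(len(set(domain)) for domain in param_domains)


def needs_dedup(param_domains: Sequence[Optional[Sequence[object]]]) -> bool:
    """Dedup only matters when the variation space is finite."""
    return math.isfinite(variation_count(param_domains))


names = ["Alice", "Bob", "Carol"]
print(variation_count([names, range(1, 5)]))  # 12
print(variation_count([names, None]))  # inf
```

If `variation_count` is smaller than the number of examples the user asked for, the generator knows up front that it can't return that many unique pairs.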