F/dumpbackslash - GitHub Issues

nadvornix commented 8 years ago

I had trouble regenerating the synthetic questions: most of the questions came out the same, but about 20 were produced differently (I do not know why). I made it so that this doesn't matter: scripts that use several datasets (train.json, questionDump, tsv file...) now just take the intersection of these datasets and use only the questions that appear in all of them. (This is what I call "robustness" in the commit messages.)
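The intersection idea can be sketched roughly as follows. This is only an illustration, not the code from the commits: it assumes each dataset has already been loaded into a dict keyed by question ID, and the function names `common_question_ids` and `dropped_question_ids` are hypothetical.

```python
def common_question_ids(*datasets):
    """Return the sorted question IDs that occur in every dataset.

    Each dataset is assumed to be a dict keyed by question ID
    (an assumption made for this sketch).
    """
    if not datasets:
        return []
    shared = set(datasets[0])
    for dataset in datasets[1:]:
        shared &= set(dataset)  # keep only IDs also present in this dataset
    return sorted(shared)


def dropped_question_ids(*datasets):
    """Return the sorted question IDs that occur in some dataset but not all."""
    seen = set()
    for dataset in datasets:
        seen |= set(dataset)
    return sorted(seen - set(common_question_ids(*datasets)))
```

Printing the result of `dropped_question_ids` alongside the intersection would at least make it visible which questions were silently left out.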

The downside is that it is no longer clear which dataset is actually being used.

The bug in the question dumper should be fixed now.

pasky commented 8 years ago

Thanks for the great work! I'm looking forward to testing the end-to-end accuracy improvement. (It'd be best if you could do it - please send me your public ssh key. Do you have IPv6 connectivity?)

I think you are right to have doubts about the "more robust" concept training, though - I think we should fix this properly instead of just covering up the issue. It is important from a scientific point of view to be very clear about what dataset we trained a component on, and we don't seem to be clear on what questions it does (not) include now. Let's try to address the underlying issue and drop that change, do you agree?

pasky commented 8 years ago

Thanks, merged this and added some followup commits that revert the robustness and migrate to moviesF. Whew, this was sure a lot of crazy commit-shuffling!!

brmson / yodaqa

F/dumpbackslash #24