mdfwn closed this issue 7 years ago
I confirm that there is an inconsistency between the two. However, it has no impact on the prediction value.
To sum it up:
However, I agree that symmetry should be achieved, so I will update the .ipynb by fixing this issue and introducing comments. Thanks a lot for observing the issue!
I'm not entirely convinced yet. If your y_train encodes that y_train[:,0] = chance of radiant_win and y_train[:,1] = chance of dire_win (second column), how does result[0][1] (second column) encode the chance of radiant_win? Sorry if this is obvious, I just don't see it.
I realize that I failed to explain properly. What I meant was that the notebook and the rest of the project have nothing to do with each other.
You can consider the encoding in query.py the right version, with result[0][1] meaning the chance of radiant_win, and the one in the notebook the wrong version. However, since we don't make any radiant/dire queries in the notebook, the mismatch has no impact on the accuracy.
I will fix the notebook anyway so further confusion is avoided. I hope I was clear enough this time, but if I was not, feel free to ask.
Basically, at the moment, logistic regression (query.py) predicts [dire_chance, radiant_chance] and the notebook predicts [radiant_chance, dire_chance].
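One way to keep the two orderings from being mixed up is to name the columns instead of indexing them by position. The following is only a sketch: the constants and the `radiant_chance` helper are hypothetical and do not exist in query.py or the notebook, and the probabilities are made-up illustration values.

```python
# Hypothetical names for illustration; not taken from the project.
RADIANT, DIRE = "radiant", "dire"

# Column orders as described in this thread.
QUERY_ORDER = (DIRE, RADIANT)      # query.py: [dire_chance, radiant_chance]
NOTEBOOK_ORDER = (RADIANT, DIRE)   # notebook: [radiant_chance, dire_chance]


def radiant_chance(row, order):
    """Return the radiant win probability from one prediction row,
    given the column order that produced the row."""
    return row[order.index(RADIANT)]


# The same prediction expressed in both layouts (made-up values).
query_row = [0.3, 0.7]
notebook_row = [0.7, 0.3]

print(radiant_chance(query_row, QUERY_ORDER))
print(radiant_chance(notebook_row, NOTEBOOK_ORDER))
```

Both calls read the same quantity, so code that goes through a helper like this does not depend on which layout a given model uses.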
Oh alright now I understand. I used the notebook code to create x_train and y_train for my model, but you created it differently for your model, I assume, so for you it is consistent. Thanks for clearing it up.
I added two better-explained IPython notebooks in the "experiments" folder. The one-hot encoding was removed so that there is no more confusion.
Something that I've noticed is that in the augment_one_hot.py you do
which means we will have [0,1] for a dire win and [1,0] for a radiant win. Later on in the NeuralNet.ipynb we fill the y_train and y_test accordingly with 1 in the first column if it is a Radiant win and 1 in the second column if it is a dire win.
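A minimal sketch of the label layout just described, assuming a hypothetical `one_hot_label` helper and a made-up list of match outcomes (neither is taken from augment_one_hot.py or NeuralNet.ipynb):

```python
def one_hot_label(radiant_win):
    """Encode a match outcome as [radiant, dire]:
    [1, 0] for a radiant win, [0, 1] for a dire win."""
    return [1, 0] if radiant_win else [0, 1]


# Made-up outcomes: True means radiant won, False means dire won.
outcomes = [True, False, True]
y_train = [one_hot_label(win) for win in outcomes]

# Under this layout, column 0 is the radiant column and column 1 the dire column.
radiant_column = [row[0] for row in y_train]
dire_column = [row[1] for row in y_train]
```

A model trained on labels built this way produces outputs in the same [radiant, dire] order.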
However, moving on to query.py:
the FIRST column is now interpreted as dire winrate and the SECOND column as radiant winrate.
Am I overlooking or confusing something or is this indeed a confusion of indices?