Does the network output [Dire_Winchance, Radiant_Winchance] or [Radiant_Winchance, Dire_Winchance]?

andreiapostoae / dota2-predictor

Tool that predicts the outcome of a Dota 2 game using Machine Learning

MIT License


Does the network output [Dire_Winchance, Radiant_Winchance] or [Radiant_Winchance, Dire_Winchance]? #7

Closed: mdfwn closed this issue 7 years ago

mdfwn commented 7 years ago

Something I've noticed: in augment_one_hot.py you do

if row[1] == '0':
    new_row.extend([0, 1])
else:
    new_row.extend([1, 0])

which means we get [0, 1] for a Dire win and [1, 0] for a Radiant win. Later, in NeuralNet.ipynb, y_train and y_test are filled accordingly: a 1 in the first column for a Radiant win and a 1 in the second column for a Dire win.
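To make the notebook's target layout explicit, here is a minimal sketch of that encoding as a pair of functions. It assumes, as the snippet above implies, that row[1] holds the radiant_win flag as the string '1' or '0'; the names encode_winner and decode_winner are hypothetical and not part of the repository.

```python
# Sketch of the augment_one_hot.py label encoding (hypothetical helper names).
# Assumption: row[1] is the radiant_win flag stored as the string '1' or '0'.

def encode_winner(radiant_win_flag: str) -> list[int]:
    """Return the two-column one-hot target used by the notebook.

    Column 0 is set for a Radiant win, column 1 for a Dire win.
    """
    if radiant_win_flag == '0':
        return [0, 1]  # Dire won
    return [1, 0]      # Radiant won


def decode_winner(one_hot: list[int]) -> str:
    """Map a one-hot target back to the winning faction."""
    return 'Radiant' if one_hot[0] == 1 else 'Dire'


if __name__ == '__main__':
    for flag in ('1', '0'):
        target = encode_winner(flag)
        print(flag, target, decode_winner(target))
```

Under this layout, column 0 of the network's output corresponds to Radiant, which is the opposite of how query.py reads its result.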

However, moving on to query.py:

if faction == 'Radiant':
    probabilities_dict[i] = result[0][1] * 100
    query_list.pop(0)
else:
    probabilities_dict[i] = result[0][0] * 100
    del query_list[-1]

the FIRST column is now interpreted as the Dire win chance and the SECOND column as the Radiant win chance.

Am I overlooking or confusing something, or is this indeed a mix-up of indices?

andreiapostoae commented 7 years ago

I confirm that this is an inconsistency between the two. However, it has no impact on the predicted value.

To sum it up:

query.py: result[0][1] = probability that class 1 (radiant_win) is true; this is correct
augment_one_hot.py: strictly speaking, since I do not make queries there but only check the accuracy, how I encode the states is irrelevant
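As an illustration of why result[0][1] is the Radiant chance in query.py, here is a minimal sketch of how a binary logistic regression turns a score into a two-column probability row. It assumes the model behaves like scikit-learn's predict_proba, whose columns follow the sorted class labels, so with labels 0 (Dire win) and 1 (Radiant win) column 1 is P(radiant_win). The function name, features, and weights are made up for illustration.

```python
import math


def predict_proba_row(features: list[float], weights: list[float], bias: float) -> list[float]:
    """Hypothetical binary logistic regression returning [P(class 0), P(class 1)].

    Assumption: class 1 is radiant_win and class 0 is a Dire win.
    """
    score = sum(f * w for f, w in zip(features, weights)) + bias
    p_radiant = 1.0 / (1.0 + math.exp(-score))  # sigmoid gives P(class 1)
    return [1.0 - p_radiant, p_radiant]


# query.py reads result[0][1] for Radiant and result[0][0] for Dire.
result = [predict_proba_row([1.0, 0.0, 1.0], [0.8, -0.3, 0.4], bias=-0.2)]
radiant_pct = result[0][1] * 100
dire_pct = result[0][0] * 100
print(round(radiant_pct, 1), round(dire_pct, 1))  # prints 73.1 26.9
```

Because the sigmoid output is P(class 1), the second column is the Radiant chance whenever radiant_win = 1 is the positive class, which matches the indexing in query.py.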

However, I agree that the two should be consistent, so I will update the .ipynb to fix this issue and add comments. Thanks a lot for spotting it!

mdfwn commented 7 years ago

I'm not entirely convinced yet. If your y_train encodes y_train[:,0] = chance of radiant_win (first column) and y_train[:,1] = chance of dire_win (second column), how does result[0][1] (second column) encode the chance of radiant_win? Sorry if this is obvious; I just don't see it.

andreiapostoae commented 7 years ago

I realize that I failed to explain this properly. What I meant is that the notebook and the rest of the project are independent of each other.

You can consider the encoding in query.py, where result[0][1] is the chance of radiant_win, the right version, and the one in the notebook the wrong version. However, since the notebook never makes Radiant/Dire queries, this has no impact on the accuracy.

I will fix the notebook anyway so further confusion is avoided. I hope I was clear enough this time, but if I was not, feel free to ask.
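The accuracy argument can be checked with a small sketch. Accuracy only asks whether the predicted column matches the target column, so if the same column swap is applied to both the targets and the model's outputs (standing in here for a model trained on swapped targets), every comparison comes out the same. The predictions below are made up, and argmax-based accuracy is an assumption about how the notebook scores the network.

```python
def argmax(row: list[float]) -> int:
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(predictions: list[list[float]], targets: list[list[int]]) -> float:
    """Fraction of rows whose predicted column matches the one-hot target column."""
    hits = sum(argmax(p) == argmax(t) for p, t in zip(predictions, targets))
    return hits / len(targets)


def swap_columns(rows: list[list]) -> list[list]:
    """Reverse each two-column row: [a, b] -> [b, a]."""
    return [list(reversed(r)) for r in rows]


# Made-up network outputs and targets in the notebook's [radiant, dire] order.
preds = [[0.7, 0.3], [0.4, 0.6], [0.55, 0.45], [0.2, 0.8]]
targets = [[1, 0], [1, 0], [1, 0], [0, 1]]

acc_notebook_order = accuracy(preds, targets)
acc_query_order = accuracy(swap_columns(preds), swap_columns(targets))
print(acc_notebook_order, acc_query_order)  # prints 0.75 0.75
```

The column order only starts to matter once a single output is read as "the Radiant chance", which the notebook never does.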

andreiapostoae commented 7 years ago

Basically, at the moment, the logistic regression (query.py) predicts [dire_chance, radiant_chance], while the notebook predicts [radiant_chance, dire_chance].
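Since the two parts of the project use opposite column orders, one way to avoid mixing them up is to attach faction names to a raw output row at the point where the model is queried, passing the column order explicitly. This is a sketch only; the helper name named_chances and the order constants are hypothetical.

```python
# Column orders as described in this thread (hypothetical constant names).
QUERY_PY_ORDER = ('dire', 'radiant')   # logistic regression in query.py
NOTEBOOK_ORDER = ('radiant', 'dire')   # original NeuralNet.ipynb targets


def named_chances(row: list[float], order: tuple[str, str]) -> dict[str, float]:
    """Attach faction names to a two-column probability row."""
    if sorted(order) != ['dire', 'radiant']:
        raise ValueError(f'unexpected column order: {order}')
    return dict(zip(order, row))


# The same raw output means opposite things under the two conventions.
raw = [0.62, 0.38]
print(named_chances(raw, QUERY_PY_ORDER)['radiant'])  # prints 0.38
print(named_chances(raw, NOTEBOOK_ORDER)['radiant'])  # prints 0.62
```

Reading a model trained on the notebook's targets through query.py's result[0][1] would therefore report the Dire chance as the Radiant chance.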

mdfwn commented 7 years ago

Oh, all right, now I understand. I used the notebook code to create x_train and y_train for my model, but I assume you created them differently for your model, so for you it is consistent. Thanks for clearing it up.

andreiapostoae commented 7 years ago

I added two better-explained IPython notebooks to the "experiments" folder. The one-hot encoding was removed so that there is no more confusion.