adeshpande3 / March-Madness-ML

Machine learned bracketology

Different Prediction 1v2 & 2v1 #7

Open Tplhardy opened 5 years ago

Tplhardy commented 5 years ago

I'm getting different predictions depending on which team I put in the first and second positions. Any fixes?

Tplhardy commented 5 years ago

My thought was to normalize each column in MM Data within each year, to avoid having to compute a - b with negative numbers.
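A minimal sketch of per-season column normalization (z-scoring each stat within a year), assuming a hypothetical layout of `{year: [[stat1, stat2, ...], ...]}` with one row per team. This is not the repo's actual data format. Differences can still be negative after standardizing; what changes is that every column sits on the same per-season scale.

```python
from statistics import mean, pstdev


def normalize_by_year(team_vectors_by_year):
    """Z-score each feature column separately within each season.

    team_vectors_by_year: hypothetical layout {year: [[f1, f2, ...], ...]},
    one row per team. Returns the same layout with standardized values.
    """
    normalized = {}
    for year, rows in team_vectors_by_year.items():
        columns = list(zip(*rows))  # transpose: one tuple per feature
        stats = [(mean(col), pstdev(col)) for col in columns]
        normalized[year] = [
            # A constant column carries no information, so map it to 0.0
            # instead of dividing by zero.
            [(value - mu) / sigma if sigma > 0 else 0.0
             for value, (mu, sigma) in zip(row, stats)]
            for row in rows
        ]
    return normalized


if __name__ == "__main__":
    print(normalize_by_year({2017: [[10, 70.0], [20, 80.0]]}))
```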

adeshpande3 commented 5 years ago

Hmm, that's a good point you bring up. I didn't really check for that, but I tried it myself: a couple of times the results are similar, but sometimes there is quite a bit of difference.
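One way to quantify "sometimes there is quite a bit of difference" is to measure, per game, how far p(A beats B) + p(B beats A) is from 1. This sketch assumes a hypothetical `predict_proba` callable that takes a game vector (team A stats minus team B stats) and returns P(team A wins); swapping the teams is modeled as negating that vector. The toy models below are illustrative only.

```python
import math


def asymmetry_gap(predict_proba, x):
    """Return |p(x) + p(-x) - 1|; 0 means the model is order-consistent.

    predict_proba: hypothetical callable, game vector -> P(team A wins).
    Negating x corresponds to listing the two teams in the other order.
    """
    flipped = [-v for v in x]
    return abs(predict_proba(x) + predict_proba(flipped) - 1.0)


def toy_with_intercept(x):
    # Hypothetical model with a bias term: NOT symmetric under team swap.
    return 1.0 / (1.0 + math.exp(-(0.5 * x[0] + 0.3)))


def toy_without_intercept(x):
    # Same model without the bias term: symmetric under team swap.
    return 1.0 / (1.0 + math.exp(-(0.5 * x[0])))


if __name__ == "__main__":
    print(asymmetry_gap(toy_with_intercept, [1.0]))
    print(asymmetry_gap(toy_without_intercept, [1.0]))
```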

I think it also kind of depends on which ML model you're using, since each one handles high-dimensional data differently. But yeah, I think normalizing would be a good first step to try.

At some level I'm not sure what we can do. We conceptually know these two training examples should be "equivalent" to each other:

- Example #1: Input: 17-dimensional game vector X, Label: 1
- Example #2: Input: negative of X, Label: 0
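A common way to show a model both orderings is data augmentation: for every game (X, label), also add the mirrored game (-X, 1 - label). This is a minimal sketch assuming each example is an `(x, label)` pair where `x` is the difference vector (team A minus team B) and `label` is 1 if team A won; it does not force the model to be symmetric, it only makes the training data symmetric.

```python
def add_flipped_games(examples):
    """Return the training set plus a mirrored copy of every game.

    examples: list of (x, label) pairs; x is a list of floats
    (team A stats minus team B stats), label is 1 if team A won.
    The mirrored game swaps the teams: input -x, label 1 - label.
    """
    augmented = list(examples)
    for x, label in examples:
        augmented.append(([-v for v in x], 1 - label))
    return augmented


if __name__ == "__main__":
    print(add_flipped_games([([2.0, -1.0], 1)]))
```

Training on the augmented set doubles its size, and every game appears once with each team listed first.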

But the problem is that a machine learning model won't necessarily pick up on that. I wonder if there's a way to hard-code that constraint. Not sure rn, but thanks for bringing it up!
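Two ways the constraint p(-X) = 1 - p(X) could be hard-coded, sketched under the assumption that a model is any callable mapping a game vector to P(team A wins). `symmetrize` wraps an arbitrary trained model by averaging its answer with the flipped game's answer; `no_intercept_logistic` shows that a logistic model with no bias term satisfies the constraint by construction, since sigmoid(-z) = 1 - sigmoid(z). Both function names are hypothetical, not part of the repo.

```python
import math


def symmetrize(predict_proba):
    """Wrap any P(team A wins) model so that swapping teams flips the answer.

    p_sym(x) = (p(x) + 1 - p(-x)) / 2 satisfies p_sym(-x) == 1 - p_sym(x)
    for every x, regardless of how the underlying model was trained.
    """
    def p_sym(x):
        flipped = [-v for v in x]
        return 0.5 * (predict_proba(x) + 1.0 - predict_proba(flipped))
    return p_sym


def no_intercept_logistic(weights):
    """Linear model with no bias term: symmetric by construction,
    because sigmoid(w . (-x)) == 1 - sigmoid(w . x)."""
    def p(x):
        z = sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return p


if __name__ == "__main__":
    def biased(x):
        # Hypothetical model whose bias term breaks the symmetry.
        return 1.0 / (1.0 + math.exp(-(0.5 * x[0] + 0.3)))

    p = symmetrize(biased)
    print(p([1.0]) + p([-1.0]))  # sums to 1 up to rounding
```

The wrapper also resolves conflicting 1v2 and 2v1 predictions at inference time, at the cost of two model calls per game.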

Tplhardy commented 5 years ago

Thanks! So I tried absolute values, and it just resulted in some overfitting (I was getting 99% accuracy and predictions with 99% probability). I think the short-term fix (brackets are due tomorrow) is, whenever the two predictions conflict, to take the difference of the two: