andmon97 / ATPTennisMatchPredictions

Tennis Match Predictions using Machine Learning

Bias about input data #1

Open nabbone17 opened 1 month ago

nabbone17 commented 1 month ago

Hi, I was genuinely interested in your project and tried to replicate it, but I think there is a bias in the input data. You use


| w_ace | w_df | w_svpt | w_1stIn | w_1stWon |
| -- | -- | -- | -- | -- |

and others, but these statistics come from the match itself, so you are using the result to predict the result...

nabbone17 commented 1 month ago

For example:


| tourney_id | tourney_name | surface | draw_size | tourney_level | tourney_date | match_num | Player1_entry | Player1_name | Player1_hand | Player1_ht | Player1_ioc | Player1_age | score | best_of | round | minutes | Player2_id | Player2_seed | Player2_entry | Player2_name | Player2_hand |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 2018-0352 | Paris Masters | Hard | 64 | M | 20181029 | 270 |  | Novak Djokovic | R | 188.0 | SRB | ######## | 7-5 6-1 | 3 | R32 | 93.0 | 105311 |  | Q | Joao Sousa | R |

Djokovic has 6 aces as a feature value here, but you can't know how many aces he will serve before the match is played, so you're using the result to predict the result. https://www.diretta.it/partita/vVyp7Uqf/#/informazioni-partita/statistiche-partite/0
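
One possible way around this, as a rough sketch only: replace each in-match statistic with the player's average over earlier matches, so every feature is known before the match starts. The column names `Player1_ace` and `Player2_ace`, the helper `add_prematch_ace_average`, and the sample rows are all assumptions made up for illustration, not taken from the repository. The sketch also assumes that sorting by `tourney_date` and then `match_num` gives the chronological order.

```python
from collections import defaultdict


def add_prematch_ace_average(matches):
    """Return copies of the match rows where the in-match ace counts are
    replaced by each player's average aces over EARLIER matches only
    (None when the player has no history yet)."""
    totals = defaultdict(int)  # aces served so far, per player
    played = defaultdict(int)  # matches played so far, per player
    out = []
    # Walk the matches in (assumed) chronological order so that only
    # the past is visible when a row's features are built.
    for m in sorted(matches, key=lambda r: (r["tourney_date"], r["match_num"])):
        row = dict(m)
        for side in ("Player1", "Player2"):
            name = m[f"{side}_name"]
            row[f"{side}_ace_avg_before"] = (
                totals[name] / played[name] if played[name] else None
            )
            # Drop the leaky value: it is only known after the match.
            row.pop(f"{side}_ace", None)
        # Only now fold this match's stats into the running history.
        for side in ("Player1", "Player2"):
            name = m[f"{side}_name"]
            totals[name] += m[f"{side}_ace"]
            played[name] += 1
        out.append(row)
    return out


# Made-up rows with placeholder players, purely for illustration.
matches = [
    {"tourney_date": 20180101, "match_num": 1, "Player1_name": "A",
     "Player2_name": "B", "Player1_ace": 4, "Player2_ace": 2},
    {"tourney_date": 20180108, "match_num": 1, "Player1_name": "A",
     "Player2_name": "C", "Player1_ace": 8, "Player2_ace": 1},
    {"tourney_date": 20180115, "match_num": 1, "Player1_name": "B",
     "Player2_name": "A", "Player1_ace": 3, "Player2_ace": 6},
]
features = add_prematch_ace_average(matches)
```

In the third made-up row, player A's feature is the average of the two earlier matches, and the 6 aces served in that match itself never reach the model. The same idea would apply to the other serve statistics (`w_df`, `w_svpt`, and so on).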