ben1605 / Soccer-match-outcome-prediction

Peer Review(my463) #2

Open PatrickYuanMingcan opened 6 years ago

You have done a good job! But there are three points that could be improved:

  1. When cleaning the data, I think it would be better to fill in each missing id with the latest available information for that team instead of deleting the sample, because 4,605 samples is not a small part of the total.
  2. It would be better to analyze the correlation between features. Even a large dataset does not by itself prevent overfitting. Among tree-based methods, random forest is a good choice, and for regression you can apply LASSO to select features or PCA to reduce the number of dimensions.
  3. Predicting an outcome in {-1, 0, 1} is a classification problem. If you use linear regression, you'd better predict a continuous variable, such as the odds of the game.
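
To make points 1 and 2 concrete, here is a minimal sketch in plain Python. It is only an illustration, not your pipeline: the field names (`team`, `date`, `rating`), the toy `matches` rows, and the helpers `fill_with_latest` and `pearson` are all hypothetical, and I am assuming that a "missing id" shows up as a missing team attribute that can be carried forward from the same team's most recent earlier record.

```python
from math import sqrt


def fill_with_latest(rows, key="team", order="date", field="rating"):
    """Fill a missing `field` with the same team's most recent earlier value.

    Returns new row dicts sorted by `order`. A row that has no earlier
    known value for its team keeps None, so it can still be dropped later.
    """
    latest = {}   # team -> last known value seen so far
    filled = []
    for row in sorted(rows, key=lambda r: r[order]):
        row = dict(row)  # copy, so the input rows are not modified
        if row[field] is None:
            row[field] = latest.get(row[key])
        else:
            latest[row[key]] = row[field]
        filled.append(row)
    return filled


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: ISO date strings sort chronologically.
matches = [
    {"team": "A", "date": "2015-01-10", "rating": 70},
    {"team": "A", "date": "2015-02-10", "rating": None},
    {"team": "B", "date": "2015-01-15", "rating": None},
    {"team": "B", "date": "2015-03-01", "rating": 64},
]

filled = fill_with_latest(matches)
for row in filled:
    print(row)

# Two perfectly collinear features: one of them is redundant.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

Team A's second row is filled from its earlier record, while team B's first row stays missing because nothing earlier is known. A pair of features with correlation close to 1 or -1 is a candidate for dropping one of the two before fitting.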