Open TimkLee opened 2 years ago
Notes:
Different model types such as neural networks, decision trees, clustering, etc.
More advanced feature selection strategies
Applying PCA (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html?highlight=pca#sklearn.decomposition.PCA.fit)
SelectKBest (https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html)
SelectPercentile (https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html)
Fisher's Score (https://github.com/jundongl/scikit-feature/blob/master/skfeature/function/similarity_based/fisher_score.py)
sklearn.feature_selection methods (https://scikit-learn.org/stable/modules/feature_selection.html#recursive-feature-elimination)
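A minimal sketch of how the scikit-learn options listed above could be compared side by side. The data is synthetic (`make_classification`), and the choice of `f_classif` as the scoring function and of 5 features/components is an assumption for illustration, not something fixed by these notes.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic stand-in for the real feature matrix (hypothetical data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Keep the 5 features with the highest ANOVA F-score against the label.
X_kbest = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Keep the top 25% of features ranked by the same score.
X_pct = SelectPercentile(score_func=f_classif, percentile=25).fit_transform(X, y)

# Project onto the first 5 principal components (unsupervised, ignores y).
pca = PCA(n_components=5)
X_pca = pca.fit_transform(X)

print(X_kbest.shape, X_pct.shape, X_pca.shape)
print(pca.explained_variance_ratio_.round(3))
```

Note that PCA builds new combined features rather than selecting existing ones, so the selected columns from `SelectKBest` stay interpretable while the PCA components do not.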
More thoughtful splitting of the training data (For example, motivated by the fact that teams often start completely healthy but then lose players due to injury as the season progresses, you could omit the first 20 games per team, or select the training/validation sets by alternating games.)
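The two ideas in that example (dropping early games and alternating games between sets) can be sketched as below. The `(team, game_number)` tuple layout and the `split_games` helper are hypothetical; a real schedule would also need to handle the fact that each game involves two teams.

```python
from collections import defaultdict


def split_games(games, skip_first=20):
    """Drop each team's first `skip_first` games, then alternate train/validation.

    `games` is a list of (team, game_number) tuples; this layout is a
    hypothetical stand-in for the real schedule data.
    """
    by_team = defaultdict(list)
    for team, game_number in games:
        by_team[team].append(game_number)

    train, valid = [], []
    for team, numbers in by_team.items():
        kept = sorted(numbers)[skip_first:]  # omit the early-season games
        for i, game_number in enumerate(kept):
            # Even positions go to training, odd positions to validation.
            (train if i % 2 == 0 else valid).append((team, game_number))
    return train, valid


# Toy schedule: two teams, 82 games each.
games = [(team, n) for team in ("A", "B") for n in range(1, 83)]
train, valid = split_games(games)
print(len(train), len(valid))
```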
If the characteristics of regular season and post season games are inherently different or skewed, it would be interesting to train a model on regular season games and test it on post season games, and vice versa.
Alternatively, separate the regular season and post season games first, then use the bootstrap method to generate as many post season games as regular season games before randomly selecting the training and validation sets.
Explore applying a bootstrap strategy to increase the number of goal events, creating a more balanced dataset.
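A minimal sketch of bootstrap oversampling, assuming a binary label where 1 marks a goal event. The `bootstrap_balance` helper and the toy data are hypothetical; the same idea could be applied with a season-type label to bring post season games up to the regular season count.

```python
import numpy as np


def bootstrap_balance(X, y, rng):
    """Resample smaller classes with replacement until all classes match."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if count < target:
            # Draw extra rows with replacement from this class only.
            extra = rng.choice(idx, size=target - count, replace=True)
            idx = np.concatenate([idx, extra])
        parts.append(idx)
    # Shuffle so the classes are not grouped together in the output.
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]


rng = np.random.default_rng(0)
# Toy data: roughly 10% positive labels, mimicking an imbalanced dataset.
y = (rng.random(1000) < 0.1).astype(int)
X = rng.normal(size=(1000, 3))

X_bal, y_bal = bootstrap_balance(X, y, rng)
print(np.bincount(y), np.bincount(y_bal))
```

Resampling should be applied to the training split only, after the train/validation split, so that duplicated rows do not leak into the validation set.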
Hyperparameter tuning and cross-validation strategies
Regularization
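One way the last two notes could be combined is a cross-validated grid search over the regularization strength of a logistic regression. This is a sketch on synthetic data; the model, the grid values, the ROC AUC scoring, and the 5-fold stratified split are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real dataset (hypothetical data).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# In scikit-learn, C is the inverse regularization strength:
# a smaller C means a stronger penalty on the coefficients.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```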