jnederlo closed 4 years ago
I'm more partial to a predicting/estimating problem. Some of my ideas are:
With the sports data, I would want to make a player classification system, and/or predict some of the players' stats for upcoming games. For basketball specifically, I would like to build a model that predicts a player's on-court minutes in the upcoming game. Both NHL.com and NBA.com have accessible APIs, and there are lots of datasets.
I'm open to other datasets though if somebody had a good idea.
Thanks, Jarvis (@jnederlo), for giving this a good kick-start. Personally, I'd prefer a predicting/estimating problem as well, and the "player's on-court minutes" question sounds interesting to me. Besides that, I'm also looking into a financial dataset, where the goal is to build a model that predicts whether clients will repay their loans. The problem is it's a Kaggle dataset. I'll check with the instructors whether that's an acceptable data source at all before we treat it as an option.
@Zhang-Haipeng That's not a bad idea either. I know how to use the Kaggle API to get datasets programmatically if it's of any use.
Great ideas! Personally I would be more interested in a financial one (predicting loan repayments sounds pretty applicable to future employment). Otherwise, predicting NBA court times would be an amusing topic. So ya, either works for me!
I checked with Firas and it seems there's no license in the Kaggle dataset that I was looking into. So it might not be an option for us.
Also, I agree with Jack that it's applicable to employment. Credit scoring is one of the major machine learning tasks in the financial industry IMO. So maybe I'll take some time tonight to see if I can find other similar datasets.
But again, I'm totally fine with Jarvis' proposal. So if you guys want to just do it, it'll be all good with me.
One question tho, isn't predicting court minutes more of a regression problem? More specifically, I'd see it as a time series analysis (which is not really covered yet in this program), where we'd use some autoregressive models to predict future court minutes from historical court minutes.
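To make the autoregressive idea concrete, here is a minimal sketch of a first-order autoregressive model, AR(1), fitted by ordinary least squares in plain Python. It is only an illustration under assumptions: the function names `fit_ar1` and `predict_next` and the `history` values are made up for this example and are not real NBA data, and a real model would likely need more lags and extra features.

```python
def fit_ar1(minutes):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares.

    `minutes` is a list of a player's minutes in past games, oldest first.
    Returns the intercept c and the lag-1 coefficient phi.
    """
    x = minutes[:-1]  # previous game's minutes
    y = minutes[1:]   # current game's minutes
    n = len(x)
    if n < 2:
        raise ValueError("need at least 3 games to fit an AR(1) model")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("minutes are constant; the slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def predict_next(minutes, c, phi):
    """One-step-ahead forecast from the most recent game."""
    return c + phi * minutes[-1]


# Hypothetical minutes for one player over eight games (not real data).
history = [31.0, 28.5, 33.0, 30.0, 27.5, 32.0, 29.0, 30.5]
c, phi = fit_ar1(history)
print(f"c={c:.2f}, phi={phi:.2f}, next game ~ {predict_next(history, c, phi):.1f} min")
```

This treats the task as regression on the player's own past minutes, which is the simplest version of the time series framing; other features could be added later as extra regressors.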
Final note: fivethirtyeight has good, clean datasets. @Zhang-Haipeng regarding the loan repayment dataset, I would want to make sure the data preparation step isn't too large; that would be my only concern.
Let's focus on sports then. I've tried several datasets. They are either without a license or too heavy to work with. @jnederlo Do we have any specific dataset in mind?
TO DO:
To start the discussion off, I think we can split ideas up into two general categories: