VabGMos / UFC-fight-predictor

Predicts the winner of UFC fights

Future data leaking into model training #1

Open fro391 opened 4 years ago

fro391 commented 4 years ago

Hello!

I really like the idea you guys had for this project.

One thing I noticed: when you train your model in the "Predict Fights" notebook, you take each fighter's current stats and apply them as features to their old fights. For example, a fighter's current Significant Strikes landed percentage may have been lower in their early career fights, yet it was used as a feature for those early-career results. This leaks future data into the training process, which will overstate the model's accuracy. However, some stats, such as height and reach, are static, and age can be computed as of each fight's date, so these are not future features.
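A minimal sketch of the usual fix, assuming the scraped data can be reshaped into one row per fighter per fight with a date and the raw strike counts. The field names (`sig_landed`, `sig_attempted`, `prior_sig_acc`) and the sample rows are hypothetical, not the project's actual schema. Each fight's feature is built only from that fighter's earlier fights:

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-fight records (one row per fighter per fight).
fights = [
    {"date": date(2015, 3, 1), "fighter": "A", "sig_landed": 20, "sig_attempted": 60},
    {"date": date(2016, 7, 9), "fighter": "A", "sig_landed": 45, "sig_attempted": 90},
    {"date": date(2018, 1, 20), "fighter": "A", "sig_landed": 70, "sig_attempted": 100},
]


def point_in_time_accuracy(rows):
    """Significant-strike accuracy using only each fighter's PRIOR fights."""
    totals = defaultdict(lambda: [0, 0])  # fighter -> [landed, attempted] so far
    features = []
    for row in sorted(rows, key=lambda r: r["date"]):
        landed, attempted = totals[row["fighter"]]
        # None means no earlier fights, so nothing was known yet on this date.
        acc = landed / attempted if attempted else None
        features.append({"date": row["date"], "fighter": row["fighter"], "prior_sig_acc": acc})
        # Update running totals only AFTER recording the feature, so a fight's
        # own result never leaks into its own feature.
        totals[row["fighter"]][0] += row["sig_landed"]
        totals[row["fighter"]][1] += row["sig_attempted"]
    return features


for f in point_in_time_accuracy(fights):
    print(f["date"], f["prior_sig_acc"])
```

For fighter A, using current stats would give every fight the career accuracy 135/250 = 0.54, whereas the point-in-time features are None, 20/60 and 65/150. Splitting train and test sets by date rather than at random is a complementary guard against the same kind of leakage.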

Curious if you have any plans to adjust the training method to get around this issue.

Thanks,

Richard

VabGMos commented 4 years ago

Hi Richard,

Thanks for looking into our project and for the insight. We knew we should not have used fighters' current stats in our model, but this was a class project for a data science course and we were pressed for time, so we could not make it perfect. It was an intro to data science course, so we did not realize this would overstate the model's accuracy, which is a bummer. We did not really learn any machine learning; it was only touched on at the end of the course, and we figured out how to plug and chug data into a model. Our initial plan was to represent fighters as objects, with their stats as attributes, one of which could be a list of the fighter's stats for every year. We think this could take care of the problem, but it seems a little complicated.
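The fighter-as-object idea could look roughly like the sketch below. It assumes each yearly entry is a snapshot of the fighter's stats as of the end of that year; the class name, fields and `stats_as_of` method are hypothetical. For a fight in a given year, only a snapshot from a strictly earlier year is returned, so no stats from that year or later are used:

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Fighter:
    name: str
    height_cm: float  # static attribute, safe to use for any fight
    reach_cm: float   # static attribute, safe to use for any fight
    # Hypothetical: year -> stats snapshot as of the END of that year.
    stats_by_year: dict[int, dict[str, float]] = field(default_factory=dict)

    def stats_as_of(self, fight_year: int) -> dict[str, float] | None:
        """Latest snapshot from a year strictly before fight_year, or None."""
        years = sorted(self.stats_by_year)
        i = bisect_left(years, fight_year)  # index of first year >= fight_year
        if i == 0:
            return None  # no snapshot exists before this fight
        return self.stats_by_year[years[i - 1]]


a = Fighter("A", 180.0, 185.0, {2015: {"sig_acc": 0.33}, 2016: {"sig_acc": 0.43}})
print(a.stats_as_of(2016))
```

Yearly snapshots are coarse: a fight late in a year ignores earlier fights from that same year. Keeping a running total updated after every fight avoids that, at the cost of storing one snapshot per fight instead of per year.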

I do not know if we will work on this further. I have a busy semester ahead, as I am doing some research and applying for co-ops. But we would like to create a web app that could run our program and display its results in a nice visualization. If we do get time, we plan to keep working on this and make the project bigger. I know minimal Python right now, as the project shows, but hopefully we can learn how to create a graphical interface and maybe write a script to re-scrape all the data to keep this project up to date.

Please let me know if you have any suggestions or insights. Also, how did you find this project?

Regards, Vab

In reply to fro391's comment of Wed, 25 Dec 2019 at 23:19.

fro391 commented 4 years ago

I found your work through the search function on GitHub. I was trying to build my own fight result predictor, and was searching GitHub for inspiration.

I am an active speculator in MMA and believe that machine learning techniques could give an edge in predicting fight outcomes and, combined with an optimal betting strategy, generate positive returns over the long run.
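The comment does not say which betting strategy is meant. One widely used notion of an "optimal" stake size is the Kelly criterion, sketched below as an assumption rather than Richard's method. The function names are hypothetical; inputs are the model's win probability and bookmaker decimal odds:

```python
def expected_value(p_win: float, decimal_odds: float) -> float:
    """Expected profit per unit staked; positive means the bet has an edge."""
    return p_win * decimal_odds - 1.0


def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake under the Kelly criterion.

    p_win: model's estimated probability that the pick wins.
    decimal_odds: total payout per unit staked (must be > 1).
    Returns 0.0 when the bet has no positive edge.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")
    b = decimal_odds - 1.0     # net profit per unit staked on a win
    q = 1.0 - p_win            # probability of losing
    f = (b * p_win - q) / b    # Kelly formula: f* = (bp - q) / b
    return max(f, 0.0)         # never stake on a negative-edge bet


print(kelly_fraction(0.6, 2.0), expected_value(0.6, 2.0))
```

Kelly sizing assumes the probabilities are well calibrated, so an accuracy overstated by the leakage discussed above would translate directly into overbetting; many bettors stake only a fraction of the Kelly amount for that reason.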

If you have some free time in the future and would like to collaborate, feel free to reach out to me on LinkedIn.

Cheers,

Richard