BrandoPolistirolo / Tennis-Betting-ML

Machine learning model (specifically logistic regression with stochastic gradient descent) for tennis match prediction. Achieves 66% accuracy on approx. 125,000 matches.
MIT License

Hoping for a nudge in the right direction #6

Open ey5dimes opened 3 months ago

ey5dimes commented 3 months ago

Hi Brando,

On the off chance that you still check and follow this repo, I figured I'd ask and see if you could get me unstuck. Unlike some of the other inquiries, I've gotten a bit farther into getting it to work, but have run into some issues with the data preprocessing Python file.

When the file gets to about k=1777 in the loop, I get an out-of-bounds error when merging the temp dataframe and the final dataframe. I've tried printing out the variables and think it might be related to the player_id not matching in the two DataFrames. Here is the output of the Jupyter notebook.

Any help is appreciated, can't get it to complete this loop.
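For anyone who wants to test the player_id hunch, here is a minimal sketch of a check that lists IDs present on only one side of a merge. The helper name `missing_ids` and the sample IDs are hypothetical and not taken from the repo; any iterable works as input, so a pandas column should too. This may not be the actual cause of the error.

```python
def missing_ids(left_ids, right_ids):
    """Return the IDs found on only one side, as two sorted lists."""
    left, right = set(left_ids), set(right_ids)
    # key=str keeps the sort from failing if one side holds ints and the other strings
    return sorted(left - right, key=str), sorted(right - left, key=str)


# Hypothetical stand-ins for the player_id columns of the two dataframes.
temp_ids = [101, 102, 103]
final_ids = [101, 103, 104]

only_in_temp, only_in_final = missing_ids(temp_ids, final_ids)
print("only in temp:", only_in_temp)
print("only in final:", only_in_final)
```

If an ID shows up in both lists in different forms (for example 101 and "101"), the two columns probably have different types, which would also stop the rows from matching.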

Screenshot 2024-08-27 at 6 27 53 PM
ey5dimes commented 3 months ago

Update on progress.

In my previous post, I was running the entire "KaggleTennis.....py" file within Jupyter by just doing 'run Kaggle_Tennis... blah'.

This works; however, because of the out-of-bounds error above, it would fail and exit. So this time around, I set up each of the "In[]" cells as its equivalent in a Jupyter notebook. Then, once the loop failed at the same point (I still can't figure out why), I could still run the rest of the "In[]" cells and get them to work. This lets me create all of the tournament CSVs, the 'final_kaggle_dataset.csv' files, etc.
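A possible way to keep the loop going past the bad iteration and record where it fails, as a rough sketch only. `run_guarded` and `step` are hypothetical names; in the real script, `step` would be whatever the loop body does for a given k, and the exception types caught here are an assumption about what the merge raises.

```python
def run_guarded(indices, step):
    """Call step(k) for each k and collect failures instead of stopping."""
    failures = []
    for k in indices:
        try:
            step(k)
        except (IndexError, KeyError) as exc:
            # Remember which iteration failed and why, then carry on.
            failures.append((k, repr(exc)))
    return failures


# Hypothetical loop body: indexing past the end mimics the out-of-bounds error.
rows = ["a", "b", "c"]


def step(k):
    return rows[k]


failures = run_guarded(range(5), step)
print([k for k, _ in failures])
```

The collected list shows whether k=1777 is the only failing iteration or just the first of many, which would narrow down where to look.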

However, now I'm trying to figure out which file to run next. I think it's either Players_Name_Fix.py or Players_Data_PreProc.py. In the PreProc file, the first few lines read the file 'atp_players.csv', which is not a file within the dataset. I'm wondering if this should be changed to the matching file 'all_players.csv', which IS in the dataset. Will try this and report back.
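A small sketch of the file check I have in mind: look for 'atp_players.csv' first and fall back to 'all_players.csv'. The helper `first_existing` is a hypothetical name, and I am assuming both files would sit in the working directory. Whether 'all_players.csv' has the columns the PreProc script expects is still unverified.

```python
from pathlib import Path


def first_existing(candidates, base_dir="."):
    """Return the first candidate file that exists under base_dir, else None."""
    for name in candidates:
        path = Path(base_dir) / name
        if path.is_file():
            return path
    return None


# Try the name the script expects first, then the one that ships with the dataset.
players_file = first_existing(["atp_players.csv", "all_players.csv"])
print(players_file if players_file else "neither file found")
```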