UBC-MDS / olympic_medal_htest

MIT License

Create EDA for the Olympics dataset. #11

stevenleung2018 closed 2 years ago

stevenleung2018 commented 2 years ago

The .ipynb is the main source file. The .md and .png files are included for readability.

ming0701 commented 2 years ago

Thanks Steven. I understand some labels are not readable but I think this is fine for EDA and I can see there are some insights as mentioned in the documents. May I suggest adding one more graph showing the age distributions of all athletes? You may add it before the relationship between age and medals.

stevenleung2018 commented 2 years ago

Thanks for your suggestion. I just want to make sure that I understand it correctly. Do you mean just the age distribution regardless of medals? So it is just one bigger histogram. But since the distributions across the different medals are so similar, I think we are going to get a histogram of the same shape anyway. Or do you mean there should be an age histogram of ALL athletes before I remove those without medals?

stevenleung2018 commented 2 years ago

I have already added the histogram and committed all files again. Please take a look.

ming0701 commented 2 years ago

Yes, I mean the age distribution of all athletes regardless of medals. To present a full picture, I think it would be useful to take a look at that distribution, as we will need the number of athletes without medals when calculating the probability for the hypothesis testing.
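
For reference, a minimal sketch of the comparison being asked for: bin the ages of all athletes, bin the ages of medalists only, and keep the count of athletes without medals. The records, the `age_histogram` helper, and the 5-year bin width are all hypothetical and are not taken from the notebook or the Olympics dataset.

```python
from collections import Counter

# Hypothetical toy records: (age, medal), where medal is None for no medal.
# These values are made up for illustration only.
athletes = [
    (19, None), (22, "Gold"), (23, None), (24, "Silver"),
    (25, None), (27, "Bronze"), (28, None), (31, None),
    (33, "Gold"), (36, None),
]


def age_histogram(ages, bin_width=5):
    """Count ages per bin; each key is the lower edge of its bin."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(counts.items()))


# Ages of every athlete, before any filtering on medals.
all_ages = [age for age, _ in athletes]
# Ages of medalists only.
medal_ages = [age for age, medal in athletes if medal is not None]

hist_all = age_histogram(all_ages)
hist_medal = age_histogram(medal_ages)

# Number of athletes without a medal, needed later for the probability.
n_without_medal = len(all_ages) - len(medal_ages)

print("all athletes:", hist_all)
print("medalists:   ", hist_medal)
print("without medal:", n_without_medal)
```

Building both histograms from the same unfiltered list keeps the non-medalist count available instead of losing it when the rows without medals are dropped.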

stevenleung2018 commented 2 years ago

I like your idea. It's done. The overall distribution looks very similar to those with medals. So I guess in the subsequent milestones, we will try to reject the null hypothesis. It's really an open-ended question. Perfectly ok if we fail to reject. We will be able to explain ourselves either way.
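
The thread does not say which test the later milestones use. As one possible illustration, assuming a two-sided permutation test on the difference in mean age between medalists and non-medalists, the idea could be sketched as follows. The `permutation_test` function and the ages are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference (mean of a minus mean of b) and the
    share of label shuffles whose absolute difference is at least as large.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling simulates the null hypothesis that labels do not matter.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical ages, made up for illustration only.
medalist_ages = [22, 24, 27, 33]
non_medalist_ages = [19, 23, 25, 28, 31, 36]

observed, p_value = permutation_test(medalist_ages, non_medalist_ages)
print(f"observed difference in mean age: {observed}")
print(f"permutation p-value: {p_value:.3f}")
```

With two groups this similar, the p-value is large and the null hypothesis is not rejected, which matches the "fail to reject" outcome discussed above.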

ruben1dlg commented 2 years ago

Looks very good!