UBC-MDS / data-analysis-review-2023

Submission: Group 17: Fifa-potential #21

Open jbarns14 opened 7 months ago

jbarns14 commented 7 months ago

Submitting authors: @srfrew @meretelutz @jbarns14 @WaleedMahmood1

Repository: https://github.com/UBC-MDS/fifa-potential

Report link: https://ubc-mds.github.io/fifa-potential/high-potential-fifa-prediction-report.html

Abstract/executive summary: We attempt to construct a classification model using an RBF SVM classifier, which uses FIFA22 player attribute ratings to classify players’ potential into the target classes “Low”, “Medium”, “Good”, and “Great”. The classes are split on the quartiles of the distribution of the FIFA22 potential ratings. Our model performed reasonably well on the test data, with an accuracy score of 0.809 using the hyperparameters C = 100.0 and gamma = 0.010. However, we believe there is still significant room for improvement before the model is ready to be used by soccer clubs and coaching staffs to predict the potential of players on the field instead of on the screen.
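For readers who want the approach in code, here is a minimal sketch of the kind of pipeline the abstract describes, assuming scikit-learn; the data file name and feature columns are hypothetical placeholders, while C = 100.0 and gamma = 0.010 are the values reported above.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical file name for the FIFA22 player data.
players = pd.read_csv("players_22.csv")

# Bin the continuous potential rating into quartile-based target classes.
players["potential_class"] = pd.qcut(
    players["potential"], q=4, labels=["Low", "Medium", "Good", "Great"]
)

feature_cols = ["pace", "shooting", "passing", "dribbling"]  # placeholders
X_train, X_test, y_train, y_test = train_test_split(
    players[feature_cols], players["potential_class"], random_state=522
)

# RBF-kernel SVM with the hyperparameters reported in the abstract.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=100.0, gamma=0.010))
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```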

Editor: @ttimbers

Reviewers: Karan Khubdikar, Sandra Gross, Nicole Tu, and Jordan Cairns

Nicole-Tu97 commented 7 months ago

Reviewer: Nicole-Tu97

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

  1. The report is very well-structured and explains the model in detail. However, it could be more comprehensible to people with a non-technical background if you used less technical terminology and explained the results more simply. For example, in the Results and Discussion section, instead of presenting the detailed cross-validation table for the various models, you could summarize the findings in a few sentences, point out the key takeaways, and present only the final score for the best model.

  2. In the git repository, I noticed that the scripts are currently under the src directory. To further enhance the project structure, consider moving them into a dedicated scripts folder instead (see the sketch after this list).

  3. All the models, figures, and tables are put into one results folder, which is very nice. However, it may be better to create separate tables, figures, and models subfolders under the results folder to further enhance the project structure (again, see the sketch after this list).
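For items 2 and 3, one possible layout (folder names are illustrative, not prescriptive):

```
fifa-potential/
├── src/          # reusable functions
├── scripts/      # analysis scripts that call the functions in src/
└── results/
    ├── figures/
    ├── tables/
    └── models/
```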

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

sandygross commented 7 months ago

Reviewer: sandy02

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Overall, great work!

  1. The report is well-organized, offering context and relevance to the topic without an excessive use of images. Specifically regarding Figure 3, it might be worth considering leaving it out completely and instead summarizing the findings from hyperparameter optimization in a few sentences (maybe ask yourself: is the table truly of great relevance to the reader, or is only the best result from hyperparameter optimization relevant?).

  2. It would be beneficial to include a 'Community' section in the README, outlining how external contributors can participate in the project. This section could provide guidelines for contributing, including procedures to follow if issues or errors are identified in the current analysis.

  3. Concerning the repository's structure, there are currently 16 branches. I recommend removing any unused branches to improve the overall organization (see the example commands after this list).

  4. Currently, your scripts and functions are both in the 'src' folder, which is totally fine. But in my opinion, it would enhance the structure of the repository if you placed the scripts in a separate 'scripts' folder.

  5. All script outputs are stored in the results folder. You could enhance organization within the results folder by creating subfolders such as 'models' and 'figures' for further differentiation.

  6. I personally find it preferable (though I acknowledge that Tiff might not include it in her repository either) to have the link to the rendered HTML also included in the 'About' section. This way, you don't have to search for it in the README first; but this is probably just my personal preference.
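For item 3, a sketch of the standard git commands for pruning branches; the branch name is a placeholder:

```sh
# Delete a merged local branch and its remote counterpart (placeholder name).
git branch -d old-feature-branch
git push origin --delete old-feature-branch

# Drop remote-tracking references to branches already deleted on GitHub.
git fetch --prune
```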

But again, these are minor issues, great job, guys!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

karan-khubdikar commented 7 months ago

Data analysis review checklist

Reviewer: @karan-khubdikar

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: ~1.5-2 hours

Review Comments:

  1. Reproducibility - I was able to recreate the analysis using the docker container as well as locally using the environment. The scripts ran without error and the report was built without any issue, so great job with the Docker setup and the environment! Scope for improvement: a few more details could be added for Docker; for instance, after launching the Jupyter notebook inside the container, you could mention that we need to navigate to the root directory before running the run.sh file. Also, for running the analysis locally, instead of "install local dependencies", you could give the command that creates the environment (a sketch follows this list), which would be more helpful for someone who is not well-versed in working with environments.
  2. Content-wise, the report looks pretty good in terms of presentation. It was good that you tried out several models and reported their results. Just curious why the Decision Tree model was not used, given that its validation results seemed better, although it was overfitting. Did you experiment with its hyperparameters to check whether they give better results than your SVM model? (A sketch of such an experiment follows this list.)
  3. Organization - overall, most of the repo is well-organized; however, there could be a scripts folder for the scripts instead of having them in the src folder. Also, the results folder could be organized into subfolders such as plots, models, and tables. Not that this affects the results, it just helps keep things organized, in my opinion.
  4. Community guidelines could also be added to the README for people who want to contribute or report issues.
  5. Overall, the report looks great and has a practical application: it could be used by managers to make decisions about player transfers.
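On point 1, the explicit command that could replace "install local dependencies" in the README might look like this, assuming a conda environment file is checked in (the file and environment names are assumptions):

```sh
# Create the project environment from the checked-in file and activate it.
conda env create -f environment.yml
conda activate fifa-potential
```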
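On point 2, a minimal sketch of the kind of hyperparameter experiment suggested there, using scikit-learn with synthetic data standing in for the FIFA features; the parameter grid is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four-class FIFA potential problem.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, n_classes=4, random_state=522
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=522)

# Constrain tree depth and leaf size to rein in overfitting.
param_grid = {
    "max_depth": [3, 5, 10, 20, None],
    "min_samples_leaf": [1, 5, 10, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=522), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```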

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.