UBC-MDS / data-analysis-review-2023

Submission: Group 17: Fifa-potential #21

Open jbarns14 opened 7 months ago

jbarns14 commented 7 months ago

Submitting authors: @srfrew @meretelutz @jbarns14 @WaleedMahmood1

Repository: https://github.com/UBC-MDS/fifa-potential

Report link: https://ubc-mds.github.io/fifa-potential/high-potential-fifa-prediction-report.html

Abstract/executive summary: We attempt to construct a classification model using an RBF SVM classifier, which uses FIFA22 player attribute ratings to classify players’ potential into the target classes “Low”, “Medium”, “Good”, and “Great”. The classes are split on the quartiles of the distribution of the FIFA22 potential ratings. Our model performed reasonably well on the test data, with an accuracy score of 0.809 using the hyperparameters C = 100.0 and gamma = 0.010. However, we believe there is still significant room for improvement before the model is ready to be used by soccer clubs and coaching staffs to predict the potential of players on the field instead of on the screen.
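For readers who want the approach in code, here is a minimal sketch of the kind of pipeline the abstract describes, assuming scikit-learn; the data file name and feature columns are hypothetical placeholders, while C = 100.0 and gamma = 0.010 are the values reported above.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical file name for the FIFA22 player data.
players = pd.read_csv("players_22.csv")

# Bin the continuous potential rating into quartile-based target classes.
players["potential_class"] = pd.qcut(
    players["potential"], q=4, labels=["Low", "Medium", "Good", "Great"]
)

feature_cols = ["pace", "shooting", "passing", "dribbling"]  # placeholders
X_train, X_test, y_train, y_test = train_test_split(
    players[feature_cols], players["potential_class"], random_state=522
)

# RBF-kernel SVM with the hyperparameters reported in the abstract.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=100.0, gamma=0.010))
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```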

Editor: @ttimbers

Reviewers: Karan Khubdikar, Sandra Gross, Nicole Tu, and Jordan Cairns

Nicole-Tu97 commented 7 months ago

Reviewer: Nicole-Tu97

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

  1. The report is very well-structured and explains the model in detail. However, it could be more comprehensible to people with a non-technical background if you used less technical terminology and explained the results more simply. For example, in the Results and Discussion section, instead of presenting the detailed cross-validation table for the various models, you could summarize the findings in a few sentences, point out the key takeaways, and present only the final score for the best model.

  2. In the git repository, I noticed that the scripts are currently under the src directory. To further enhance the project structure, consider moving them into a dedicated scripts folder instead (see the sketch after this list).

  3. All the models, figures, and tables are put into one results folder, which is very nice. However, it may be better to create separate tables, figures, and models subfolders under the results folder to further enhance the project structure (again, see the sketch after this list).
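For items 2 and 3, one possible layout (folder names are illustrative, not prescriptive):

```
fifa-potential/
├── src/          # reusable functions
├── scripts/      # analysis scripts that call the functions in src/
└── results/
    ├── figures/
    ├── tables/
    └── models/
```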

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

sandygross commented 7 months ago

Reviewer: sandy02

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Overall, great work!

  1. The report is well-organized, offering context and relevance to the topic without an excessive use of images. Specifically regarding Figure 3, it might be worth considering leaving it out completely and instead summarizing the findings from hyperparameter optimization in a few sentences (maybe ask yourself: is the table truly of great relevance to the reader, or is only the best result from hyperparameter optimization relevant?).

  2. It would be beneficial to include a 'Community' section in the README, outlining how external contributors can participate in the project. This section could provide guidelines for contributing, including procedures to follow if issues or errors are identified in the current analysis.

  3. Concerning the repository's structure, there are currently 16 branches. I recommend removing any unused branches to improve the overall organization (see the example commands after this list).

  4. Currently, your scripts and functions are both in the 'src' folder, which is totally fine. But in my opinion, it would enhance the structure of the repository if you placed the scripts in a separate 'scripts' folder.

  5. All script outputs are stored in the results folder. You could enhance organization within the results folder by creating subfolders such as 'models' and 'figures' for further differentiation.

  6. I personally find it preferable (though I acknowledge that Tiff might not include it in her repository either) to have the link to the rendered HTML also included in the 'About' section. This way, you don't have to search for it in the README first; but this is probably just my personal preference.
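For item 3, a sketch of the standard git commands for pruning branches; the branch name is a placeholder:

```sh
# Delete a merged local branch and its remote counterpart (placeholder name).
git branch -d old-feature-branch
git push origin --delete old-feature-branch

# Drop remote-tracking references to branches already deleted on GitHub.
git fetch --prune
```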

But again, these are minor issues, great job, guys!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

karan-khubdikar commented 7 months ago

Data analysis review checklist

Reviewer: @karan-khubdikar

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: ~1.5-2 hours

Review Comments:

  1. Reproducibility - I was able to recreate the analysis using the docker container as well as locally using the environment. The scripts ran without error and the report was built without any issue, so great job with the Docker setup and the environment! Scope for improvement: a few more details could be added for Docker; for instance, after launching the Jupyter notebook inside the container, you could mention that we need to navigate to the root directory before running the run.sh file. Also, for running the analysis locally, instead of "install local dependencies", you could give the command that creates the environment (a sketch follows this list), which would be more helpful for someone who is not well-versed in working with environments.
  2. Content-wise, the report looks pretty good in terms of presentation. It was good that you tried out several models and reported their results. Just curious why the Decision Tree model was not used, given that its validation results seemed better, although it was overfitting. Did you experiment with its hyperparameters to check whether they give better results than your SVM model? (A sketch of such an experiment follows this list.)
  3. Organization - overall, most of the repo is well-organized; however, there could be a scripts folder for the scripts instead of having them in the src folder. Also, the results folder could be organized into subfolders such as plots, models, and tables. Not that this affects the results, it just helps keep things organized, in my opinion.
  4. Community guidelines could also be added to the README for people who want to contribute or report issues.
  5. Overall, the report looks great and has a practical application: it could be used by managers to make decisions about player transfers.
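On point 1, the explicit command that could replace "install local dependencies" in the README might look like this, assuming a conda environment file is checked in (the file and environment names are assumptions):

```sh
# Create the project environment from the checked-in file and activate it.
conda env create -f environment.yml
conda activate fifa-potential
```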
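On point 2, a minimal sketch of the kind of hyperparameter experiment suggested there, using scikit-learn with synthetic data standing in for the FIFA features; the parameter grid is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four-class FIFA potential problem.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, n_classes=4, random_state=522
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=522)

# Constrain tree depth and leaf size to rein in overfitting.
param_grid = {
    "max_depth": [3, 5, 10, 20, None],
    "min_samples_leaf": [1, 5, 10, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=522), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```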

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.