[Marc Folch] project-II

You did great on this project! Some comments:

Repository You have a good folder structure, but the contents could be organized in a better way.

The data folder is commonly used to store the datasets: the original dataset, if you are including it, and the cleaned dataset. Your notebooks should be kept together in a notebooks folder, and the src folder is used to store the .py files with your functions.

One way to improve your repository would be to use clearer file names in your images folder, for example the titles of your graphs (“lowest_average_goals_per_match.png”, “shots_distribution.png”), so your reader knows what each file contains without having to open it.
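
As an illustration of the naming idea above, here is a minimal sketch that derives a file name from a graph title. The function name `title_to_filename` and the default `images` folder are assumptions made for this example, not something taken from your repository.

```python
import re
from pathlib import Path


def title_to_filename(title: str, folder: str = "images", ext: str = ".png") -> Path:
    """Turn a graph title into a descriptive, lowercase file path.

    Hypothetical helper: every run of characters that is not a letter
    or digit becomes a single underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return Path(folder) / f"{slug}{ext}"


# The resulting path could then be passed to your plotting library's
# save call, so the file name always matches the graph title.
print(title_to_filename("Lowest average goals per match").name)
```

Generating the name from the title keeps the two in sync, so you do not have to rename files by hand when a title changes.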

Acquisition and enrichment of database You chose a pretty complicated data source by scraping five different websites. Great work on a highly ambitious project.

To make the maintenance of your code easier, I would suggest you create a function to scrape each website, instead of one function to scrape all of the websites. That way, if one of the websites changes structure in the future, you can immediately detect which function needs updating.

You can then create one function that calls all the scrapers and builds your final dataframe.

A common coding best practice is that a function should complete only one task, not several. Here we could define “one task” as “create a dataframe for the Premier League”, “create a dataframe for La Liga” and so on, instead of treating “scrape all pages” as one task.
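
The structure described above could look roughly like the sketch below. It is only an illustration under several assumptions: the function names, the `SCRAPERS` registry, and the HTML markup of each page are hypothetical, and each scraper receives the already downloaded HTML as a string instead of fetching it, so the example runs without network access.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, object]


def scrape_premier_league(html: str) -> List[Row]:
    """One task: turn the Premier League page into rows.

    Assumed (hypothetical) markup: <tr><td>Team</td><td>Goals</td></tr>
    """
    pairs = re.findall(r"<tr><td>([^<]+)</td><td>(\d+)</td></tr>", html)
    return [{"league": "Premier League", "team": t, "goals": int(g)} for t, g in pairs]


def scrape_la_liga(html: str) -> List[Row]:
    """One task: turn the La Liga page into rows.

    Assumed (hypothetical) markup: <li data-team="Team" data-goals="N">
    """
    pairs = re.findall(r'<li data-team="([^"]+)" data-goals="(\d+)"', html)
    return [{"league": "La Liga", "team": t, "goals": int(g)} for t, g in pairs]


# Registry of scrapers, one per website; the keys are illustrative names.
SCRAPERS: Dict[str, Callable[[str], List[Row]]] = {
    "premier_league": scrape_premier_league,
    "la_liga": scrape_la_liga,
}


def build_dataset(pages: Dict[str, str]) -> List[Row]:
    """Call every scraper on its page and combine the rows."""
    rows: List[Row] = []
    for name, scraper in SCRAPERS.items():
        site_rows = scraper(pages[name])
        if not site_rows:
            # The error names the scraper, so you know which one to update.
            raise ValueError(f"{name}: no rows parsed; the page structure may have changed")
        rows.extend(site_rows)
    return rows
```

The combined list of rows can be passed to `pandas.DataFrame(rows)` to get the final dataframe. Because each website has its own function, a change on one site only breaks one scraper, and the error message tells you which.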

Reporting Your analysis and conclusions are mainly in your README, but it is important to include them in the notebook that serves as your final report as well. In this case that could be visualisation.ipynb, or you could even create a notebook just for the final report, where you can develop your storytelling. There you have the opportunity to write a more in-depth analysis and conclusions, and keep only the most attractive points in the README.

README You have a clear and well-organized README. It is a great idea to accompany your analysis of the data with visualizations, which makes it even more powerful.

The comment you have in your conclusion section: “.. where has La Liga gone after Messi and Cristiano left” could be a great way to start your whole storytelling. It is an interesting point that could capture the attention of your reader.
