Ironhack-data-bcn-oct-2023 / project-II-pipelines


[Marc Folch] project-II #7

Open marcfolchp opened 11 months ago

marcfolchp commented 11 months ago

https://github.com/marcfolchp/project2

sh-ih commented 11 months ago

You did great on this project! Some comments:

The data folder is usually reserved for datasets: the original dataset, if you include it, and the cleaned one. Keep your notebooks together in a notebooks folder, and use the src folder for the .py files that contain your functions.
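As a rough illustration of that layout, here is a minimal sketch that creates the folders from a script. The function name `create_layout` and the `FOLDERS` tuple are hypothetical, and the folder list is only my reading of the structure described above; adapt it to your own project.

```python
from pathlib import Path

# Assumed layout: folder names taken from the comments in this review.
FOLDERS = ("data", "notebooks", "src", "images")


def create_layout(root):
    """Create the project folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_layout(".")` from the repository root would leave existing folders untouched and add only the missing ones.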

One way to improve your repository would be to use clearer file names in your images folder, for example the titles of your graphs (“lowest_average_goals_per_match.png”, “shots_distribution.png”), so your reader knows what each file contains without having to open it.
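One possible way to get such names automatically is to derive the file name from the graph title. This is only a sketch: `title_to_filename` is a hypothetical helper, not something from your repository.

```python
import re


def title_to_filename(title, extension="png"):
    """Turn a graph title into a lowercase, underscore-separated file name."""
    # Replace every run of characters that is not a letter or digit with "_".
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{slug}.{extension}"


print(title_to_filename("Lowest average goals per match"))
# lowest_average_goals_per_match.png
```

If you plot with matplotlib, the result could be passed to `plt.savefig`, for example `plt.savefig("images/" + title_to_filename(title))`.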

To make the maintenance of your code easier, I would suggest you create a function to scrape each website, instead of one function to scrape all of the websites. That way, if one of the websites changes structure in the future, you can immediately detect which function needs updating.

You can then create one function that calls all the scrapers and builds your final dataframe.

A recommended best practice is that a function should do only one task, not several. Here, “one task” could be “create a dataframe for the Premier League”, “create a dataframe for La Liga”, and so on, rather than “scrape all pages”.
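To make the idea concrete, here is a minimal sketch of one scraper per league plus one function that calls them all. Everything in it is an assumption for illustration: the URLs, the function names, and the toy page formats (comma-separated for one site, semicolon-separated for the other) stand in for your real parsing code. The `fetch` argument is a hypothetical callable that takes a URL and returns the page text, which keeps the sketch free of network calls.

```python
# Hypothetical URLs, used only for illustration.
PREMIER_LEAGUE_URL = "https://example.com/premier-league"
LA_LIGA_URL = "https://example.com/la-liga"


def scrape_premier_league(fetch):
    """One task: build the rows for the Premier League only."""
    rows = []
    for line in fetch(PREMIER_LEAGUE_URL).strip().splitlines():
        team, goals = line.split(",")  # assumed format: "team, goals"
        rows.append({"league": "Premier League", "team": team.strip(), "goals": int(goals)})
    return rows


def scrape_la_liga(fetch):
    """One task: build the rows for La Liga only."""
    rows = []
    for line in fetch(LA_LIGA_URL).strip().splitlines():
        team, goals = line.split(";")  # assumed format: "team; goals"
        rows.append({"league": "La Liga", "team": team.strip(), "goals": int(goals)})
    return rows


SCRAPERS = {"Premier League": scrape_premier_league, "La Liga": scrape_la_liga}


def build_dataset(fetch):
    """Call every scraper and combine the rows into one list."""
    rows = []
    for league, scraper in SCRAPERS.items():
        try:
            rows.extend(scraper(fetch))
        except Exception as exc:
            # Name the failing scraper so it is clear which function needs updating.
            raise RuntimeError(f"Scraper for {league} failed: {exc}") from exc
    return rows
```

The combined rows can then be turned into the final dataframe with `pd.DataFrame(build_dataset(fetch))`. If one website changes its structure, the error message points at the single function that has to be fixed.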

The comment in your conclusion section, “.. where has La Liga gone after Messi and Cristiano left”, could be a great way to open your whole story. It is an interesting point that could capture your reader's attention.
