Open JaseZiv opened 1 year ago
Transfermarkt league seasons appear inconsistent with the values for those same seasons in the page's HTML.
The get_transfermarkt_metadata.R file here uses the below code to get a list of available seasons for the particular league:
get_transfermarkt_metadata.R
comp_url <- "https://www.transfermarkt.com/campeonato-brasileiro-serie-a/startseite/wettbewerb/BRA1"
league_page <- xml2::read_html(comp_url)
seasons <- league_page %>%
  rvest::html_nodes(".chzn-select") %>%
  rvest::html_nodes("option")
Which returns the following values:
# {xml_nodeset (27)}
#  [1] <option selected value="2022">2023</option>\n
#  [2] <option value="2021">2022</option>\n
#  [3] <option value="2020">2021</option>\n
#  [4] <option value="2019">2020</option>\n
#  [5] <option value="2018">2019</option>\n
#  [6] <option value="2017">2018</option>\n
#  [7] <option value="2016">2017</option>\n
#  [8] <option value="2015">2016</option>\n
#  [9] <option value="2014">2015</option>\n
# [10] <option value="2013">2014</option>\n
# [11] <option value="2012">2013</option>\n
# [12] <option value="2011">2012</option>\n
# [13] <option value="2010">2011</option>\n
# [14] <option value="2009">2010</option>\n
# [15] <option value="2008">2009</option>\n
# [16] <option value="2007">2008</option>\n
# [17] <option value="2006">2007</option>\n
# [18] <option value="2005">2006</option>\n
# [19] <option value="2004">2005</option>\n
# [20] <option value="2003">2004</option>\n
# ...
To get the values we need, we use the below:
season_start_year <- c()
for(each_season in seasons) {
  season_start_year <- c(season_start_year, xml2::xml_attrs(each_season)[["value"]])
}
Which gives us:
 [1] "2022" "2021" "2020" "2019" "2018" "2017"
 [7] "2016" "2015" "2014" "2013" "2012" "2011"
[13] "2010" "2009" "2008" "2007" "2006" "2005"
[19] "2004" "2003" "2002" "2001" "2000" "1998"
[25] "1997" "1996" "1995"
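As a side note, the same vector can be pulled out without a loop. This is only a minimal sketch, not what `get_transfermarkt_metadata.R` currently does: it assumes `rvest::html_attr()` is an acceptable replacement for the loop, and it uses a small inline HTML fragment (copied from the first three options above) instead of the live page so that it runs without a network request.

```r
library(rvest)  # also provides the %>% pipe

# Inline stand-in for the live league page (assumption: the real <select>
# has the same structure as the three options copied here from the output above).
html <- '<select class="chzn-select">
  <option selected value="2022">2023</option>
  <option value="2021">2022</option>
  <option value="2020">2021</option>
</select>'

seasons <- xml2::read_html(html) %>%
  rvest::html_nodes(".chzn-select") %>%
  rvest::html_nodes("option")

# html_attr() is vectorised over the node set, so no explicit loop is needed.
season_start_year <- rvest::html_attr(seasons, "value")
season_start_year
```

For these three options it returns the same `value` attributes as the loop does.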
This is fine; however, for the current Brasileiro Série A season (2023), the season URL uses 2022, i.e. the option's `value` is 2022 while its displayed label is 2023.
Users will need to be aware of this until we find a workaround that handles both 'correct' and 'incorrect' seasons...
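In the meantime, one way users could see the mismatch for themselves is to keep the displayed label next to the `value` attribute. The sketch below is an illustration under assumptions, not a proposed fix: `season_lookup` and `url_value_for_label()` are hypothetical names, the HTML is an inline copy of the first three options above rather than the live page, and labels for other leagues may be formatted differently (not checked here).

```r
library(rvest)  # also provides the %>% pipe

# Inline stand-in for the live league page (assumption: same structure as above).
html <- '<select class="chzn-select">
  <option selected value="2022">2023</option>
  <option value="2021">2022</option>
  <option value="2020">2021</option>
</select>'

season_options <- xml2::read_html(html) %>%
  rvest::html_nodes(".chzn-select") %>%
  rvest::html_nodes("option")

# Pair what the user sees (label) with what the season URL uses (value).
season_lookup <- data.frame(
  season_label     = rvest::html_text(season_options, trim = TRUE),
  season_url_value = rvest::html_attr(season_options, "value"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate a displayed season into the URL season.
# Returns NA if the label is not in the lookup table.
url_value_for_label <- function(label, lookup = season_lookup) {
  hit <- lookup$season_url_value[lookup$season_label == label]
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

url_value_for_label("2023")
```

With a table like this, a caller can pass the season as displayed on the site and still end up with the year the URL expects, whichever of the two turns out to be the 'correct' one for a given league.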