dcaribou / transfermarkt-scraper

🕸️ Collects data from Transfermarkt website

Scraping Appearances Using Links to Individual Teams #84

Closed michmosc closed 7 months ago

michmosc commented 8 months ago

Hello David,

I am following up regarding the possibility of altering the appearances.py code to take in readily available links to teams from leagues other than the ones on your Kaggle page (those are great, by the way, but I would love to expand the dataset to appearances of players from certain other leagues).

Below is an example of Transfermarkt links to different teams in the Algerian Ligue Professionnelle 1.

alg1_teams.csv

Is there a possibility to alter the code in a way that it takes in the links and then provides players and, subsequently, their appearances? In the past, I have tried myself but was not successful -- any alterations to the code lead me to an empty .json file without any appearances scraped.

Any and all help will be much appreciated! Thank you for your time and for building out this scraper.

Best,

michmosc

dcaribou commented 8 months ago

Hi @michmosc, thanks for posting your question.

If I understand correctly what you are trying to do, you don't need to modify the scraper code to achieve this. The scraper is generic, in the sense that each crawler will take in a set of "parent urls" from a file and loop through all of them and provide the scraped data as a result.

For example, if you want to extract player data and you have a set of club URLs (clubs are the players' parents), then you can just pass that file to the players crawler and it should return the scraped data. You do need to:

⚠️ Make sure your local setup is fine by running the scrapy check command

scrapy check -s HTTPCACHE_ENABLED=False -s USER_AGENT='your UA goes here'

I tried with one of the URLs in your CSV and was able to extract some players (note the href attribute in the parent provided to the crawler):

 echo '{"type": "club", "href": "/asm-oran/startseite/verein/30505/saison_id/2001", "parent": {}}' \
    | scrapy crawl players \
    | jq -c '.'

Some sample records returned by this command.

{"type":"player","href":"/amine-el-amali/profil/spieler/236728","parent":{"type":"club","href":"/asm-oran/startseite/verein/30505/saison_id/2001","seasoned_href":"https://www.transfermarkt.co.uk/asm-oran/startseite/verein/30505/saison_id/2001/saison_id/2022"},"name":"Amine","last_name":"El Amali","number":"#10","name_in_home_country":null,"date_of_birth":"Apr 29, 1988","place_of_birth":{"country":"Algeria","city":"Oran  "},"age":"35","height":null,"citizenship":"Algeria","position":"Attack - Right Winger","player_agent":{"href":null,"name":null},"image_url":"https://img.a.transfermarkt.technology/portrait/header/default.jpg?lm=1","current_club":{"href":"/unknown/startseite/verein/75"},"foot":"right","joined":"\n                            -                        ","contract_expires":"-","day_of_last_contract_extension":null,"outfitter":null,"current_market_value":null,"highest_market_value":null,"market_value_history":null,"code":"amine-el-amali"}
{"type":"player","href":"/abdellah-daouadji/profil/spieler/284733","parent":{"type":"club","href":"/asm-oran/startseite/verein/30505/saison_id/2001","seasoned_href":"https://www.transfermarkt.co.uk/asm-oran/startseite/verein/30505/saison_id/2001/saison_id/2022"},"name":"Abdellah","last_name":"Daouadji","number":"#22","name_in_home_country":null,"date_of_birth":"Jul 9, 1995","place_of_birth":{"country":null,"city":null},"age":"28","height":"1,76 m","citizenship":"Algeria","position":"Attack - Right Winger","player_agent":{"href":null,"name":null},"image_url":"https://img.a.transfermarkt.technology/portrait/header/default.jpg?lm=1","current_club":{"href":"/us-biskra/startseite/verein/31757"},"foot":null,"joined":"\n                            Sep 10, 2023                        ","contract_expires":"-","day_of_last_contract_extension":null,"outfitter":null,"current_market_value":null,"highest_market_value":null,"market_value_history":null,"code":"abdellah-daouadji"}
{"type":"player","href":"/ilyes-kourbia/profil/spieler/232945","parent":{"type":"club","href":"/asm-oran/startseite/verein/30505/saison_id/2001","seasoned_href":"https://www.transfermarkt.co.uk/asm-oran/startseite/verein/30505/saison_id/2001/saison_id/2022"},"name":"Ilyes","last_name":"Kourbia","number":null,"name_in_home_country":null,"date_of_birth":"Nov 9, 1992","place_of_birth":{"country":null,"city":null},"age":"31","height":"1,83 m","citizenship":"Algeria","position":"Attack - Left Winger","player_agent":{"href":null,"name":null},"image_url":"https://img.a.transfermarkt.technology/portrait/header/default.jpg?lm=1","current_club":{"href":"/e-sour-el-ghozlane/startseite/verein/42084"},"foot":"right","joined":"\n                            Sep 13, 2023                        ","contract_expires":"-","day_of_last_contract_extension":null,"outfitter":null,"current_market_value":null,"highest_market_value":null,"market_value_history":null,"code":"ilyes-kourbia"}
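To feed a whole list of club links to the players crawler, each link has to be turned into a parent record like the one in the echo command above. The following is a minimal sketch of that conversion, not part of the scraper itself. The function names are hypothetical, and it assumes the CSV holds one full club URL in its first column with no header row (the actual layout of alg1_teams.csv is not shown here, so adjust as needed).

```python
import csv
import io
import json
from urllib.parse import urlparse


def club_url_to_parent(url: str) -> dict:
    """Build a parent record for the players crawler from a club URL.

    Hypothetical helper: keeps only the URL path, which matches the
    "href" format used in the echo example.
    """
    path = urlparse(url.strip()).path
    return {"type": "club", "href": path, "parent": {}}


def csv_to_parent_lines(csv_text: str) -> list:
    """Turn CSV text into JSON lines, one parent record per club URL.

    Assumes the URL is in the first column and there is no header row.
    Empty rows are skipped.
    """
    rows = csv.reader(io.StringIO(csv_text))
    return [
        json.dumps(club_url_to_parent(row[0]))
        for row in rows
        if row and row[0].strip()
    ]


if __name__ == "__main__":
    # Sample input built from the club used in the echo example.
    sample = (
        "https://www.transfermarkt.co.uk"
        "/asm-oran/startseite/verein/30505/saison_id/2001\n"
    )
    for line in csv_to_parent_lines(sample):
        print(line)
```

Each printed line has the same shape as the record passed through echo, so the script's output could be piped into scrapy crawl players in the same way. To process a real file, read its contents and pass the text to csv_to_parent_lines instead of the sample string.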

michmosc commented 8 months ago

Hi David,

Thank you for the quick response. I will try this and post the status here.

-michmosc