MatteoManzari / Sofascore_scraper

An easy way to collect football players statistics.

getting only repeated 2018 matches in premier league program #1

Open ferbel333 opened 5 years ago

ferbel333 commented 5 years ago

Good evening, and thank you for your contribution. When I run your Premier League program, the hrefs.csv file contains the 2018 matches repeated, instead of the matches from previous years.

Best regards

MatteoManzari commented 5 years ago

OK, I've checked the script and there were some mistakes, but there is a big problem (at least for me): a given fixture (for example Arsenal-Chelsea) always has the same URL, even in different years!

ferbel333 commented 5 years ago

Thank you for answering.

Yes, indeed. I have already managed to get all the links using Selenium, but I have just realized the same thing you describe: a pair of teams always has an identical URL (even the home and away matches share it). Only the link itself points to the correct page, because the link also carries a data-id, which seems to be unique.

I managed to build a pandas DataFrame and export it to .csv (containing both url and data-id), but I do not know how to use this information to launch the proper pages in your second program (gamescrape.py), which seems to work perfectly.
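To make the collision concrete, here is a minimal sketch that groups the exported rows by URL, so that every URL maps to all the data-ids sharing it. It uses the standard-library `csv` module rather than pandas to stay self-contained. The column names `url` and `data_id` are assumptions about the export, and the second data-id in the sample is made up purely for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: the column names "url" and "data_id" are assumptions,
# and the second data_id (1111111) is invented to illustrate a URL collision.
SAMPLE_CSV = """url,data_id
/es/swansea-city-southampton/Vzb,7769665
/es/swansea-city-southampton/Vzb,1111111
"""


def group_ids_by_url(csv_file):
    """Map each URL to the list of data-ids that share it."""
    ids_by_url = defaultdict(list)
    for row in csv.DictReader(csv_file):
        ids_by_url[row["url"]].append(row["data_id"])
    return dict(ids_by_url)


groups = group_ids_by_url(io.StringIO(SAMPLE_CSV))
for url, ids in groups.items():
    # A URL with more than one data-id cannot identify a match on its own.
    print(url, "->", ids)
```

Any URL that maps to more than one data-id is ambiguous, so the data-id (not the URL) would have to serve as the key for each match.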

MatteoManzari commented 5 years ago

Where can I find this "data-id" in the html?

ferbel333 commented 5 years ago

When you inspect any match in the week list with your browser's inspector, you can see this information:

<a class="cell cell--event-list pointer list-event js-event js-link js-event-link js-event-7769665 js-event-status-finished diminished js-event-home-team-id-74 js-event-away-team-id-45 loaded" href="/es/swansea-city-southampton/Vzb" data-id="7769665" data-start-timestamp="1525805100">
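A minimal sketch of pulling those attributes out of the tag with Python's standard-library `html.parser`. The class list is shortened here for readability, and treating `data-start-timestamp` as Unix seconds in UTC is an assumption, although it yields a plausible kickoff date in May 2018 for this match. The class name `MatchLinkParser` is hypothetical.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# The anchor tag from the inspector, with the long class list shortened.
SNIPPET = (
    '<a class="cell js-event-link" href="/es/swansea-city-southampton/Vzb" '
    'data-id="7769665" data-start-timestamp="1525805100">'
)


class MatchLinkParser(HTMLParser):
    """Collect href, data-id and kickoff time from each <a> carrying both data attributes."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag != "a":
            return
        if "data-id" not in attributes or "data-start-timestamp" not in attributes:
            return
        # Assumption: the timestamp is expressed in Unix seconds.
        kickoff = datetime.fromtimestamp(
            int(attributes["data-start-timestamp"]), tz=timezone.utc
        )
        self.matches.append(
            {
                "href": attributes.get("href"),
                "data_id": attributes["data-id"],
                "kickoff": kickoff,
            }
        )


parser = MatchLinkParser()
parser.feed(SNIPPET)
match = parser.matches[0]
print(match["href"], match["data_id"], match["kickoff"].year)
```

Since the href repeats across seasons, the kickoff year derived from the timestamp is one way to tell which season a given data-id belongs to.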

Cheers

MatteoManzari commented 5 years ago

Hi ferbel, did you find a way?

ferbel333 commented 5 years ago

Hi Matteo, No luck so far.

As your second program (gamescraper.py) reads the URL list from the hrefs.csv file, I was wondering whether providing both the url and the data-id would make it possible to open the proper web pages. I mean reproducing the information carried by the link (href + data-id), which works fine when you click the link directly. Unfortunately, this is currently beyond my knowledge (for example, data-id seems to be used when opening the pages from Android).
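One possible direction, sketched under stated assumptions: since clicking the link works, a Selenium session could locate the anchor by its data-id and click it, instead of navigating to the ambiguous href. The helpers `selector_for` and `open_match` are hypothetical names. `driver` is assumed to behave like a Selenium 4 WebDriver with the week list page already loaded. Whether the page reached this way exposes a distinct URL is not confirmed anywhere in this thread.

```python
def selector_for(data_id):
    """Build a CSS selector that matches the anchor with this data-id attribute."""
    return f'a[data-id="{data_id}"]'


def open_match(driver, data_id):
    """Click the match link identified by data_id on the already loaded list page.

    `driver` is assumed to behave like a Selenium 4 WebDriver, whose
    find_element(by, value) accepts the locator strategy "css selector".
    Returns whatever URL the driver reports after the click.
    """
    driver.find_element("css selector", selector_for(data_id)).click()
    return driver.current_url


print(selector_for("7769665"))
```

With a real Selenium driver, the data-ids stored in the .csv could be passed to `open_match` one at a time, reloading the list page between matches.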

Hopefully somebody else can help us.

Let's keep in touch!

Cheers

From: MatteoManzari notifications@github.com
Sent: Tuesday, August 7, 2018 10:53:38 AM
To: MatteoManzari/Sofascore_scraper
Cc: ferbel333; Author
Subject: Re: [MatteoManzari/Sofascore_scraper] getting only repeated 2018 matches in premier league program (#1)

Hi ferbel, did you find a way?
