5hirish / tweet_scrapper

Scrape the Twitter frontend API without any authentication or restrictions.
http://www.shirishkadam.com/
GNU General Public License v3.0
58 stars 12 forks

Scrape Profile Image URL #15

Open farisalasmary opened 5 years ago

farisalasmary commented 5 years ago

I've been using this library for a while, but unfortunately I could not find the profile image URL in the scraped data. I've tried to modify the code, but with no result. My real problem is Twitter's class name obfuscation. For example, class="css-1dbjc4n r-1j3t67a" is the CSS class used on the div of each tweet, but in your code it is as simple as https://github.com/5hirish/tweet_scrapper/blob/4337e09aae8d82cdd0f63d5ec9978e0aa0a1a571/tweetscrape/tweets_scrape.py#L32. How did you find the real name of the class? Also, how can a new feature like the profile image URL be added?

5hirish commented 5 years ago

@farisalasmary This library uses XPath to scrape data. To get the profile picture, one could use the XPath query //*[@id="page-container"]/div[1]/div/div[1]/div[2]/div[1]/div/a/img. You can simplify this XPath query even further. If you do add this, please raise a PR and I will merge it.
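
As a rough sketch of how the XPath query above could be applied (this is not the library's actual code), the snippet below runs it against a small, made-up XHTML fragment using Python's standard xml.etree.ElementTree. The function name extract_profile_image_url, the SAMPLE markup, and the example.com URL are all assumptions for illustration. Real Twitter pages are not well-formed XML, so in practice an HTML-tolerant parser would likely be needed instead.

```python
import xml.etree.ElementTree as ET

# XPath from the comment above, written as a relative path (".//")
# because ElementTree only accepts relative paths on an element.
PROFILE_IMG_XPATH = (
    ".//*[@id='page-container']"
    "/div[1]/div/div[1]/div[2]/div[1]/div/a/img"
)


def extract_profile_image_url(root):
    """Return the src of the first <img> matched by the XPath, or None.

    Hypothetical helper for illustration; not part of tweet_scrapper.
    """
    img = root.find(PROFILE_IMG_XPATH)
    return img.get("src") if img is not None else None


# Made-up, simplified markup whose nesting matches the XPath.
# The real page structure is an assumption and may differ.
SAMPLE = """
<html><body>
<div id="page-container">
  <div>
    <div>
      <div>
        <div>header</div>
        <div>
          <div>
            <div>
              <a href="/example_user"><img src="https://example.com/avatar.jpg"/></a>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
</body></html>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(extract_profile_image_url(root))
```

Because the query depends on exact element positions, it will stop matching (and the helper will return None) as soon as the page layout changes, which is one reason to simplify it where possible.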