Altimis / Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
MIT License
1.07k stars 226 forks

Followings and followers. #12

Closed mayankrichu closed 3 years ago

mayankrichu commented 3 years ago

Hi, I've been using twint to fetch followers, following, and favorites, but it's not working there. Are you also working on it?

Altimis commented 3 years ago

Hi @mayankrichu, yes, I'm working on it. Do you want followers and following for a specific user, or for every user whose tweets you are scraping?

mayankrichu commented 3 years ago

Thanks for your reply. I just want the followers and following of a specific user.

Altimis commented 3 years ago

Hi again @mayankrichu. I updated the code to be able to retrieve followers and following as promised (see notebook example). Unfortunately, you must log in to be able to see such information.

perara commented 3 years ago

I've done some testing, and it seems like you will eventually be rate-limited using this method. To test this, you could retrieve a user's followers and then try to retrieve the followers of each account in that list. I rewrote some of this to use requests after the initial login. Perhaps it works with pure Selenium, as that will be significantly slower?

Altimis commented 3 years ago

Hi @perara. Indeed, you can't retrieve followers all day with the same account, since Twitter will detect you as a robot that does not respect its limits (timeouts, etc.). The only solution I see for this method is to increase the timeout parameter. If you want to retrieve followers and following for many users (especially followers, since there could be millions of them for a single user), I recommend scraping this information for one user at a time and changing the account each time (let's say 3 accounts). I know this method sounds a little naive, but I couldn't think of anything else (the only other solution is to use the API, like the Twitter-Scraper library). However, I didn't quite understand what you mean by "pure selenium". Can you tell me how many followers you could retrieve before your account was limited?
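
A minimal sketch of the "one user at a time, change the account each time" idea described above. It does not call Scweet itself: `fetch_followers` is a hypothetical callable that you would supply (for example, a wrapper around whatever scraping function you use), and `pause_seconds` is an assumed stand-in for a delay between users, not a Scweet parameter.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, List, Sequence


def scrape_with_rotation(
    users: Iterable[str],
    accounts: Sequence[str],
    fetch_followers: Callable[[str, str], List[str]],
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Scrape one user at a time, moving to the next account after each user.

    `fetch_followers(user, account)` is a hypothetical callable supplied by
    the caller; it is not part of any library API.
    """
    if not accounts:
        raise ValueError("at least one account is required")

    results: Dict[str, List[str]] = {}
    account_cycle = itertools.cycle(accounts)  # acc1, acc2, acc3, acc1, ...

    for index, user in enumerate(users):
        if index > 0:
            # Pause between users so that no account is hit back-to-back.
            sleep(pause_seconds)
        account = next(account_cycle)
        results[user] = fetch_followers(user, account)

    return results
```

With three accounts and four users, the fourth user is scraped with the first account again. This only spreads the load; it does not guarantee that an account will never be limited.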

perara commented 3 years ago

Basically, you do the initial login with Selenium to get the bearer token. From there you can query their GraphQL API directly, as your browser does when scrolling. Not only does this reduce the footprint of the process, it also speeds things up significantly.

I have a draft locally that integrates well with your approach. I'll open a PR once I've tested the multiple-user approach.
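
A rough sketch of the "log in with Selenium, then query directly" idea, under several assumptions. It only builds the request and does not send it. The URL, the `variables` query parameter, and the exact headers are placeholders, since the thread does not say what the real endpoint expects. It uses the standard library's `urllib` instead of requests to stay dependency-free. The cookie list is assumed to have the shape returned by Selenium's `driver.get_cookies()` (dicts with `name` and `value` keys).

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def cookie_header(selenium_cookies: List[Dict[str, Any]]) -> str:
    """Join cookies (as returned by driver.get_cookies()) into a Cookie header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_graphql_request(
    url: str,
    variables: Dict[str, Any],
    bearer_token: str,
    selenium_cookies: List[Dict[str, Any]],
) -> urllib.request.Request:
    """Build (but do not send) a GET request that reuses the browser session.

    Assumption: the endpoint takes JSON-encoded `variables` in the query
    string and accepts a bearer token plus the session cookies.
    """
    query = urllib.parse.urlencode({"variables": json.dumps(variables)})
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Cookie": cookie_header(selenium_cookies),
    }
    return urllib.request.Request(f"{url}?{query}", headers=headers, method="GET")


# Hypothetical usage after a Selenium login (placeholder values throughout):
# request = build_graphql_request(
#     "https://example.invalid/graphql",  # placeholder URL, not a real endpoint
#     {"count": 20},
#     bearer_token="TOKEN_CAPTURED_FROM_BROWSER",
#     selenium_cookies=driver.get_cookies(),
# )
# with urllib.request.urlopen(request) as response:
#     data = json.load(response)
```

Skipping the browser for each page of results is where the speed-up would come from, but faster requests are also likely to reach the rate limit sooner.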