bisguzar / twitter-scraper

Scrape the Twitter Frontend API without authentication.
MIT License
3.91k stars, 599 forks

fixed pagination bug not extracting tweets after first page #150

Closed: xeliot closed this 4 years ago

xeliot commented 4 years ago

Fixes issue (#101)

Brief summary of how I fixed it by modifying tweets.py:

Change r = session.get(url, headers=headers) to r = session.get(url+'&max_position', headers=headers). With max_position present in the request, the response JSON includes a min_position field, which is then used as the max_position parameter in the next session.get call.

Parse the response once so it can be reused: r_json = r.json()

Change r = session.get(url, params={'max_position': last_tweet}, headers=headers) to r = session.get(url, params={'max_position': r_json['min_position']}, headers=headers).

This stops the same Twitter page from being returned repeatedly for a search query.
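For anyone reading along, here is a rough sketch of the cursor loop described in this summary. It is not the actual code in tweets.py. The names iter_pages and make_fetch are hypothetical, the max_pages limit and the repeated-cursor guard are my own additions, and sending an empty max_position on the first request is an assumption meant to mirror url+'&max_position'. Only min_position and max_position come from the change itself.

```python
from typing import Callable, Dict, Iterator, Optional


def make_fetch(session, url: str, headers: Dict) -> Callable[[Optional[str]], Dict]:
    """Hypothetical helper: wrap a requests-style session into a page fetcher."""
    def fetch(cursor: Optional[str]) -> Dict:
        # Assumption: an empty max_position on the first request is enough
        # for the response JSON to carry min_position.
        params = {"max_position": cursor or ""}
        return session.get(url, params=params, headers=headers).json()
    return fetch


def iter_pages(fetch: Callable[[Optional[str]], Dict],
               max_pages: int = 25) -> Iterator[Dict]:
    """Yield parsed pages, feeding each min_position back as max_position."""
    cursor: Optional[str] = None  # no cursor before the first request
    seen = set()                  # cursors already used
    for _ in range(max_pages):
        r_json = fetch(cursor)
        yield r_json
        next_cursor = r_json.get("min_position")
        # Stop when no cursor is returned, or when the same cursor comes back
        # (which would fetch the same page again).
        if not next_cursor or next_cursor in seen:
            break
        seen.add(next_cursor)
        cursor = next_cursor
```

With a real requests.Session this would be used roughly as: for page in iter_pages(make_fetch(session, url, headers)): ... and each page's HTML would then be parsed the way tweets.py already does.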

bisguzar commented 4 years ago

Thanks for your contribution, @xeliot. We have an issue pointing to this problem (#101), but it appears when you try to get tweets from hashtags. Is it the same thing? Could you explain your problem? It would be better if you filled in the description :/

PS: I just saw your comment on the issue. That was just a bad place to write the description, lol. I saw your PR before I saw the comment you wrote on the issue. My bad...

iamMehedi commented 4 years ago

Any updates on merging this?

bisguzar commented 4 years ago

Thanks for the update, @iamMehedi. Merging this awesome contribution now :). I'll push it to PyPI as soon as possible.