JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.31k stars 698 forks

Scraping certain Twitter users produces no or too few results #4

JustAnotherArchivist closed 3 years ago

JustAnotherArchivist commented 5 years ago

For some users, snscrape either cannot list any tweets at all or cannot list tweets older than some point in time. This seems to be a bug in Twitter's search engine.

In some cases, this appears to be temporary. For example, when I tried to scrape https://twitter.com/NewsweekUK on 2018-02-08, it did not produce any results. I confirmed in the browser back then that the search for from:NewsweekUK (which is what snscrape uses) indeed turned up empty. For this particular user, the issue seems to have been fixed.

Other examples:

I believe that this issue is unfixable from snscrape's side. As a workaround, however, snscrape could fall back to scraping the user profile page if it finds no results on the search. This would only yield the 3200 most recent tweets, but that's still better than zero.

ivan commented 5 years ago

Another one that produces no results:

https://twitter.com/raffiahmadlagi (search: https://twitter.com/search?q=from%3Araffiahmadlagi&src=typd), which is either a bug or a search/thread/QFD ban.

fin-atem commented 5 years ago

https://twitter.com/prdtrt_shop produces nothing as of 2018/10/01.

If a "fallback to user profile" option were ever implemented, it would be nice to see a message along the lines of "Twitter search error, falling back to user profile".
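To make the idea concrete, here is a minimal sketch of such a fallback with a message like the one above. It is not snscrape code: `scrape_with_fallback`, `search_scraper`, and `profile_scraper` are hypothetical names, and the two scrapers are assumed to be callables that take a username and return an iterable of tweet URLs.

```python
import logging
from typing import Callable, Iterable, Iterator

logger = logging.getLogger(__name__)


def scrape_with_fallback(
    username: str,
    search_scraper: Callable[[str], Iterable[str]],   # hypothetical: search for from:<username>
    profile_scraper: Callable[[str], Iterable[str]],  # hypothetical: scrape the profile page
) -> Iterator[str]:
    """Yield search results; if the search yields nothing, use the profile page."""
    found_any = False
    for item in search_scraper(username):
        found_any = True
        yield item
    if found_any:
        return
    # The search returned nothing, so warn and switch to the profile page.
    logger.warning("Twitter search error, falling back to user profile")
    yield from profile_scraper(username)


if __name__ == "__main__":
    # Stub scrapers standing in for real network code.
    def empty_search(user: str) -> list:
        return []

    def stub_profile(user: str) -> list:
        return [f"https://twitter.com/{user}/status/1"]

    print(list(scrape_with_fallback("example", empty_search, stub_profile)))
```

This only covers the "no results at all" case. A search that returns too few results would not trigger the fallback, since the sketch has no way of knowing how many tweets to expect.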

ivan commented 5 years ago

https://twitter.com/tomfriedman strangely produces just 6 tweets from snscrape

JustAnotherArchivist commented 5 years ago

@ivan: raffiahmadlagi confirmed.

@fin-atem: prdtrt_shop works fine for me just now and discovered 292 tweets: https://transfer.sh/eriJc/twitter-@prdtrt_shop

@ivan: Can't reproduce; I get 701 results: https://transfer.sh/InZSI/twitter-@tomfriedman

ivan commented 5 years ago

Very strange. I have tried it from several IP addresses, and I get either 0 results (rarely) or only the following:

# ~/sns-venv/bin/snscrape twitter-user tomfriedman
https://twitter.com/tomfriedman/status/1049354783730163713
https://twitter.com/tomfriedman/status/1048924238466437127
https://twitter.com/tomfriedman/status/1048197032618414080
https://twitter.com/tomfriedman/status/1048196285390573569
https://twitter.com/tomfriedman/status/1048195792220164096
https://twitter.com/tomfriedman/status/1047261789224931329

ivan commented 5 years ago

snscrape is working properly and getting all the tweets on my Linux machines after replacing the User-Agent:

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.45 Safari/537.36'}
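
For illustration, a sketch of sending a search request with that User-Agent using only the standard library. This is an assumption about how the header could be applied, not snscrape's actual request code (where exactly the `headers` line above was placed is not shown here); `build_search_request` is a hypothetical helper, and the search URL shape is taken from the links earlier in this thread.

```python
import urllib.parse
import urllib.request

# User-Agent string copied from the comment above.
BROWSER_USER_AGENT = (
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/70.0.3538.45 Safari/537.36'
)


def build_search_request(username: str) -> urllib.request.Request:
    """Build (but do not send) a from:<username> search request."""
    query = urllib.parse.quote("from:" + username)
    url = "https://twitter.com/search?q=" + query
    # Pass the browser-like User-Agent instead of the library default.
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


if __name__ == "__main__":
    request = build_search_request("tomfriedman")
    print(request.full_url)
    print(request.header_items())
```

The request is only constructed here; passing it to `urllib.request.urlopen` would actually send it.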

JustAnotherArchivist commented 5 years ago

https://twitter.com/ZerinaX cuts off on 2019-02-12.
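
One way to check where a scrape cuts off is to decode the timestamp embedded in each tweet ID. The sketch below assumes status URLs of the form printed by snscrape and snowflake-style IDs (tweets from late 2010 onward); `tweet_time` and `covered_range` are hypothetical helpers, not part of snscrape.

```python
import re
from datetime import datetime, timezone

# Snowflake IDs store milliseconds since this epoch in the bits above bit 22.
TWITTER_EPOCH_MS = 1288834974657
STATUS_ID_RE = re.compile(r"/status/(\d+)")


def tweet_time(status_url: str) -> datetime:
    """Return the UTC creation time encoded in a tweet's snowflake ID."""
    match = STATUS_ID_RE.search(status_url)
    if match is None:
        raise ValueError(f"not a status URL: {status_url!r}")
    tweet_id = int(match.group(1))
    milliseconds = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


def covered_range(status_urls):
    """Return (oldest, newest) creation times among the given status URLs."""
    times = [tweet_time(url) for url in status_urls]
    return min(times), max(times)


if __name__ == "__main__":
    urls = [
        "https://twitter.com/tomfriedman/status/1049354783730163713",
        "https://twitter.com/tomfriedman/status/1047261789224931329",
    ]
    oldest, newest = covered_range(urls)
    print(oldest.isoformat(), newest.isoformat())
```

If the oldest decoded time is much later than the account's first tweet, the search results were probably truncated.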

JustAnotherArchivist commented 4 years ago

https://twitter.com/AshkanMonfared_ produces no results in the search. This might be related to the account getting banned temporarily.

Edit 2020-01-14 17:47 UTC: The search is now returning results again. It was not doing that when I last checked yesterday evening (roughly midnight UTC). So apparently the search took ~3 days to reindex the account's tweets after the ban got lifted, assuming this is indeed the reason why they disappeared.

JustAnotherArchivist commented 3 years ago

Closing this since it's impossible to fix in snscrape.