JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Twitter scraping crashes with connection timeout #197

Closed gawainhuang closed 3 years ago

gawainhuang commented 3 years ago

I've just tested the basic CLI usage with:

snscrape --jsonl --progress --max-results 500 --since 2020-06-01 twitter-search "its the elephant until:2020-07-31" > text-query-tweets.json

Unfortunately, it didn't work. The detailed report is as follows:

ERROR  snscrape.base  Error retrieving https://twitter.com/search?f=live&lang=en&q=from%3Ajack&src=spelling_expansion_revert_click: ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=from%3Ajack&src=spelling_expansion_revert_click (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000017C305C8C10>, 'Connection to twitter.com timed out. (connect timeout=10)'))"))
CRITICAL  snscrape.base  4 requests to https://twitter.com/search?f=live&lang=en&q=from%3Ajack&src=spelling_expansion_revert_click failed, giving up.
CRITICAL  snscrape._cli  Dumped stack and locals to C:\Users\gawai\AppData\Local\Temp\snscrape_locals_dsg6mu32
Traceback (most recent call last):

By the way, I'm using Windows 10, and I've tried several terminals, including cmd, PowerShell, and Git Bash. The proxy I use is v2ray; the browser can access Twitter through it, and the curl cmdlet also gets the right response. However, ping doesn't work. Is there any other information I can provide?

JustAnotherArchivist commented 3 years ago

(Ignoring that the crash message doesn't match your command since that doesn't matter here.)

Interesting, haven't seen that before. Normally, Twitter returns some sort of error rather than blocking connections altogether. So I'd guess that something with your network or proxy config doesn't play well with snscrape. No idea what that could be though.

Does it work with any scraper other than Twitter?
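
One way to narrow this down is to check what proxy settings a Python process in the same terminal can actually see. Below is a rough diagnostic sketch, not part of snscrape: the helper names `proxy_env` and `can_connect` are made up for illustration. It assumes that the HTTP client behind the urllib3 error above honors the usual `HTTP_PROXY` / `HTTPS_PROXY` environment variables, which is an assumption and not something confirmed in this thread. Note that ping failing is not surprising by itself, since ping uses ICMP and HTTP or SOCKS proxies do not carry it.

```python
import os
import socket
import urllib.request

# Conventional proxy variable names; whether snscrape reads them is an assumption.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")


def proxy_env(environ=os.environ):
    """Return the proxy-related variables that are set (names matched case-insensitively)."""
    found = {}
    for key, value in environ.items():
        if key.upper() in PROXY_VARS and value:
            found[key.upper()] = value
    return found


def can_connect(host, port=443, timeout=10):
    """Try a direct TCP connection, bypassing any proxy, and report whether it succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


if __name__ == "__main__":
    print("Proxy variables in the environment:", proxy_env() or "none")
    # getproxies() also consults system settings on Windows and macOS.
    print("Proxies Python detects:", urllib.request.getproxies() or "none")
```

If both lines print "none" and `can_connect("twitter.com")` returns `False`, the terminal is probably trying to reach Twitter directly, not through v2ray, which would match the connect timeout in the log.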