edsu / etudier

Extract a citation network from Google Scholar

Write partial results when hitting a Captcha #7

Closed volkerkarle closed 1 year ago

volkerkarle commented 6 years ago

Hi there, I am receiving so many "I am not a robot" captchas that it is impossible for me to end up with any result. I guess there is no way to circumvent this problem?

wolfiex commented 4 years ago

No, there wouldn't be, since it is Google that temporarily blocks the IP address. The only solution I found was to use a VPN when this happens.

Code improvement suggestion: there needs to be a fallback that saves all scraped data at the point the browser is killed (exception handling).
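
Something along these lines could work as a rough sketch of that idea. It is not etudier's actual code: `crawl`, `fetch`, `CaptchaError` and the output path are all hypothetical names made up for illustration. The point is only that a `try`/`finally` around the scraping loop writes out whatever was collected, whether the loop finishes, raises, or is interrupted.

```python
import json


class CaptchaError(Exception):
    """Hypothetical signal that the scraper landed on a captcha page."""


def crawl(fetch, seeds, out_path):
    """Call fetch(item) for each seed and always dump what was gathered.

    `fetch` is a stand-in for whatever function does the real scraping;
    it is assumed to return JSON-serializable data.
    """
    results = []
    try:
        for item in seeds:
            results.append(fetch(item))
    finally:
        # Runs on normal completion, on CaptchaError, and on Ctrl-C alike,
        # so partial results are never lost. The exception still propagates.
        with open(out_path, "w", encoding="utf-8") as fh:
            json.dump(results, fh, indent=2)
    return results
```

The caller still sees the original exception and can decide whether to retry later; the file on disk holds everything scraped before the block.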

6884 commented 3 years ago

THIS. I too was too greedy and ran too big a query, and that's what happened. It would REALLY be a great feature to have a temporary dump of whatever has been obtained so far whenever a captcha alert comes up.

edsu commented 3 years ago

I like the idea of writing partial results so all is not lost when you get blocked.

edsu commented 1 year ago

This is implemented in v0.2.0, which is now available.