alecxe / scrapy-fake-useragent

Random User-Agent middleware based on fake-useragent
MIT License

http://useragentstring.com/ is DOWN #7

Closed carvajalluis closed 7 years ago

carvajalluis commented 7 years ago

This implementation is completely dependent on this site working properly. There should be at least one redundant data source, so that the middleware keeps working if the site goes down.

In my particular case, I have a task that runs spiders automatically with scrapyd on a daily basis. On days like today it won't work.

alecxe commented 7 years ago

@luigilabel that's actually a good idea. How about this: if we cannot get a random User-Agent via fake-useragent, we fall back to a configured default? What do you think?

Another idea is to add a custom user agent string generator that would be activated in case fake-useragent is not able to provide one.

Thanks!
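A minimal sketch of the "fall back to a configured default" idea, for illustration only. The names `pick_user_agent`, `broken_provider` and `DEFAULT_USER_AGENT` are hypothetical and are not part of this middleware or of fake-useragent; the provider stands in for whatever call normally returns a random User-Agent.

```python
# Hypothetical default; in a real project this would come from settings.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0"
)


def pick_user_agent(provider, fallback=None):
    """Return provider()'s result, or `fallback` if the provider fails.

    `provider` is any zero-argument callable that returns a User-Agent
    string. If it raises or returns an empty value, the fallback is used.
    With no fallback configured, None is returned so the caller can leave
    the request untouched.
    """
    try:
        user_agent = provider()
    except Exception:
        # The remote data source is unreachable or returned garbage.
        user_agent = None
    return user_agent or fallback


def broken_provider():
    """Simulates the data source being down."""
    raise IOError("useragentstring.com is down")


# The provider fails, so the configured default is used instead.
print(pick_user_agent(broken_provider, fallback=DEFAULT_USER_AGENT))
```

Returning None when no fallback is configured keeps the current behavior for users who do not opt in.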

hellysmile commented 7 years ago

https://github.com/hellysmile/fake-useragent/pull/33

fixes, new release in 10 min

alecxe commented 7 years ago

@hellysmile thanks!

I'd still have an optional default, will keep this issue open.

carvajalluis commented 7 years ago

Thanks @hellysmile. @alecxe yep, a default sounds great, but it should be the last resort: a single constant User-Agent on every request would expose the crawler's presence to the crawled site's server, wouldn't it?

alecxe commented 7 years ago

@luigilabel yeah, if there is no default configured, we can leave it as is. Or we can think about generating User-Agent strings ourselves, without relying on what fake-useragent might provide.

Slater-Victoroff commented 7 years ago

It's down again... SSL cert is expired.

carvajalluis commented 7 years ago

This awesome tool had to be removed from the project at the last minute because of these recurrent failures.

alecxe commented 7 years ago

@Slater-Victoroff, @luigilabel yeah, there is not much I can do here. This middleware is a very thin wrapper around fake-useragent which depends on certain resources to be up and running. We can think of ways to fall back to "offline" user-agent randomization, but I am not sure if this should be a part of this project.

Ideas are welcome. Thanks!
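One possible shape for "offline" randomization, sketched under the assumption that a small, hand-maintained pool of User-Agent strings ships with the project. The pool contents and the names `OFFLINE_USER_AGENTS` and `random_offline_user_agent` are illustrative only and do not come from this project.

```python
import random

# Hypothetical local pool; a real list would need periodic updating.
OFFLINE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0",
]


def random_offline_user_agent(pool=OFFLINE_USER_AGENTS, rng=random):
    """Pick a User-Agent from a local pool, with no network access.

    `rng` can be a seeded random.Random instance for reproducible runs.
    """
    if not pool:
        raise ValueError("the offline User-Agent pool is empty")
    return rng.choice(pool)


# Each call may return a different entry from the pool.
print(random_offline_user_agent())
```

Picking from a pool instead of using a single default also avoids sending the same constant User-Agent on every request.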

hellysmile commented 7 years ago

@Slater-Victoroff which SSL certificate exactly is expired?

hellysmile commented 7 years ago

I can release a version right now that uses

context=ssl._create_unverified_context()

but I cannot reproduce the issue with expired certificates.
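For reference, a sketch of how such a context could be passed to a standard-library request. This is an illustration, not the fake-useragent code; `make_context` and `fetch` are hypothetical helper names. Note that `ssl._create_unverified_context()` is a private function and that it turns off both certificate verification and hostname checking, so an expired certificate would no longer raise an error.

```python
import ssl
from urllib.request import Request, urlopen


def make_context(verify=True):
    """Build an SSL context, optionally with certificate checks disabled."""
    if verify:
        return ssl.create_default_context()
    # Private helper: no certificate verification, no hostname check.
    return ssl._create_unverified_context()


def fetch(url, verify=True, timeout=10):
    """Download `url` and return the raw response body as bytes."""
    context = make_context(verify)
    with urlopen(Request(url), timeout=timeout, context=context) as response:
        return response.read()


# Inspect the difference between the two contexts without any network access.
print(make_context(verify=True).verify_mode)
print(make_context(verify=False).verify_mode)
```

Disabling verification trades security for availability, so it is probably best kept as an explicit opt-in.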

twigs67 commented 7 years ago

Has anyone figured out a consistent workaround? I'm getting expired SSL certs. UPDATE: I just figured out that it has to do with a known issue with macOS. Please disregard.

alecxe commented 7 years ago

Fixed by #14 - there is now support for a fallback User-Agent string.