Closed — carvajalluis closed this issue 7 years ago
@luigilabel that's actually a good idea. How about, if we cannot get a random User-Agent via fake-useragent, we fall back to a configured default? What do you think?
Another idea is to add a custom user-agent string generator that would be activated in case fake-useragent is not able to provide one.
Thanks!
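Roughly, I am thinking of something along these lines (a minimal sketch only; the DEFAULT_USER_AGENT setting name and the class shown here are illustrative, not the middleware's current API):

```python
from fake_useragent import UserAgent


class RandomUserAgentMiddleware(object):

    def __init__(self, crawler):
        # hypothetical setting; any configured string would do as the default
        self.default_ua = crawler.settings.get(
            'DEFAULT_USER_AGENT',
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64)')
        try:
            self.ua = UserAgent()
        except Exception:
            # fake-useragent could not fetch or load its data
            self.ua = None

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        # prefer a random UA, fall back to the configured default
        ua = self.ua.random if self.ua is not None else self.default_ua
        request.headers.setdefault('User-Agent', ua)
```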
https://github.com/hellysmile/fake-useragent/pull/33
fixes, new release in 10 min
@hellysmile thanks!
I'd still have an optional default, will keep this issue open.
Thanks @hellysmile; @alecxe yeah, a default sounds great, but it should be the last resort, since sending the same constant User-Agent would make the crawler stand out in the crawled site's server logs, wouldn't it?
@luigilabel yeah, if there is no default configured, we can leave it as is... or we can think of generating user-agent strings ourselves without relying on what fake-useragent might provide.
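For the "generate them ourselves" idea, a hedged sketch of what an offline generator could look like (the templates and version ranges below are made up for illustration):

```python
import random

# a couple of plausible templates; real coverage would need a much larger set
TEMPLATES = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/{chrome}.0.3163.100 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:{firefox}.0) '
    'Gecko/20100101 Firefox/{firefox}.0',
]


def random_user_agent():
    """Return a randomly composed User-Agent string without any network calls."""
    template = random.choice(TEMPLATES)
    return template.format(chrome=random.randint(55, 62),
                           firefox=random.randint(50, 57))
```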
It's down again... SSL cert is expired.
This awesome tool had to be removed from the project at the last minute because of these recurring failures.
@Slater-Victoroff, @luigilabel yeah, there is not much I can do here. This middleware is a very thin wrapper around fake-useragent, which depends on certain external resources being up and running. We can think of ways to fall back to "offline" user-agent randomization, but I am not sure if this should be a part of this project.
Ideas are welcome. Thanks!
@Slater-Victoroff which SSL cert exactly is expired?
I can release right now
context=ssl._create_unverified_context()
but I cannot reproduce the issue with expired certificates
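For reference, this is roughly how an unverified context would be used when fetching remote data (the URL here is a placeholder, and fake-useragent's actual fetch code may differ):

```python
import ssl
from urllib.request import urlopen

# skip certificate verification entirely - a workaround, not a fix,
# since it also removes protection against man-in-the-middle attacks
context = ssl._create_unverified_context()
data = urlopen('https://example.com/browsers.json', context=context).read()
```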
Has anyone figured out a consistent workaround? I'm getting expired SSL certs. UPDATE: I just figured out it has to do with a known issue on macOS. Please disregard.
Fixed by #14 - there is now fallback user-agent string support.
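Assuming the fallback is exposed as a setting (check the README for the exact name), enabling it would look roughly like this in settings.py:

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}

# used when fake-useragent cannot provide a random User-Agent
FAKEUSERAGENT_FALLBACK = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
```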
This implementation is completely dependent on that site working properly. There should at least be a redundant data source, so that the middleware keeps working if the site goes down.
In my particular case I have a task running spiders automatically with scrapyd on a daily basis, and on days like today it won't work.