alecxe / scrapy-fake-useragent

Random User-Agent middleware based on fake-useragent
MIT License

FakerProvider not working? #30

Closed SecT0uch closed 4 years ago

SecT0uch commented 4 years ago

Thanks a lot for your work! :+1:

Here is my settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
FAKEUSERAGENT_PROVIDERS = [
    # 'scrapy_fake_useragent.providers.FakeUserAgentProvider',  # Depends on http://useragentstring.com which is currently down
    'scrapy_fake_useragent.providers.FakerProvider',  # if FakeUserAgentProvider fails, we'll use faker to generate a user-agent string for us
    'scrapy_fake_useragent.providers.FixedUserAgentProvider',  # fall back to USER_AGENT value
]
USER_AGENT = "TEST"
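For clarity, the intent behind that provider list can be sketched as a simple first-success fallback. This is a hypothetical helper for illustration only, not the package's actual internals:

```python
# Hypothetical sketch of the first-success fallback that the
# FAKEUSERAGENT_PROVIDERS list implies; not the package's real code.
def pick_user_agent(provider_factories, fixed_ua):
    """Return a UA string from the first provider that works."""
    for factory in provider_factories:
        try:
            ua = factory()
            if ua:
                return ua
        except Exception:
            continue  # provider unavailable; try the next one
    return fixed_ua  # FixedUserAgentProvider-style last resort


# Example: the first provider raises (e.g. its data source is down),
# so the second provider's value is used instead.
def broken():
    raise RuntimeError("data source down")


print(pick_user_agent([broken, lambda: "Mozilla/5.0 (sketch)"], "TEST"))
```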

When running scrapy shell in my project, my fallback USER_AGENT is used:

In [1]: settings.get('USER_AGENT')
Out[1]: 'TEST'

I also tried defining FAKER_RANDOM_UA_TYPE = 'user_agent'.

Side note: the requirements on the PyPI archive are missing faker.

alecxe commented 4 years ago

Side note: the requirements on the PyPI archive are missing faker.

Oh, good to know about faker missing from install_requires - thanks for reporting! Fixed in 1.4.3.

Will check your question shortly, thank you.

alecxe commented 4 years ago

Ah, yeah, so the library does not overwrite the settings.USER_AGENT value; instead, it sets the User-Agent header on each request instance in the middleware. Hope that clears it up a bit.
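To illustrate the distinction, here is a minimal sketch of a middleware that sets the header per request while leaving the setting alone. The class, UA pool, and FakeRequest stand-in are all hypothetical names, not the library's actual code:

```python
import random

# Hypothetical pool of user-agent strings for the sketch.
UA_POOL = [
    "Mozilla/5.0 (X11; Linux x86_64) Sketch/1.0",
    "Mozilla/5.0 (Windows NT 10.0) Sketch/2.0",
]


class RandomUserAgentMiddlewareSketch:
    """Sketch: assign a random User-Agent header to each outgoing request."""

    def process_request(self, request, spider):
        # settings['USER_AGENT'] is never touched here; only the
        # request object's own headers are modified.
        request.headers['User-Agent'] = random.choice(UA_POOL)


# Tiny stand-in for a Scrapy Request, just to demonstrate the effect.
class FakeRequest:
    def __init__(self):
        self.headers = {}


req = FakeRequest()
RandomUserAgentMiddlewareSketch().process_request(req, spider=None)
print(req.headers['User-Agent'])  # one of the UA_POOL entries
```

This is why `settings.get('USER_AGENT')` still shows the fallback value: the randomization happens on the request, not in the settings object.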

SecT0uch commented 4 years ago

Thanks, I get it now; that's what I was wondering about. However, even adding print(response.request.headers['User-Agent']) to my parse() function returns b'TEST'.

Inspired by: https://stackoverflow.com/questions/23152739/how-to-make-scrapy-show-user-agent-per-download-request-in-log

As suggested, I'll set up a local webserver to make sure.

EDIT: Same issue here. From within my project I start scrapy shell and run fetch('http://127.0.0.1:8000'). My webserver, inspecting the headers, shows User-Agent: TEST.
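For anyone wanting to reproduce this check, a minimal inspection server like the one above can be written with only the standard library. This is a sketch assuming port 8000 is free; it simply echoes back the User-Agent it receives:

```python
# Minimal sketch of a local server that echoes the User-Agent header
# of each GET request, useful for verifying what a crawler really sends.
from http.server import BaseHTTPRequestHandler, HTTPServer


class UAEchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get('User-Agent', '')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(ua.encode())

    def log_message(self, *args):
        pass  # keep the console quiet


if __name__ == '__main__':
    # Assumes 127.0.0.1:8000 is available, matching the fetch() above.
    HTTPServer(('127.0.0.1', 8000), UAEchoHandler).serve_forever()
```

Point fetch('http://127.0.0.1:8000') at it and the response body is exactly the User-Agent the request carried.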

alecxe commented 4 years ago

@SecT0uch oh yeah, thank you for the patience and effort to provide a reproducible example - just reproduced, added a test and fixed. Sorry for this miss - classic mistake 👍

Fixed in 1.4.4. Uploaded to PyPI.

SecT0uch commented 4 years ago

That's fantastic! Thanks, works like a charm :)