elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.
https://hexdocs.pm/crawly
Apache License 2.0
971 stars, 114 forks

Pick the user agent from the request. #290

Closed: adonig closed this pull request 5 months ago

adonig commented 6 months ago

Previously, the user agent passed to Gollum was hardcoded as "Scrapy". This pull request reads the user agent from the request instead. With this change, placing Crawly.Middlewares.UserAgent before Crawly.Middlewares.RobotsTxt makes the robots.txt check test whether the request's actual user agent is permitted to crawl the page.
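For illustration, a minimal sketch of the middleware ordering described above, assuming the standard `config :crawly, middlewares: [...]` setup and the `Crawly.Middlewares.*` module names from Crawly's documentation. The user agent string is a hypothetical placeholder, not something taken from this PR.

```elixir
# config/config.exs
import Config

config :crawly,
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    # Set the User-Agent header first, so that later middlewares see it.
    # "MyCrawler/1.0" is a made-up value used only for this example.
    {Crawly.Middlewares.UserAgent, user_agents: ["MyCrawler/1.0"]},
    # With the change proposed here, the robots.txt check would use the
    # user agent set above rather than the hardcoded "Scrapy".
    Crawly.Middlewares.RobotsTxt
  ]
```

If RobotsTxt were listed before UserAgent, the request would presumably not yet carry the configured user agent when the robots.txt check runs, so the order matters.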

adonig commented 6 months ago

Argh, I messed up. I forgot that all commits to my fork's master branch end up in this PR. Maybe consider my Gollum PR first, and I'll sort this one out after that.