adonig closed this pull request 3 months ago
Maybe there's a way to squash all those commits into one 😅
Do you believe this test is sufficient?
```elixir
test "Respects the User-Agent header when evaluating robots.txt" do
  :meck.expect(Gollum, :crawlable?, fn
    "My Custom Bot", _url -> :crawlable
    _ua, _url -> :uncrawlable
  end)

  # With the UserAgent middleware setting our custom agent, the request
  # should pass the RobotsTxt check.
  middlewares = [
    {Crawly.Middlewares.UserAgent, user_agents: ["My Custom Bot"]},
    Crawly.Middlewares.RobotsTxt
  ]

  req = @valid
  state = %{spider_name: :test_spider, crawl_id: "123"}

  assert {%Crawly.Request{}, _state} =
           Crawly.Utils.pipe(middlewares, req, state)

  # Without the UserAgent middleware, the default agent is uncrawlable
  # and the request should be dropped.
  middlewares = [Crawly.Middlewares.RobotsTxt]
  assert {false, _state} = Crawly.Utils.pipe(middlewares, req, state)
end
```
This pull request updates the RobotsTxt middleware to dynamically use the User-Agent header from each request instead of relying on a hardcoded value. It supersedes an earlier attempt and merges cleanly without the previous issues.
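For reviewers, here is roughly the behavior I would expect from the updated middleware — a sketch only, not the actual diff. The `run/3` shape follows the usual Crawly pipeline convention, and the header lookup plus the `"*"` fallback are my assumptions; only `Gollum.crawlable?/2` is taken from the test above.

```elixir
defmodule Crawly.Middlewares.RobotsTxt do
  @moduledoc """
  Sketch: drop requests that robots.txt disallows for the request's
  own User-Agent instead of a hardcoded one.
  """

  def run(request, state, _opts \\ []) do
    # Use the User-Agent header set earlier in the pipeline,
    # falling back to the generic "*" agent if none is present.
    {_key, user_agent} =
      List.keyfind(request.headers, "User-Agent", 0, {"User-Agent", "*"})

    case Gollum.crawlable?(user_agent, request.url) do
      :uncrawlable -> {false, state}
      _ -> {request, state}
    end
  end
end
```

Returning `{false, state}` for disallowed requests matches how the test asserts the drop case via `Crawly.Utils.pipe/3`.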