Closed: mvolfik closed this 10 months ago
Discussed this with @mvolfik, here is his message:
Also, what currently happens when we get a 403 response with a disallowed content-type? For example, suppose some server returns all of its 403 (blocked) responses as image/jpeg, which isn't allowed in the crawler, but retrying the request with a new proxy would get a 200 with HTML as usual. This bug report might apply to that scenario as well, not sure.
So I guess the goal is to keep rejecting unsupported response types, but to retry such requests first, because the block might be temporary.
just ran into this again with Yelp. @foxt451 did you do any work on this, or can I take over?
Hi, nope, not as far as I remember. You can take it.
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/http (HttpCrawler)
Issue description
Run the example code. The endpoint returns 403 with just
Crawlee fills in the default content-type application/octet-stream, and then fails the request at https://github.com/apify/crawlee/blob/d453f9c6d224b67b2b2f99e7d4dc85b3ca71129b/packages/http-crawler/src/internals/http-crawler.ts#L744-L747
Expected behavior: do a standard request retry on 403 response.
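To make the expected ordering concrete, here is a minimal sketch of the decision I have in mind. It is not Crawlee's actual code: `decide`, `parseMimeType`, `BLOCKED_STATUS_CODES` and `SUPPORTED_MIME_TYPES` are hypothetical names, and the listed status codes and MIME types are assumptions for illustration. The only point is that the blocked-status check runs before the content-type check.

```typescript
// Hypothetical sketch, NOT Crawlee's implementation. All names and the
// concrete status codes / MIME types below are assumptions for illustration.
type Decision = 'retry' | 'fail' | 'process';

const BLOCKED_STATUS_CODES = new Set<number>([401, 403, 429]);
const SUPPORTED_MIME_TYPES = new Set<string>(['text/html', 'application/xhtml+xml', 'application/json']);
const DEFAULT_MIME_TYPE = 'application/octet-stream';

// Extract the bare MIME type from a Content-Type header value,
// falling back to the default when the header is missing or empty.
function parseMimeType(header?: string): string {
    if (!header) return DEFAULT_MIME_TYPE;
    const [first = ''] = header.split(';');
    const mime = first.trim().toLowerCase();
    return mime || DEFAULT_MIME_TYPE;
}

// Check the status code first: a blocked response is retried no matter
// what content-type it carries. Only other responses are failed for
// having an unsupported content-type.
function decide(statusCode: number, contentTypeHeader?: string): Decision {
    if (BLOCKED_STATUS_CODES.has(statusCode)) return 'retry';
    const mime = parseMimeType(contentTypeHeader);
    return SUPPORTED_MIME_TYPES.has(mime) ? 'process' : 'fail';
}
```

With this ordering, a 403 without a content-type header (the case in this report) and a 403 served as image/jpeg would both be retried, while a 200 with an unsupported content-type would still fail as it does today.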
Code sample
Package version
3.4.1
Node.js version
18.16.1
Operating system
Linux
Apify platform
I have tested this on the next release
No response
Other context
No response