apify / got-scraping

HTTP client made for scraping based on got.

Invalid charset error in url #105

Closed teammakdi closed 9 months ago

teammakdi commented 9 months ago

URLs served with an invalid charset result in an error. For example, https://datadeliver.net/ads.txt is served with charset uft-8 instead of utf-8:

ERROR HttpCrawler: Request failed and reached maximum retries. Error: Resource https://datadeliver.net/ads.txt served with unsupported charset/encoding: uft-8
    at HttpCrawler._encodeResponse (/node_modules/@crawlee/http/internals/http-crawler.js:544:15)
    at HttpCrawler._parseResponse (/node_modules/@crawlee/http/internals/http-crawler.js:442:45)
    at HttpCrawler._runRequestHandler (/node_modules/@crawlee/http/internals/http-crawler.js:308:39)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async wrap (/node_modules/@apify/timeout/index.js:52:21) {"id":"uIK9UTUGCjKrRhO","url":"https://datadeliver.net/ads.txt","method":"GET","uniqueKey":"https://datadeliver.net/ads.txt"}

It works fine with curl and in the browser.

Is this error expected, or can we skip such checks?
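One way to skip such checks is to validate the charset label before decoding and fall back to utf-8 when it is unrecognized. This is a minimal sketch (not Crawlee's actual implementation) using Node's built-in TextDecoder, which throws for unknown labels like "uft-8"; the fallback-to-utf-8 policy is an assumption:

```javascript
// Sketch: normalize a possibly-invalid charset label before decoding.
// Assumption: falling back to utf-8 is acceptable for mislabeled responses.
function resolveCharset(label) {
    try {
        // TextDecoder throws a RangeError for unknown labels such as "uft-8".
        return new TextDecoder(label).encoding;
    } catch {
        return 'utf-8'; // fall back instead of failing the whole request
    }
}

console.log(resolveCharset('uft-8')); // → "utf-8" (fallback)
console.log(resolveCharset('UTF-8')); // → "utf-8" (labels are case-insensitive)
```

A crawler could call a helper like this on the charset parsed from the Content-Type header and decode the body with the resolved encoding instead of rejecting the response.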

teammakdi commented 9 months ago

Works with raw got:


import got from 'got';

async function main() {
    try {
        const response = await got('https://datadeliver.net/ads.txt');
        console.log(response.body);
    } catch (error) {
        console.log(error);
    }
}

main()
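Raw got likely succeeds here because got decodes the body using its encoding option (default 'utf8') rather than the response's Content-Type charset; the charset validation in the stack trace above lives in Crawlee's HttpCrawler, not in got. A minimal illustration of that decoding step (the body text and header value below are made up):

```javascript
// Sketch: decode the raw body with a configured encoding, ignoring the
// (invalid) charset label in the response header — roughly what raw got does.
const body = Buffer.from('google.com, pub-123, DIRECT'); // made-up body
const contentType = 'text/plain; charset=uft-8'; // mislabeled header, unused

const text = body.toString('utf8'); // configured encoding wins
console.log(text); // → "google.com, pub-123, DIRECT"
```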
teammakdi commented 9 months ago

@B4nan can we release this?

vladfrangu commented 9 months ago

This is currently in testing, as we also need to integrate it into crawlee before you can use it! If you're using it manually in your project, you'll need to switch to ESM and import the module via import { gotScraping } from 'got-scraping' (or use dynamic import()).