Closed triay0 closed 1 year ago
@triay0 this website does not use UTF-8; it uses another charset.
To get the correct UTF-8 characters from such pages, you can fetch the HTML yourself and decode it with the page's charset before passing it to article-extractor's extractFromHtml, as below:
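import { extractFromHtml } from '@extractus/article-extractor'

// `url` is the address of the page to extract
const res = await fetch(url)
// read the raw bytes so the charset can be chosen explicitly
const buffer = await res.arrayBuffer()
// decode with the charset the page actually uses (ISO-8859-1 here)
const decoder = new TextDecoder('iso-8859-1')
const html = decoder.decode(buffer)
const art = await extractFromHtml(html)
console.log(art)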
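If the charset is not known in advance, one option is to read it from the response's Content-Type header instead of hard-coding it. The following is only a sketch: `charsetFromContentType` and `decodeHtml` are hypothetical helper names, not part of article-extractor, and it assumes the server declares the charset in that header (pages that declare it only in a `<meta>` tag are not handled).

```javascript
// Hypothetical helper: pull the charset label out of a Content-Type header value.
// Falls back to 'utf-8' when the header is missing or has no charset parameter.
function charsetFromContentType(contentType, fallback = 'utf-8') {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '')
  return match ? match[1].toLowerCase() : fallback
}

// Hypothetical helper: decode raw bytes using the declared charset.
function decodeHtml(buffer, contentType) {
  const label = charsetFromContentType(contentType)
  try {
    return new TextDecoder(label).decode(buffer)
  } catch (err) {
    // TextDecoder throws a RangeError for labels it does not recognize,
    // so fall back to UTF-8 in that case
    return new TextDecoder('utf-8').decode(buffer)
  }
}
```

With these helpers, the decoding step becomes `decodeHtml(await res.arrayBuffer(), res.headers.get('content-type'))`, and the resulting string can be passed to extractFromHtml as before.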
Thanks for this amazing package. Would it be possible to get the content in UTF-8? Spanish accents are not recognized.