aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0
2.63k stars, 414 forks

Please enable cookies #102

Closed by m0o0scar 6 years ago

m0o0scar commented 8 years ago

When I run html2text on https://davidwalsh.name/2016s-most-important-web-apps-tools, the following error shows up:

Please enable cookies.

Error 1010 Ray ID: 272e8783e7f122e2 • 2016-02-11 08:02:03 UTC

Access denied

What happened?

The owner of this website (davidwalsh.name) has banned your access based on your browser's signature (272e8783e7f122e2-ua48).

CloudFlare Ray ID: 272e8783e7f122e2 • Your IP: xxxx • Performance & security by CloudFlare

mcepl commented 8 years ago

Did you open that page in a regular browser? My Firefox gives me a "Page not found" error for that URL. Besides, downloading HTML directly from the web is kind of a hack in html2text; it certainly doesn't provide the same functionality as a full browser. It is usually better to download HTML pages to the disk and process them from there.

Alir3z4 commented 8 years ago

@moscartong The html2text feature for reading from a URL is just a tiny helper and should not be considered a reliable method in situations like this.

As @mcepl said:

It is usually better to download HTML pages to the disk and process them from there.
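
For anyone landing here later, a minimal sketch of that two-step workflow in Python: fetch the page yourself, save it to disk, then run html2text on the saved file. The helper names (`build_request`, `download_html`, `convert_file`), the `USER_AGENT` string, and the example URL are made up for illustration and are not part of html2text. Sending an explicit User-Agent is only a guess at what might avoid a block based on the "browser's signature"; it may well not be enough for a given site. The conversion step assumes the `html2text` package is installed and uses its `html2text.html2text()` function.

```python
import urllib.request
from pathlib import Path

# Hypothetical header value (an assumption); any browser-like string could be tried.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) example-fetcher/0.1"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_html(url: str, dest: Path) -> Path:
    """Step 1: fetch the page and save the raw bytes to disk."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        dest.write_bytes(response.read())
    return dest


def convert_file(path: Path, encoding: str = "utf-8") -> str:
    """Step 2: convert the saved HTML file to Markdown-formatted text."""
    import html2text  # third-party package: pip install html2text

    html = path.read_text(encoding=encoding, errors="replace")
    return html2text.html2text(html)


if __name__ == "__main__":
    # Placeholder URL and file name; substitute the page you actually need.
    saved = download_html("https://example.com/", Path("page.html"))
    print(convert_file(saved))
```

Keeping the download separate from the conversion also means the fetch can be swapped for curl, wget, or a browser's "Save Page As" without touching the conversion step.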