eafer / rdrview

Firefox Reader View as a command line tool
Apache License 2.0
836 stars 35 forks

Not working with NixCraft #32

Closed: simonhughxyz closed this 4 months ago

simonhughxyz commented 1 year ago

rdrview does not work with NixCraft articles (https://www.cyberciti.biz/). It produces this output: rdrview: no content could be extracted.

It would be nice to be able to read NixCraft articles on my terminal.

AbeEstrada commented 1 year ago

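I'm using a function that downloads the article first and then pipes it to rdrview, something like this:

function rdr {
  # Abort with a message if no URL argument is given; "local" keeps u scoped
  # to this call (a bare "readonly" would make u a global read-only variable,
  # so a second call to rdr would fail)
  local -r u=${1:?"The url must be specified."}
  # Fetch the page with a browser-like User-Agent and render it with lynx
  curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox
}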
AbeEstrada commented 1 year ago

I also found that cyberciti.biz is using CloudFlare and you need to pass a JavaScript challenge before loading the content:

<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Just a moment...</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta name="robots" content="noindex,nofollow">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link href="/cdn-cgi/styles/challenges.css" rel="stylesheet">

</head>
<body class="no-js">
    <div class="main-wrapper" role="main">
    <div class="main-content">
        <noscript>
            <div id="challenge-error-title">
                <div class="h2">
                    <span class="icon-wrapper">
                        <div class="heading-icon warning-icon"></div>
                    </span>
                    <span id="challenge-error-text">
                        Enable JavaScript and cookies to continue
                    </span>
                </div>
            </div>
        </noscript>
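One way to make this failure visible, sketched under assumptions: the challenge page quoted above has the title "Just a moment...", so a fetch wrapper can check for that marker and report it instead of passing the interstitial to rdrview (which then says no content could be extracted). The helper names is_cf_challenge and rdr_checked are hypothetical, and the title string is an assumption based only on this one response. Cloudflare may serve different markup elsewhere.

```shell
# Hypothetical helpers (not part of rdrview or curl).

# Succeeds (exit 0) if the HTML on stdin contains the title of the
# Cloudflare "Just a moment..." page quoted above. -F makes the dots literal.
is_cf_challenge() {
  grep -qF '<title>Just a moment...</title>'
}

# Fetch a URL and hand it to rdrview only if it is not a challenge page.
rdr_checked() {
  local -r u=${1:?"The url must be specified."}
  local html
  # Declared separately so a curl failure is not masked by "local"
  html=$(curl -A "Mozilla Firefox" -sL "$u") || return
  # Here-string instead of a pipe, so grep exiting early cannot SIGPIPE a writer
  if is_cf_challenge <<<"$html"; then
    echo "rdr_checked: got a Cloudflare challenge page instead of the article" >&2
    return 1
  fi
  printf '%s\n' "$html" | rdrview -B lynx --disable-sandbox
}
```

With this, a blocked fetch reports the Cloudflare challenge directly rather than surfacing only as an extraction failure.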
simonhughxyz commented 1 year ago

curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox

This does not work; rdrview still can't extract any content.

simonhughxyz commented 1 year ago

I also found that cyberciti.biz is using CloudFlare and you need to pass a JavaScript challenge before loading the content:

Is there a way to pass the CloudFlare challenge, or at least circumvent it?

simonhughxyz commented 1 year ago

I found a workaround for Cloudflare: I used curl-impersonate, and that seems to work.
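A minimal sketch of that workaround, assuming curl-impersonate is installed: it is the rdr function from earlier in the thread with plain curl swapped for one of curl-impersonate's browser wrapper scripts. The comment above does not say which wrapper was used, so curl_ff109 is only an assumed default. Check which curl_* scripts your install provides and set RDR_CURL accordingly. The name rdr_impersonate is hypothetical.

```shell
# Assumed wrapper name; override RDR_CURL with whichever curl_* script
# your curl-impersonate install actually provides.
RDR_CURL=${RDR_CURL:-curl_ff109}

rdr_impersonate() {
  local -r u=${1:?"The url must be specified."}
  # Fail early with a clear message if the wrapper is not on PATH
  if ! command -v "$RDR_CURL" >/dev/null 2>&1; then
    echo "rdr_impersonate: $RDR_CURL not found in PATH" >&2
    return 127
  fi
  # The wrapper sends a browser-like TLS handshake and headers, so no -A here
  "$RDR_CURL" -sL "$u" | rdrview -B lynx --disable-sandbox
}
```

Usage, with the default wrapper: rdr_impersonate https://www.cyberciti.biz/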

eafer commented 4 months ago

Sorry for the long delay, but I guess you fixed this yourself and there's not much for me to say here.