AlexYuan closed this issue 2 years ago.
Hi @AlexYuan, thanks for the report. Fetching URLs may occasionally fail for one reason or another. For the NYT article, I can't personally reproduce the issue, while the WaPo article is redirecting in an infinite loop. Adding the --debug flag to the command may offer additional clues to what's going on.
In any case, when percollate is unable to fetch a URL, you can fetch it externally and pass it to percollate on STDIN, like the example below:
curl https://www.nytimes.com/2022/04/27/us/politics/ukraine-war-expansion.html | percollate pdf --output=some1.pdf - --url=https://www.nytimes.com/2022/04/27/us/politics/ukraine-war-expansion.html
Notice that when we use STDIN (via the - operand), we provide the web page's original URL with the --url option.
Thanks for your reply. I can open the NYT article in my Chrome (Edge) browser through a VPN, so maybe we need to add an argument like '--proxy' to the percollate command line? Or an argument like '--timeout' (> 30*1000 ms)?
The request for proxy options has come up before. To keep percollate's focus narrow, any URL that needs a custom setup to fetch is currently best fetched outside percollate. (See https://github.com/danburzo/percollate/issues/23)
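As a rough illustration of fetching outside percollate with a proxy and a longer timeout, the sketch below wraps curl and the STDIN invocation shown earlier in one shell function. The function name `fetch_to_pdf`, the proxy address, and the specific timeout and redirect limits are assumptions chosen for the example, not anything percollate provides; `--proxy`, `--max-time`, `--location`, and `--max-redirs` are curl options.

```shell
#!/bin/sh
# Sketch only: fetch_to_pdf is a hypothetical helper, not part of percollate.
fetch_to_pdf() {
  url="$1"    # page to convert
  out="$2"    # path of the PDF to write
  proxy="$3"  # proxy address, e.g. http://127.0.0.1:8080 (placeholder)

  tmp="$(mktemp)" || return 1

  # --location follows redirects and --max-redirs caps the redirect chain,
  # so a redirect loop fails quickly; --max-time is a total timeout in seconds.
  if curl --silent --show-error --fail --location --max-redirs 10 \
          --max-time 60 --proxy "$proxy" --output "$tmp" "$url"; then
    # HTML goes in on STDIN via "-", the original URL is passed with --url.
    percollate pdf --output="$out" - --url="$url" < "$tmp"
    rc=$?
  else
    rc=$?
    echo "fetch failed for $url (curl exit code $rc)" >&2
  fi

  rm -f "$tmp"
  return "$rc"
}
```

It could then be called as `fetch_to_pdf "$URL" some1.pdf "http://127.0.0.1:8080"`, with the proxy address replaced by your own. Downloading to a temporary file first means percollate only runs when curl actually succeeded, instead of receiving empty input from a failed fetch.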
Environment
node --version: v14.17.5
npm --version: 6.14.14
yarn --version, if using Yarn:
percollate --version: 2.2.0
Description
When I run this command line:
(or with the URL https://www.washingtonpost.com/politics/2022/02/28/russia-ukraine-logistics-invasion/) I get this error:
However, Puppeteer can load these URLs and output a PDF.