ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Quickscrape url retrieval failing? #97

Closed: chartgerink closed this 7 years ago

chartgerink commented 7 years ago

I was testing some scraping procedures on my Windows machine (node v6.9.2; npm v4.0.5) with quickscrape installed directly from the master branch on GitHub. When I run quickscrape --urllist urls.txt --scraperdir journal-scrapers\scrapers -output test, I get the error below. It looks like quickscrape fails to read the URL list properly?

The URL file is not malformed (I tested that), and I get the same error when I pass a single URL directly.

info: quickscrape 0.4.7 launched with...
info: - URL: -t
info: - Scraperdir: E:\journal-scrapers\scrapers
info: - Rate limit: 3 per minute
info: - Log level: info
info: urls to scrape: 1
info: processing URL: -t
C:\Users\u1233095\AppData\Roaming\npm\node_modules\quickscrape\node_modules\thresher\lib\url.js:29
    throw e;
    ^

Error: malformed URL: -t; protocol missing (must include http(s):// or ftp(s)://), domain missing
    at Object.url.checkUrl (C:\Users\u1233095\AppData\Roaming\npm\node_modules\quickscrape\node_modules\thresher\lib\url.js:28:13)
    at Thresher.scrape (C:\Users\u1233095\AppData\Roaming\npm\node_modules\quickscrape\node_modules\thresher\lib\thresher.js:54:7)
    at processUrl (C:\Users\u1233095\AppData\Roaming\npm\node_modules\quickscrape\bin\quickscrape.js:273:5)
    at Object.<anonymous> (C:\Users\u1233095\AppData\Roaming\npm\node_modules\quickscrape\bin\quickscrape.js:277:1)
    at Module._compile (module.js:570:32)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.runMain (module.js:604:10)
chartgerink commented 7 years ago

Oh, I malformed the request myself by using -output instead of --output.

Sorry.
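
For anyone who lands here with the same symptom: the "URL: -t" line in the log is consistent with the option parser treating -output as a bundle of single-letter flags. The sketch below is not quickscrape's or its argument parser's actual code; expandShortFlags and findUrl are hypothetical helpers, and it assumes -u is the short form of --url, which the log output suggests but this thread does not confirm.

```javascript
// Hypothetical helper: expands bundled short flags the way many
// getopt-style parsers do ("-abc" becomes "-a", "-b", "-c").
// This is an illustration, not quickscrape's real parsing code.
function expandShortFlags(args) {
  const out = [];
  for (const arg of args) {
    if (/^-[A-Za-z]{2,}$/.test(arg)) {
      // A single dash followed by several letters is split per letter.
      for (const ch of arg.slice(1)) out.push('-' + ch);
    } else {
      // Long options ("--output") and plain values pass through unchanged.
      out.push(arg);
    }
  }
  return out;
}

// Hypothetical lookup: assumes "-u" means "URL" and takes the next token
// as its value, whatever that token happens to be.
function findUrl(args) {
  const expanded = expandShortFlags(args);
  const i = expanded.indexOf('-u');
  return i === -1 ? undefined : expanded[i + 1];
}

const badArgs = ['--urllist', 'urls.txt', '-output', 'test'];
const goodArgs = ['--urllist', 'urls.txt', '--output', 'test'];

console.log(expandShortFlags(badArgs));
console.log(findUrl(badArgs));  // the token after the first "-u"
console.log(findUrl(goodArgs)); // no "-u" present
```

Under these assumptions, the single-dash form yields a URL value of "-t", which matches the "processing URL: -t" line above, while the double-dash form leaves the URL option untouched.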