I have a slightly strange problem with quickscrape.
I want to run something like this:
quickscrape --urllist test_dois.txt --scraper ../journal-scrapers/scrapers/plos.json --output plos-test2
That is, I want to use relative paths for the URL list and the scraper file.
When running this on OS X it works fine, but when running on my Linux server I get an error saying that it can't find the urllist file.
Simplifying this a bit and looking just at the urllist file, if I run
./quickscrape.js --urllist test_dois.txt --scraper /mnt/cm-volume/content-mine/journal-scrapers/scrapers/plos.json --output plos-test2
I get:
info: quickscrape 0.4.7 launched with...
info: - URLs from file: undefined
info: - Scraper: /mnt/cm-volume/content-mine/journal-scrapers/scrapers/plos.json
info: - Rate limit: 3 per minute
info: - Log level: info
fs.js:427
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^
Error: ENOENT, no such file or directory 'test_dois.txt'
at Object.fs.openSync (fs.js:427:18)
at Object.fs.readFileSync (fs.js:284:15)
at loadUrls (/mnt/cm-volume/content-mine/quickscrape/bin/quickscrape.js:154:17)
at Object.<anonymous> (/mnt/cm-volume/content-mine/quickscrape/bin/quickscrape.js:164:41)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
I have absolutely no idea why this behaves differently on Linux than on OS X.
Interestingly, I seem to be able to fix this error by moving the process.chdir call further down the file, so that it is called only after the URL list has been loaded (see the diff at https://github.com/ContentMine/quickscrape/compare/master...robintw:relative-paths). This seems to work on both Linux and OS X, and I'm happy to submit this as a PR if that would be useful.
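An alternative to moving the call might be to turn the relative path into an absolute one before process.chdir runs, so the order of the later reads no longer matters. The following is only a rough sketch of that idea, not quickscrape's actual code: resolveFromLaunchDir is a made-up helper, and the temporary directories stand in for the launch directory and the --output directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of quickscrape): resolve a user-supplied
// path against the directory the command was launched from.
function resolveFromLaunchDir(launchDir, userPath) {
  return path.resolve(launchDir, userPath);
}

// Stand-ins for the launch directory and the output directory.
const launchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'launch-'));
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'output-'));
fs.writeFileSync(
  path.join(launchDir, 'test_dois.txt'),
  'http://example.org/a\nhttp://example.org/b\n'
);

const originalCwd = process.cwd();
process.chdir(launchDir);

// Resolve the relative path while the working directory is still the
// launch directory, i.e. before the chdir into the output directory.
const urlListPath = resolveFromLaunchDir(process.cwd(), 'test_dois.txt');

// Simulates the chdir into the output directory.
process.chdir(outputDir);

// The bare relative name no longer points at the file from here...
const relativeStillExists = fs.existsSync('test_dois.txt');

// ...but the absolute path resolved earlier still does.
const urls = fs
  .readFileSync(urlListPath, 'utf8')
  .split('\n')
  .filter((line) => line.trim() !== '');

// Restore the working directory so the sketch has no lasting side effect.
process.chdir(originalCwd);
```

The same treatment would presumably apply to the --scraper path, since it can also be given relative to the launch directory.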
I must say, I'm a bit confused by all of this though - and wondering whether I am being really stupid!