ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Possible to just download the JSON file? #69

Open muranava opened 8 years ago

muranava commented 8 years ago

Sorry, I know this is not an issue, but contentmine.org seems to be down, so I hope you don't mind me asking here. I'd like to request a feature: a way to switch off downloading of the PDF and HTML. Thanks, Mura

blahah commented 8 years ago

At the moment, what is downloaded is entirely defined by the scraperJSON file. So if you just want to populate the JSON without downloading any files, you'd have to remove all the download parts of the scraperJSON definitions.

However, this seems like a reasonable use case, so we will add an option, --no-downloads, which simply ignores any downloads specified in the scraperJSON.

I'm flat out on other projects right now, but will implement this in the next few weeks.
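Until such an option exists, the manual workaround of removing the download parts from a scraperJSON definition can be scripted. The sketch below is a minimal Python version, assuming (not confirmed by this thread) that downloads are marked by a "download" key on element definitions, i.e. on objects that also carry a "selector". The names strip_downloads and strip_downloads_file are hypothetical helpers, not part of quickscrape.

```python
import json
from typing import Any


def strip_downloads(node: Any) -> Any:
    """Return a copy of a parsed scraperJSON structure with the "download"
    key removed from every element definition.

    Assumption: an element definition is any object with a "selector" key,
    and a "download" key on such an object is what triggers a download.
    Objects without a "selector" keep all their keys, so an element that
    happens to be *named* "download" is left alone.
    """
    if isinstance(node, dict):
        is_element = "selector" in node
        return {
            key: strip_downloads(value)
            for key, value in node.items()
            if not (is_element and key == "download")
        }
    if isinstance(node, list):
        return [strip_downloads(item) for item in node]
    # Strings, numbers, booleans and null are returned unchanged.
    return node


def strip_downloads_file(src_path: str, dst_path: str) -> None:
    """Read a scraperJSON file and write a download-free copy to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        scraper = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(strip_downloads(scraper), dst, indent=2)


if __name__ == "__main__":
    # Hypothetical scraper definition, for illustration only.
    example = {
        "url": "example\\.com",
        "elements": {
            "title": {
                "selector": "//meta[@name='citation_title']",
                "attribute": "content",
            },
            "fulltext_pdf": {
                "selector": "//meta[@name='citation_pdf_url']",
                "attribute": "content",
                "download": True,
            },
        },
    }
    print(json.dumps(strip_downloads(example), indent=2))
```

The stripped copy would then be passed to quickscrape in place of the original definition. If the assumption above holds, elements such as fulltext_pdf are still extracted, so their URLs should end up in the results JSON; only the download step is dropped.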