datadryad / hive-mrc

Helping Interdisciplinary Vocabulary Engineering (HIVE)
BSD 3-Clause "New" or "Revised" License

Add command-line support to SimpleTextCrawler #40

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago

Add Apache Commons CLI support (modeled after Admin vocabularies):
-f  input file of URLs to be crawled
-o  output directory where the text will be dumped
-n  number of hops (default 0: first page only)
-m  number of terms (default 10)
-d  enable differencing
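
For illustration, the flags above could be parsed roughly as follows. This is only a sketch: the issue asks for Apache Commons CLI, but to stay self-contained the example uses a hand-rolled loop over the standard library instead. The class name `CrawlerOptions`, its field names, and the rule that both -f and -o are required are assumptions made for this example, not taken from the HIVE code base.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical holder for the options listed in this issue (not an actual HIVE class). */
class CrawlerOptions {
    String inputFile;              // -f: file of URLs to be crawled
    String outputDir;              // -o: directory where text will be dumped
    int hops = 0;                  // -n: default 0, first page only
    int numTerms = 10;             // -m: default 10
    boolean differencing = false;  // -d: off unless the flag is present

    /** Parses the argument array; throws IllegalArgumentException on bad input. */
    static CrawlerOptions parse(String[] args) {
        CrawlerOptions opts = new CrawlerOptions();
        Iterator<String> it = Arrays.asList(args).iterator();
        while (it.hasNext()) {
            String flag = it.next();
            switch (flag) {
                case "-f": opts.inputFile = value(it, flag); break;
                case "-o": opts.outputDir = value(it, flag); break;
                case "-n": opts.hops = Integer.parseInt(value(it, flag)); break;
                case "-m": opts.numTerms = Integer.parseInt(value(it, flag)); break;
                case "-d": opts.differencing = true; break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + flag);
            }
        }
        // Assumption: -f and -o are mandatory; the issue does not say so explicitly.
        if (opts.inputFile == null || opts.outputDir == null) {
            throw new IllegalArgumentException("Both -f and -o are required");
        }
        return opts;
    }

    /** Returns the argument following a flag, or fails if there is none. */
    private static String value(Iterator<String> it, String flag) {
        if (!it.hasNext()) {
            throw new IllegalArgumentException("Missing value for " + flag);
        }
        return it.next();
    }
}
```

With this sketch, `CrawlerOptions.parse(new String[] {"-f", "urls.txt", "-o", "out", "-n", "2", "-d"})` would yield two hops, the default of 10 terms, and differencing enabled. A Commons CLI version would declare the same five options on an `Options` object and read them back from the parsed `CommandLine`.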

Original issue reported on code.google.com by craig.wi...@unc.edu on 16 Dec 2011 at 5:26

GoogleCodeExporter commented 9 years ago
Fixed in release 2.1

Original comment by craig.wi...@unc.edu on 11 May 2012 at 3:18