ScaleUnlimited / flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons
Apache License 2.0
51 stars, 18 forks

Now support either -warccontentpath or -textcontentfile #169

Closed: Schmed closed this 6 years ago

Schmed commented 6 years ago

Cleaned up WARC/text content output support, verified that it runs locally again, and updated the wiki to reflect one renamed option.