apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0
887 stars, 262 forks

ParserFilter to exclude script and style tags from text extracted by StormCrawler. #638

Closed anveshv18 closed 5 years ago

jnioche commented 5 years ago

Similar to #146