apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0

File Protocol #435

Closed: isspek closed this issue 7 years ago

isspek commented 7 years ago

I have adapted the File protocol implementation from Apache Nutch for StormCrawler. See the code at https://gist.github.com/isspek/32e9d762666593b4781ef3a0155dd74b. It works but needs revision, so I would appreciate your suggestions. Some functions are exactly the same as in the corresponding Apache Nutch class.
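For readers unfamiliar with what a file protocol has to do, here is a minimal sketch. It assumes nothing about the code in the gist or about Nutch's or StormCrawler's actual protocol interfaces. The class `FileFetchSketch`, its `Result` holder and the HTTP-like status values are hypothetical names chosen for illustration. The sketch turns a file: URL into a local path and returns the file's bytes, the sorted entry names for a directory, or a "not found" status.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (not Nutch's or StormCrawler's API): resolves a file: URL
// and returns either a regular file's bytes or a directory's entry names.
class FileFetchSketch {

    /** Result holder; the status values mimic HTTP codes purely for illustration. */
    static final class Result {
        final int status;
        final byte[] content;       // non-null for regular files
        final List<String> entries; // non-null for directories

        Result(int status, byte[] content, List<String> entries) {
            this.status = status;
            this.content = content;
            this.entries = entries;
        }
    }

    static Result fetch(String url) throws IOException {
        // Paths.get(URI) requires an absolute, hierarchical file: URI
        Path path = Paths.get(URI.create(url));
        if (Files.isDirectory(path)) {
            List<String> names = new ArrayList<>();
            try (Stream<Path> children = Files.list(path)) {
                children.forEach(p -> names.add(p.getFileName().toString()));
            }
            Collections.sort(names); // deterministic order for the listing
            return new Result(200, null, names);
        }
        if (Files.isRegularFile(path)) {
            return new Result(200, Files.readAllBytes(path), null);
        }
        // Missing path (or something that is neither a file nor a directory)
        return new Result(404, null, null);
    }
}
```

A real protocol implementation would also need to set content type and metadata and to handle permissions and very large files, which this sketch leaves out.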

jnioche commented 7 years ago

Looks good, thanks! Please open a pull request; it will make your code easier to review. One change you could make before that would be to add the license headers at the top of your files. Also, please use parameterized messages for the logs, and StringBuilder instead of StringBuffer (avoiding manual concatenation as in x.append("<title>Index of " + path + "</title></head>\n");).
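The three review points can be illustrated on a small directory-listing builder. This is a sketch under stated assumptions, not the code from the gist. The class `DirectoryListing` and its `render` method are hypothetical. StormCrawler code typically logs through SLF4J, where the equivalent call would be LOG.debug("Building index for {}", path). The sketch uses java.util.logging only so that it has no external dependencies; its placeholder syntax is {0}.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical listing builder applying the review suggestions.
// With SLF4J (as StormCrawler typically uses) the log call would be
// LOG.debug("Building index for {}", path); java.util.logging keeps this self-contained.
class DirectoryListing {

    private static final Logger LOG = Logger.getLogger(DirectoryListing.class.getName());

    static String render(String path, List<String> entries) {
        // Parameterized message: the level is checked first and the argument is
        // only substituted when the record is actually formatted for output
        LOG.log(Level.FINE, "Building index for {0}", path);

        // StringBuilder is unsynchronized, so it avoids StringBuffer's locking
        // overhead for a buffer that never leaves this method
        StringBuilder x = new StringBuilder();
        x.append("<html><head>");
        // Chained append() calls instead of building the piece with +
        x.append("<title>Index of ").append(path).append("</title></head>\n");
        x.append("<body><h1>Index of ").append(path).append("</h1>\n");
        for (String name : entries) {
            // Names are not HTML-escaped here; a real implementation should escape them
            x.append("<a href=\"").append(name).append("\">")
             .append(name).append("</a><br>\n");
        }
        x.append("</body></html>\n");
        return x.toString();
    }
}
```

Chaining append() writes each fragment straight into the one buffer. With the + form, the compiler builds a temporary string for the concatenated argument and then copies it into x.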