apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0

Normalise HTTP headers #108

Open jnioche opened 9 years ago

jnioche commented 9 years ago

Port SpellCheckedMetadata from Nutch to cater for the variations in header names returned by servers.

jnioche commented 9 years ago

Note: we should probably reuse org.apache.http.HttpHeaders instead of maintaining our own version. We need to check whether the Apache HttpCore library has any mechanism for normalising the keys returned by a server.
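
To make the idea concrete, here is a minimal sketch of key normalisation, assuming a simple approach: reduce each key to its lower-cased letters and look it up in a table of canonical names. The class `HeaderNormaliser`, its methods and the list of canonical names are hypothetical and only for illustration; this is not the Nutch SpellCheckedMetadata code, nor anything provided by HttpCore, and it does not attempt to correct misspelt names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of StormCrawler, Nutch or HttpCore. */
final class HeaderNormaliser {

    // Canonical spellings that variants are mapped onto (illustrative subset only).
    private static final String[] CANONICAL = {
        "Content-Type", "Content-Length", "Content-Encoding",
        "Last-Modified", "Location"
    };

    // Lookup from the reduced form (letters only, lower case) to the canonical name.
    private static final Map<String, String> BY_REDUCED = new HashMap<>();

    static {
        for (String name : CANONICAL) {
            BY_REDUCED.put(reduce(name), name);
        }
    }

    private HeaderNormaliser() {}

    /** Drops every non-letter character and lower-cases the rest. */
    static String reduce(String key) {
        StringBuilder sb = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    /** Returns the canonical name for a known header, or the trimmed input otherwise. */
    static String normalise(String key) {
        String canonical = BY_REDUCED.get(reduce(key));
        return canonical != null ? canonical : key.trim();
    }
}
```

With this sketch, `HeaderNormaliser.normalise("content_type")` and `HeaderNormaliser.normalise("CONTENT-TYPE")` both yield `Content-Type`, while an unknown key such as `X-Custom` is returned as-is. If we reuse org.apache.http.HttpHeaders, its constants could populate the canonical table instead of the hard-coded array.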