USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
411 stars, 143 forks

Parse and Index Response headers #90

Closed thammegowda closed 7 years ago

thammegowda commented 7 years ago

Changelog:

- Fetcher: carries the response headers forward.
- Parser: parses common headers whose values need typing, e.g. dates and long/int values.
- Solr: indexes headers as dynamic fields (field names end with the suffix _hd).

Note: A schema update is required, since new dynamic fields are added for headers.

Closes #89
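To illustrate the parse-and-index step described in the changelog, here is a minimal Python sketch; it is not the code from this PR. Only the `_hd` suffix comes from the description above. The helper names (`field_name`, `parse_headers`), the set of headers treated as dates or integers, and the lowercase-with-underscores naming scheme are all assumptions made for the example.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime

# Assumption: which headers get typed. The PR text only says "dates, long/ints".
DATE_HEADERS = {"date", "last-modified", "expires"}
INT_HEADERS = {"content-length", "age"}
SUFFIX = "_hd"  # dynamic-field suffix named in the PR description


def field_name(header):
    """Map a header name to a field name ending in _hd (hypothetical scheme)."""
    return re.sub(r"[^a-z0-9]+", "_", header.lower()).strip("_") + SUFFIX


def parse_headers(headers):
    """Return {field name: typed value}; values that fail to parse stay strings."""
    doc = {}
    for name, raw in headers.items():
        key = name.lower()
        value = raw
        try:
            if key in DATE_HEADERS:
                # HTTP dates use the RFC 5322 style format handled by email.utils
                value = parsedate_to_datetime(raw).astimezone(timezone.utc)
            elif key in INT_HEADERS:
                value = int(raw)
        except (TypeError, ValueError):
            value = raw  # keep the raw string rather than dropping the header
        doc[field_name(name)] = value
    return doc
```

Called with `{"Content-Length": "1024", "Server": "nginx"}`, this yields the integer 1024 under `content_length_hd` and the unchanged string under `server_hd`. Keeping unparsable values as strings is a design choice of the sketch, so that a malformed header does not prevent the rest of the document from being indexed.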

thammegowda commented 7 years ago

@karanjeets please review and merge this PR.

Let me know if you have questions!

karanjeets commented 7 years ago

@thammegowda Awesome!!!! Clean Code! Merging it with master.