USCDataScience / nutch-analytics

Nutch Crawl Analysis - Spark based project
Apache License 2.0
4 stars 0 forks

Ensure that timestamps are correct and available for inclusion in _id #3

Open wmburke opened 7 years ago

karanjeets commented 7 years ago

The "_id" creation is still being discussed as part of the CDRv3.0 discussion. Nevertheless, the timestamp is now read from CrawlDB, which was not the case earlier, and should be correct.