commoncrawl / cc-index-table

Index Common Crawl archives in tabular format
Apache License 2.0

Add columns for redirect targets and WARC truncation #6

Closed. sebastian-nagel closed this issue 4 years ago

sebastian-nagel commented 5 years ago

commoncrawl/nutch#15 adds "redirect" and "truncated" as optional JSON fields to the CDX index. These should be added as new columns to the index table as well.
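
As a rough illustration of the request, the sketch below maps one CDX index line to a table row in which the two optional JSON fields become nullable columns. It assumes the usual CDX line layout of SURT key, timestamp, and JSON object separated by single spaces. The function name `cdx_line_to_row` and the output column names are hypothetical, chosen for this example only; they are not taken from the cc-index-table schema.

```python
import json
from typing import Any, Dict


def cdx_line_to_row(line: str) -> Dict[str, Any]:
    """Convert one CDX line into a flat row (hypothetical column names)."""
    # Assumed line layout: "<SURT key> <timestamp> <JSON object>"
    surt_key, timestamp, payload = line.rstrip("\n").split(" ", 2)
    fields = json.loads(payload)
    return {
        "surt_key": surt_key,
        "timestamp": timestamp,
        "url": fields.get("url"),
        # "redirect" and "truncated" are optional in the CDX JSON,
        # so the corresponding columns are None when a field is absent.
        "redirect": fields.get("redirect"),
        "truncated": fields.get("truncated"),
    }


# A redirect capture: "redirect" is present, "truncated" is not.
redirect_row = cdx_line_to_row(
    'com,example)/ 20190101000000 '
    '{"url": "http://example.com/", "status": "301", '
    '"redirect": "https://example.com/"}'
)

# A truncated capture: "truncated" is present, "redirect" is not.
truncated_row = cdx_line_to_row(
    'com,example)/big 20190101000100 '
    '{"url": "http://example.com/big", "status": "200", "truncated": "length"}'
)
```

Because both fields are optional, older index files without them would simply produce null values in the new columns, so existing rows stay valid.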