The schema of the table created by CCIndex2Table is fixed to the built-in schema used for Common Crawl. To support other crawl archives, the table schema should be configurable:
allow passing a custom table schema (as a JSON file) that defines the output table
(eventually) split the class into a generic one (requiring a custom schema) and a CC-specific one. This would also make it easier to adapt the parsing of custom CDX input.
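
To make the two points above concrete, here is a minimal sketch of the proposed split, written in plain Python rather than against the actual Java/Spark code. The class names `GenericIndexTable` and `CommonCrawlIndexTable`, the `extract` hook, and the three example columns with their types are assumptions for illustration only. The schema text follows the JSON layout Spark uses to serialize a struct type, which is assumed here to be a reasonable format for the schema file.

```python
import json

# Hypothetical schema file content (illustrative columns and types only).
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"name": "url", "type": "string", "nullable": false, "metadata": {}},
    {"name": "fetch_status", "type": "short", "nullable": true, "metadata": {}},
    {"name": "warc_filename", "type": "string", "nullable": true, "metadata": {}}
  ]
}
"""

# Casts for the few type names used in this sketch.
CASTS = {"string": str, "short": int, "integer": int, "long": int}


class GenericIndexTable:
    """Generic converter: output columns come from a caller-supplied schema."""

    def __init__(self, schema):
        if schema.get("type") != "struct":
            raise ValueError("schema must be a struct")
        self.fields = schema["fields"]

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))

    def extract(self, record):
        """Map parsed CDX fields to column names; override for custom CDX input."""
        return record

    def to_row(self, record):
        values = self.extract(record)
        row = []
        for field in self.fields:
            value = values.get(field["name"])
            if value is None:
                if not field["nullable"]:
                    raise ValueError(f"missing required column: {field['name']}")
                row.append(None)
            else:
                row.append(CASTS[field["type"]](value))
        return tuple(row)


class CommonCrawlIndexTable(GenericIndexTable):
    """CC-specific variant: knows how CDX JSON keys map to table columns."""

    KEY_TO_COLUMN = {"url": "url", "status": "fetch_status", "filename": "warc_filename"}

    def extract(self, record):
        return {col: record[key] for key, col in self.KEY_TO_COLUMN.items() if key in record}


table = CommonCrawlIndexTable.from_json(SCHEMA_JSON)
row = table.to_row({"url": "https://example.com/", "status": "200", "filename": "example.warc.gz"})
```

In this sketch the generic class only needs a schema, while everything specific to one archive's CDX layout lives in the subclass's `extract` method. Supporting another crawl archive would then mean supplying a different schema file and, if its CDX fields differ, a different subclass.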