commoncrawl / cc-pyspark

Process Common Crawl data with Python and Spark
MIT License
406 stars 86 forks

Allow to access WARC filename, record offset and length #8

Closed sebastian-nagel closed 5 years ago

sebastian-nagel commented 5 years ago

Allow access to the WARC filename and, from the ArchiveIterator, to the record offset and length (see #6).
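To illustrate what "record offset and length" refer to, here is a minimal sketch. It assumes the WARC file is gzip-compressed record by record (one gzip member per record, as in Common Crawl's WARC files), so that a record's offset and length are the byte position and compressed size of its gzip member. The function `iter_gzip_members` is a hypothetical helper written for this illustration with the standard library only; it is not the cc-pyspark or ArchiveIterator implementation.

```python
import gzip
import io
import zlib


def iter_gzip_members(stream, chunk_size=8192):
    """Yield (offset, length, payload) for each gzip member in `stream`.

    `offset` and `length` are measured in compressed bytes, so
    data[offset:offset + length] is a complete, standalone gzip member.
    Hypothetical helper for illustration only.
    """
    offset = 0
    buf = b""
    while True:
        if not buf:
            buf = stream.read(chunk_size)
            if not buf:
                return  # clean end of file
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        parts = []
        consumed = 0
        while True:
            parts.append(decomp.decompress(buf))
            if decomp.eof:
                # Bytes after the end of this member belong to the next one.
                consumed += len(buf) - len(decomp.unused_data)
                buf = decomp.unused_data
                break
            consumed += len(buf)
            buf = stream.read(chunk_size)
            if not buf:
                raise EOFError("truncated gzip member")
        yield offset, consumed, b"".join(parts)
        offset += consumed


# Two stand-in "records", each compressed as its own gzip member.
first = gzip.compress(b"record one")
second = gzip.compress(b"record two, a bit longer")
data = first + second

for offset, length, payload in iter_gzip_members(io.BytesIO(data)):
    # A single record can later be fetched using only offset and length.
    assert gzip.decompress(data[offset:offset + length]) == payload
    print(offset, length, payload)
```

Together with the WARC filename, an (offset, length) pair like this is enough to fetch a single record later, for example with an HTTP range request, without reading the whole file.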