Open anjackson opened 2 months ago
As it's always difficult to find shareable files of various formats, one option would be to use the Common Crawl indexes to find relevant items. Common Crawl publish Apache Parquet indexes which can be used for this kind of thing. e.g.
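A rough sketch of what such a lookup might look like, not a tested recipe: the helper below only builds an SQL string that could be run against the Parquet index from a service such as Amazon Athena. The table name `"ccindex"."ccindex"`, the crawl ID, and the helper name `build_mime_query` are assumptions for illustration. The column names follow the published columnar index schema as I understand it and should be checked against the current schema before use.

```python
def build_mime_query(mime_type, crawl="CC-MAIN-2024-10", limit=100):
    """Build an SQL query listing WARC record locations for one MIME type.

    Assumptions (not from the issue): the table is registered as
    "ccindex"."ccindex", and the crawl ID is just a placeholder.
    """
    # Very basic guard, since the values are interpolated into the SQL text.
    for value in (mime_type, crawl):
        if "'" in value or ";" in value:
            raise ValueError(f"unsafe value: {value!r}")
    limit = int(limit)
    return (
        "SELECT url, warc_filename, warc_record_offset, warc_record_length\n"
        'FROM "ccindex"."ccindex"\n'
        # Filtering on the partition columns (crawl, subset) limits the scan.
        f"WHERE crawl = '{crawl}'\n"
        "  AND subset = 'warc'\n"
        f"  AND content_mime_detected = '{mime_type}'\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_mime_query("image/tiff", limit=10))
```

Each result row gives a WARC file name plus a byte offset and length, which should be enough to fetch a single record with an HTTP range request. Services like Athena typically bill by data scanned, so restricting the query to one crawl and one subset is probably the main lever on cost.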
Needs thinking through, and understanding what any costs and impacts are.
Note some prior work that is related: