commoncrawl / cc-pyspark

Process Common Crawl data with Python and Spark
MIT License
406 stars 86 forks

Class org.apache.hadoop.fs.s3a.S3AFileSystem not found #30

Closed BrownXing closed 2 years ago

BrownXing commented 2 years ago

I got this error when running cc_index_word_count.py; it is raised at this line:

./sparkcc.py, line 354: df = parquet_reader.load(table_path)

sebastian-nagel commented 2 years ago

Please see the README "Installation of S3 Support Libraries". Let me know if this does not work. Thanks!
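For context, the missing class `org.apache.hadoop.fs.s3a.S3AFileSystem` lives in the `hadoop-aws` artifact, which is not bundled with Spark by default. One common way to make it available is to pass the package to spark-submit; a minimal sketch (the version number is an assumption and must match the Hadoop version your Spark distribution was built against):

```shell
# Pull in hadoop-aws (provides org.apache.hadoop.fs.s3a.S3AFileSystem)
# plus its AWS SDK dependency at job submission time.
# 3.3.4 is an assumed version - match it to your Spark/Hadoop build.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  ./cc_index_word_count.py <further job arguments>
```

Alternatively, the same effect can be had by setting `spark.jars.packages` in `spark-defaults.conf`; the README section mentioned above covers the recommended setup for this project.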

BrownXing commented 2 years ago

> Please see the README "Installation of S3 Support Libraries". Let me know if this does not work. Thanks!

Thanks! I have solved this problem by adding the S3 support.