Closed by Panos-Bletsos 6 years ago
Hi @Panos-Bletsos, I suppose "/tmp/tpcds-data" is not an HDFS, S3, or similar path that is accessible from the whole cluster? If you're generating data on a cluster, you must write it to a filesystem and path that every node in the cluster can access.
Thanks @juliuszsompolski, I used an HDFS directory and everything worked as expected. Thanks a lot!
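To illustrate the point about cluster-accessible locations, here is a rough sketch of a pre-flight check that classifies an output location by its URI scheme before data generation. The function name `classify_location`, the scheme list, and the host `namenode:8020` are all assumptions made for illustration; none of them come from the benchmark kit or from Spark. Note that a path without a scheme is resolved against Hadoop's configured default filesystem, so whether "/tmp/tpcds-data" is shared depends on that setting.

```python
from urllib.parse import urlparse

# Assumed, non-exhaustive set of schemes that usually point at storage
# visible to every node in a cluster.
SHARED_SCHEMES = {"hdfs", "s3", "s3a", "gs", "abfs", "abfss", "wasb", "wasbs"}


def classify_location(location: str) -> str:
    """Classify a data location by URI scheme (hypothetical helper).

    Returns one of:
      "shared"                 scheme normally visible cluster-wide
      "node-local"             explicit file:// path, local to each node
      "depends-on-default-fs"  no scheme, resolved via fs.defaultFS
      "unknown"                any other scheme
    """
    scheme = urlparse(location).scheme.lower()
    if scheme in SHARED_SCHEMES:
        return "shared"
    if scheme == "file":
        return "node-local"
    if scheme == "":
        return "depends-on-default-fs"
    return "unknown"


if __name__ == "__main__":
    # "namenode:8020" is a placeholder host, not taken from the thread.
    for loc in ("/tmp/tpcds-data",
                "file:///tmp/tpcds-data",
                "hdfs://namenode:8020/tmp/tpcds-data"):
        print(loc, "->", classify_location(loc))
```

If the result is "node-local", or "depends-on-default-fs" on a cluster whose default filesystem is the local disk, each executor writes its files to its own machine, and a later read cannot see the full dataset, which would be consistent with the schema inference error described in this issue.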
When I try to set up the TPCDS dataset on a cluster, I get an error saying that Spark is unable to infer the Parquet schema. This happens only in cluster mode; in local mode the setup finishes successfully.
I have installed the tpcds kit on all nodes under the same path, and the location of the data is the same as well.
Specifically I try