NVIDIA / spark-rapids-benchmarks

Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark
Apache License 2.0

Adding HDFS support for data generation #188

Closed bilalbari closed 4 months ago

bilalbari commented 5 months ago

This PR contains the following changes:

  1. Adds a DbGen class that runs data generation as part of the mapper
  2. Updates the build files accordingly
  3. Updates the README
  4. Updates the existing Python files to support HDFS data generation
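
For readers unfamiliar with running generation inside a mapper, here is a rough sketch of the idea, not the code in this PR. It assumes each map task is handed one chunk index and builds its own generator command. The helper names `build_chunk_command` and `plan_map_tasks` are hypothetical, and the `dsdgen` options shown (`-scale`, `-parallel`, `-child`, `-dir`) are my understanding of the TPC-DS generator's command line, so verify them against the tool you actually build.

```python
from typing import List


def build_chunk_command(scale: int, parallel: int, child: int, out_dir: str) -> List[str]:
    """Build the generator command for one chunk (hypothetical helper).

    Assumes the generator splits its output into `parallel` chunks and
    that `child` selects which chunk (1-based) this task produces.
    """
    if not 1 <= child <= parallel:
        raise ValueError("child must be in 1..parallel")
    return [
        "./dsdgen",                 # assumed path to the generator binary
        "-scale", str(scale),       # dataset scale factor
        "-parallel", str(parallel), # total number of chunks
        "-child", str(child),       # chunk handled by this task
        "-dir", out_dir,            # task-local output directory
    ]


def plan_map_tasks(scale: int, parallel: int, out_dir: str) -> List[List[str]]:
    """Return one command per map task, covering every chunk exactly once."""
    return [
        build_chunk_command(scale, parallel, child, out_dir)
        for child in range(1, parallel + 1)
    ]


if __name__ == "__main__":
    # Print the per-task commands for a 4-way split at scale factor 100.
    for cmd in plan_map_tasks(100, 4, "/tmp/nds_chunk"):
        print(" ".join(cmd))
```

In this sketch every task writes to a local directory first; copying the finished chunk to HDFS would be a separate step and is left out here.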