Need some CI pipelines to validate the scripts to avoid any mistake

| Step | Command |
| -- | -- |
| Generate base data local | python3 nds_gen_data.py local 1 2 $PWD/raw_sf1 --overwrite_output |
| Generate base data hdfs | python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_sf1 --overwrite_output |
| Generate refresh data local | python3 nds_gen_data.py local 1 2 /user/$USER/raw_refresh_sf1 --overwrite_output --update |
| Generate refresh data hdfs | python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_refresh_sf1 --overwrite_output --update |
| Convert refresh data to parquet hdfs | ./spark-submit-template convert_submits_gpu.template nds_transcode.py hdfs:/nds2.0_ci/raw_refresh_sf1 hdfs:/nds2.0_ci/parquet_refresh_sf1 report.txt --output_format parquet --output_mode overwrite --update |
| Convert base data to iceberg hdfs | ./spark-submit-template convert_submits_gpu.template nds_transcode.py hdfs:/nds2.0_ci/raw_sf1 hdfs:/nds2.0_ci/iceberg_sf1 report.txt --output_format iceberg --output_mode overwrite |
| Generate query stream | python nds_gen_query_stream.py $TPCDS_HOME/query_templates 3000 ./query_streams --streams 1 |
| Power run | ./spark-submit-template power_run_gpu.template nds_power.py hdfs:/nds2.0_ci/iceberg_sf1 ./nds_query_streams/query_0.sql time.csv --property_file properties/aqe-on.properties --input_format iceberg --output_prefix hdfs:/nds2.0_ci/gpu_output_sf1 |
| Data validation | python nds_validate.py hdfs:/nds2.0_ci/gpu_output_sf1 hdfs:/nds2.0_ci/cpu_output_sf1 ./nds_query_streams/query_0.sql --ignore_ordering |
| Data maintenance | ./spark-submit-template convert_submit_gpu_iceberg.template nds_maintenance.py hdfs:/nds2.0_ci/parquet_refresh_sf1 ./data_maintenance time.csv |
| Throughput run | ./nds-throughput 1,2 ./spark-submit-template power_run_gpu.template nds_power.py hdfs:/nds2.0_ci/iceberg_sf1 ./nds_query_streams/query_'{}'.sql Time_'{}'.csv --input_format iceberg --output_prefix hdfs:/nds2.0_ci/gpu_output_sf1 |
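One way a CI job might drive these steps is a small fail-fast runner. The following is only a sketch under assumptions: `run_steps`, `SMOKE_STEPS`, and the `dry_run` flag are hypothetical names and not part of this repository, and only the two local data-generation commands listed above are included, copied verbatim. Exit code 127 for a command that cannot be launched is a convention borrowed from shells, not something the scripts define.

```python
import os
import shlex
import subprocess

# Hypothetical step list: two of the commands listed above, copied verbatim.
# The remaining steps could be appended in the same (name, command) form.
SMOKE_STEPS = [
    ("Generate base data local",
     "python3 nds_gen_data.py local 1 2 $PWD/raw_sf1 --overwrite_output"),
    ("Generate refresh data local",
     "python3 nds_gen_data.py local 1 2 /user/$USER/raw_refresh_sf1 "
     "--overwrite_output --update"),
]


def run_steps(steps, dry_run=False):
    """Run (name, command) pairs in order and stop at the first failure.

    Returns a list of (name, return_code) for the steps that were attempted.
    """
    results = []
    for name, command in steps:
        # Split like a POSIX shell would, then expand $PWD, $USER, etc.
        argv = [os.path.expandvars(tok) for tok in shlex.split(command)]
        if dry_run:
            # Only show what would be executed; treat the step as passing.
            print(f"[dry-run] {name}: {' '.join(argv)}")
            results.append((name, 0))
            continue
        print(f"[run] {name}")
        try:
            code = subprocess.run(argv).returncode
        except OSError:
            # The command could not be launched at all (e.g. missing binary).
            code = 127
        results.append((name, code))
        if code != 0:
            # Later steps depend on earlier output, so do not continue.
            break
    return results
```

A CI job could call `run_steps(SMOKE_STEPS)` from the directory containing the scripts and mark the build failed if the last recorded return code is non-zero or if fewer results than steps come back. Calling it with `dry_run=True` prints the expanded commands without executing anything, which can help when reviewing a pipeline change.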

NVIDIA / spark-rapids-benchmarks

Need some CI pipelines to validate the scripts to avoid any mistake #27