NVIDIA/spark-rapids-benchmarks
Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark
Apache License 2.0
Need some CI pipelines to validate the scripts to avoid any mistake
#27 (Open), opened by GaryShen2008 2 years ago
wjxiz1992 commented 2 years ago
We can add a pre-merge CI job for this repo.
wjxiz1992 commented 2 years ago
| Step | Command |
| -- | -- |
| Generate base data local | `python3 nds_gen_data.py local 1 2 $PWD/raw_sf1 --overwrite_output` |
| Generate base data hdfs | `python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_sf1 --overwrite_output` |
| Generate refresh data local | `python3 nds_gen_data.py local 1 2 /user/$USER/raw_refresh_sf1 --overwrite_output --update` |
| Generate refresh data hdfs | `python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_refresh_sf1 --overwrite_output --update` |
| Convert refresh data to parquet hdfs | `./spark-submit-template convert_submits_gpu.template nds_transcode.py hdfs:/nds2.0_ci/raw_refresh_sf1 hdfs:/nds2.0_ci/parquet_refresh_sf1 report.txt --output_format parquet --output_mode overwrite --update` |
| Convert base data to iceberg hdfs | `./spark-submit-template convert_submits_gpu.template nds_transcode.py hdfs:/nds2.0_ci/raw_sf1 hdfs:/nds2.0_ci/iceberg_sf1 report.txt --output_format iceberg --output_mode overwrite` |
| Generate query stream | `python nds_gen_query_stream.py $TPCDS_HOME/query_templates 3000 ./query_streams --streams 1` |
| Power run | `./spark-submit-template power_run_gpu.template nds_power.py hdfs:/nds2.0_ci/iceberg_sf1 ./nds_query_streams/query_0.sql time.csv --property_file properties/aqe-on.properties --input_format iceberg --output_prefix hdfs:/nds2.0_ci/gpu_output_sf1` |
| Data validation | `python nds_validate.py hdfs:/nds2.0_ci/gpu_output_sf1 hdfs:/nds2.0_ci/cpu_output_sf1 ./nds_query_streams/query_0.sql --ignore_ordering` |
| Data maintenance | `./spark-submit-template convert_submit_gpu_iceberg.template nds_maintenance.py hdfs:/nds2.0_ci/parquet_refresh_sf1 ./data_maintenance time.csv` |
| Throughput run | `./nds-throughput 1,2 ./spark-submit-template power_run_gpu.template nds_power.py hdfs:/nds2.0_ci/iceberg_sf1 ./nds_query_streams/query_'{}'.sql Time_'{}'.csv --input_format iceberg --output_prefix hdfs:/nds2.0_ci/gpu_output_sf1` |
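One way a pre-merge job could drive these commands is a small fail-fast runner. The sketch below is only an illustration, not the repo's actual CI: `CI_STEPS` and `run_steps` are hypothetical names, only the first few steps are listed, and it defaults to a dry run that prints each command instead of executing it.

```python
import os
import shlex
import subprocess
import sys
from typing import List, Optional, Tuple

# Hypothetical step list: (step name, command) pairs copied from the commands
# in this comment. Only a subset is shown; the rest would follow the same shape.
CI_STEPS: List[Tuple[str, str]] = [
    ("Generate base data local",
     "python3 nds_gen_data.py local 1 2 $PWD/raw_sf1 --overwrite_output"),
    ("Generate base data hdfs",
     "python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_sf1 --overwrite_output"),
    ("Generate query stream",
     "python nds_gen_query_stream.py $TPCDS_HOME/query_templates 3000 "
     "./query_streams --streams 1"),
]


def run_steps(steps: List[Tuple[str, str]], dry_run: bool = True) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, command in steps:
        # Expand $VAR references ourselves because no shell is involved;
        # variables that are not set are left untouched.
        argv = shlex.split(os.path.expandvars(command))
        print(f"[{name}] {' '.join(argv)}")
        if dry_run:
            continue
        try:
            result = subprocess.run(argv)
        except OSError:
            # The executable is missing or cannot be started.
            return name
        if result.returncode != 0:
            return name
    return None
```

A CI job could call `run_steps(CI_STEPS, dry_run=False)` and exit non-zero when the return value is not `None`, so the first broken script stops the pipeline and is named in the log.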