NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] spark_rapids user tools generates the wrong worker info to pass into java qual tool on databricks aws #1087

Closed tgravescs closed 4 weeks ago

tgravescs commented 4 weeks ago

Is your feature request related to a problem? Please describe.

spark_rapids qualification \
  --platform databricks-aws \
  --cluster 0606-193048-sgjfvx5 \
  --output_folder . \
  --verbose \
  --estimation_model xgboost \
  --tools_jar /home/tgraves/workspace/spark-rapids-tools2/core/target/rapids-4-spark-tools_2.12-24.04.1-SNAPSHOT.jar \
  --eventlogs /home/tgraves/runspace/qualEventLogs/dataproc-21/application_1716441747790_0001
2024-06-07 08:42:07,570 DEBUG rapids.tools.cmd: submitting system command: <aws ec2 describe-instance-types --region us-west-2 --instance-types r6id.xlarge>

the r6id.xlarge node has 32GB of memory, but the tool generates a worker_info file that is passed into java as:

system:
  numCores: 4
  memory: 16384MiB
  numWorkers: 4

tgravescs commented 4 weeks ago

nevermind, the issue is that it maps this to a g5.xlarge node, which has 16GB.
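
For anyone who hits the same confusion, here is a minimal sketch of the sanity check that explains the 16384MiB value. It is not part of the tools: the memory table is hard-coded from the figures quoted in this thread, and the helper names `parse_mib` and `matching_instances` are hypothetical.

```python
import re

# Memory per instance type in MiB, hard-coded from the figures quoted in this
# thread (r6id.xlarge: 32GB, g5.xlarge: 16GB). This is an assumption for
# illustration, not data read from the tools or from AWS.
INSTANCE_MEMORY_MIB = {
    "r6id.xlarge": 32 * 1024,
    "g5.xlarge": 16 * 1024,
}


def parse_mib(value):
    """Parse a string such as '16384MiB' into an integer number of MiB."""
    match = re.fullmatch(r"\s*(\d+)\s*MiB\s*", value)
    if match is None:
        raise ValueError(f"unexpected memory value: {value!r}")
    return int(match.group(1))


def matching_instances(worker_memory):
    """Return the instance types whose memory equals the worker_info value."""
    mib = parse_mib(worker_memory)
    return [name for name, mem in INSTANCE_MEMORY_MIB.items() if mem == mib]


# The worker_info file above reports 16384MiB, which matches the g5.xlarge
# node rather than the original r6id.xlarge node.
print(matching_instances("16384MiB"))
```

With the value from the worker_info file this prints `['g5.xlarge']`, which is consistent with the memory describing the mapped GPU node and not the original CPU node.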