NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[BUG] Qualification tool cluster recommendation on Dataproc defaults the recommended numGpus to 2 #1159

Closed tgravescs closed 5 days ago

tgravescs commented 6 days ago

Describe the bug: Running the qualification tool against Dataproc with the --cluster argument, for a cluster that ran on n1-standard-2 CPU nodes, gives a bad recommended number of GPUs. It looks like our Python code defaults this to 2 if no GPU cluster is inferred.

It doesn't make sense to use 2 GPUs per worker on an n1-standard-2. We should likely use the same logic we have elsewhere, which says that if it's < 16 cores use 1 GPU, else 2 GPUs.

    "targetCluster": {
      "driverInstance": "n1-standard-2",
      "executorInstance": "n1-standard-2",
      "numExecutors": 2,
      "gpuInfo": {
        "device": "nvidia-tesla-t4",
        "gpuPerWorker": 2
      },
      "additionalConfig": {
        "localSsd": 2
      }
    }
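
The core-count rule described above (fewer than 16 cores gets 1 GPU, otherwise 2) could be sketched roughly as follows. This is only an illustration, not the actual spark-rapids-tools code: the function name `recommend_gpus_per_worker` and the `core_threshold` parameter are hypothetical, and the only values taken from this issue are the 16-core cutoff and the 1 or 2 GPU outcomes.

```python
# Hypothetical helper: the names here are illustrative and are not taken
# from the spark-rapids-tools code base.
def recommend_gpus_per_worker(num_cores: int, core_threshold: int = 16) -> int:
    """Return 1 GPU for workers with fewer than `core_threshold` cores, else 2."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return 1 if num_cores < core_threshold else 2


# An n1-standard-2 worker has 2 vCPUs, so this sketch recommends one GPU.
print(recommend_gpus_per_worker(2))   # prints 1
print(recommend_gpus_per_worker(16))  # prints 2
```

With a rule like this, the `gpuPerWorker` value in the output above would be 1 for an n1-standard-2 worker instead of the default of 2.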

tgravescs commented 5 days ago

Closing this as it's an issue with my dev code.