canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

A100 slower than local Docker. #598

Open HaloKim opened 1 year ago

HaloKim commented 1 year ago

Hello, I am running Charmed Kubeflow on-premise.

I have a question.

There is a big difference in GPU speed between Docker and the Kubeflow Jupyter notebook, and I don't know the cause.

I run this code,

import torch
import time
torch.backends.cuda.matmul.allow_tf32 = True
x = torch.rand(1000, 1000, device="cuda")

# warmup
for _ in range(10):
    y = torch.matmul(x, x)

nb_iters = 1000
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(nb_iters):
    y = torch.matmul(x, x)
torch.cuda.synchronize()
t1 = time.perf_counter()
print("GPU: {}iters/s, {}s/iter".format(nb_iters/(t1 - t0), (t1 - t0)/nb_iters))

x = torch.randn(1000, 1000)
# warmup
for _ in range(10):
    y = torch.matmul(x, x)

t0 = time.perf_counter()
for _ in range(nb_iters):
    y = torch.matmul(x, x)
t1 = time.perf_counter()
print("CPU: {}iters/s, {}s/iter".format(nb_iters/(t1 - t0), (t1 - t0)/nb_iters))
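For kernels this small, a possible refinement (my suggestion, not part of the original report) is to time with `torch.cuda.Event` instead of `time.perf_counter`, since event timestamps are recorded on the GPU and are less skewed by Python-loop overhead. A sketch, assuming the same 1000x1000 matmul workload:

```python
import torch

def bench_gpu_events(n=1000, iters=1000):
    """Time n x n matmuls with CUDA events; returns seconds per iteration."""
    x = torch.rand(n, n, device="cuda")
    for _ in range(10):  # warmup so kernel launch/caching costs are excluded
        x @ x
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        x @ x
    end.record()
    torch.cuda.synchronize()          # wait until the end event is recorded
    return start.elapsed_time(end) / 1000.0 / iters  # elapsed_time is in ms

if torch.cuda.is_available():
    print("GPU: {:.3e} s/iter".format(bench_gpu_events()))
```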

Kubeflow jupyter output

GPU: 30086.7809964644iters/s, 3.323718812316656e-05s/iter
CPU: 875.2453138719702iters/s, 0.0011425368227064609s/iter

Local Docker output

GPU: 8084.810933666074iters/s, 0.00012368873041123152s/iter
CPU: 664.2903438954845iters/s, 0.0015053658527322113s/iter

Server env

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy

Driver Version: 515.86.01
CUDA Version: 11.7 (driver) / cuda_11.8.r11.8 (toolkit)
cuDNN: 8.4.1

Client Version: v1.24.13-2+cd9733de84ad4b
Kustomize Version: v4.5.4
Server Version: v1.24.13-2+cd9733de84ad4b

charmed kubeflow 1.7

kimwnasptd commented 1 year ago

@HaloKim this is interesting!

I have a question regarding the testing methodology: did you run both the docker notebook and the KF Notebook in the same node on your onpremise cluster?

Want to rule out that it could be caused by the node itself and understand if it's an issue related specifically to Charmed Kubeflow.
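To help with that comparison, a quick way to confirm both environments see the same GPU and software stack is to print the versions PyTorch itself reports (a sketch; run it in both the Docker container and the KF notebook and diff the output):

```python
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Device name and library versions as seen from inside this environment
    print("device:", torch.cuda.get_device_name(0))
    print("cuda runtime:", torch.version.cuda)
    print("cudnn:", torch.backends.cudnn.version())
    # TF32 setting strongly affects FP32 matmul speed on A100
    print("tf32 matmul:", torch.backends.cuda.matmul.allow_tf32)
```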

HaloKim commented 1 year ago

> @HaloKim this is interesting!
>
> I have a question regarding the testing methodology: did you run both the docker notebook and the KF Notebook in the same node on your onpremise cluster?
>
> Want to rule out that it could be caused by the node itself and understand if it's an issue related specifically to Charmed Kubeflow.

Sorry, I was mistaken: higher is better, and when I checked again I found I had set `torch.backends.cuda.matmul.allow_tf32 = False` in Docker. That explains most of the gap, but one thing still puzzles me: in the KF Jupyter notebook, toggling `torch.backends.cuda.matmul.allow_tf32` between True and False makes almost no difference in speed.
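TF32 can make FP32 matmuls several times faster on Ampere GPUs like the A100 at slightly reduced precision, so runs with the flag set differently are not comparable. One way to see the effect is an A/B check in a single process; a sketch (the 4096x4096 size is my choice, since at 1000x1000 the kernel is so short that launch overhead can hide the difference):

```python
import time
import torch

def time_matmul(x, iters=100):
    """Wall-clock seconds per matmul, with proper GPU synchronization."""
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        x @ x
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

if torch.cuda.is_available():
    x = torch.rand(4096, 4096, device="cuda")
    for flag in (False, True):
        torch.backends.cuda.matmul.allow_tf32 = flag
        time_matmul(x, iters=10)  # warmup after switching the flag
        print("allow_tf32={}: {:.3e} s/iter".format(flag, time_matmul(x)))
```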

(screenshot attached)

Thank you for your reply.