canonical / kubeflow-examples

Charmed Kubeflow examples
Apache License 2.0

OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked! #36

Closed ACodingfreak closed 7 months ago

ACodingfreak commented 1 year ago

Hi All,

I am trying out the "digit-recognition-kaggle-competition" example in Charmed Kubeflow, using an RTX 3060 on Ubuntu 22.04: https://github.com/kubeflow/examples/tree/master/digit-recognition-kaggle-competition

It fails with the error below. Any idea what I am doing wrong here?

history = model.fit(np.array(X_train), np.array(y_train), 
                    validation_split=.1, batch_size=int(BATCH_SIZE), epochs=int(EPOCHS))

2023-06-14 21:19:07.654236: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2023-06-14 21:19:07.654500: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2112000000 Hz
Epoch 1/2
2023-06-14 21:19:07.897998: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-06-14 21:19:08.249471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2023-06-14 21:19:08.251540: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-06-14 21:19:08.955292: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!

The same model works fine when run on the CPU instead of the GPU.
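
When reporting a failure like this, it can help to state which CUDA libraries TensorFlow actually loaded and which kernel check failed. Below is a minimal sketch that pulls both out of log text such as the excerpt above. The function name `summarize_tf_log` and the regular expressions are illustrative assumptions, not part of TensorFlow or the example notebook.

```python
import re

# Log lines copied from the report above (timestamps shortened for brevity).
LOG = """\
21:19:07.897998: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
21:19:08.249471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
21:19:08.251540: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
21:19:08.955292: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!
"""


def summarize_tf_log(text):
    """Return (loaded_libraries, failures) found in TensorFlow log text.

    Hypothetical helper: the patterns match only the two message shapes
    seen in this report.
    """
    libs = re.findall(r"Successfully opened dynamic library (\S+)", text)
    failures = re.findall(r"OP_REQUIRES failed at (\S+) : (.+)", text)
    return libs, failures


libs, failures = summarize_tf_log(LOG)
print("Loaded libraries:", libs)
print("Failed checks:", failures)
```

The library suffixes (`.so.11`, `.so.8`) show the major versions of the CUDA libraries that were loaded, which is the first thing to compare against what the GPU and the installed TensorFlow build support.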

DnPlas commented 7 months ago

Hi @ACodingfreak, could your issue be the same as https://github.com/tensorflow/tensorflow/issues/45044? It sounds like it. Since this looks more like an issue related to the ML framework and your setup than to the charms, I am closing this issue, but feel free to file a new one if you have more questions related to Charmed Kubeflow.
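
For readers who land here with the same message: one workaround commonly tried for "No algorithm worked!" on GPU is to enable memory growth before building the model, so that TensorFlow allocates GPU memory on demand instead of reserving it all up front. This is an assumption about the cause, not something confirmed in this thread or checked against the linked TensorFlow issue. The sketch below uses the helper name `configure_gpu_memory_growth`, which is hypothetical, and skips configuration when TensorFlow is not installed.

```python
def configure_gpu_memory_growth():
    """Enable on-demand GPU memory allocation for every visible GPU.

    Returns the number of GPUs configured (0 if TensorFlow is not
    installed or no GPU is visible). Must run before any GPU is
    initialized, i.e. before the model is built or any tensor is created.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: treat a missing TensorFlow as "nothing to do".
        return 0

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


n_configured = configure_gpu_memory_growth()
print("GPUs configured for memory growth:", n_configured)
```

Call this at the top of the notebook, before `model.fit(...)` and before the model is constructed. If the error persists, the mismatch is more likely between the GPU and the CUDA/cuDNN versions of the TensorFlow build in the notebook image.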