I encountered a strange error: my Python program behaves differently when run stand-alone vs. when called from a D program via pyd. I noticed it could be caused by the dynamic libraries (.so) being loaded differently:
Python stand-alone run log:
"""
2021-05-24 18:48:06.939476: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 18:48:06.958735: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
2021-05-24 18:48:07.603718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-24 18:48:10.760915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 18:48:10.763691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
"""
Please note: libcublas.so.11 is loaded first, and the run succeeds.
When the same Python script is called via pyd, the run log is:
"""
2021-05-24 19:03:30.943183: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 19:03:31.098807: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
[New Thread 0x7ffe397fa700 (LWP 23546)]
[Thread 0x7ffe397fa700 (LWP 23546) exited]
[New Thread 0x7ffe397fa700 (LWP 23547)]
[New Thread 0x7ffe38ff9700 (LWP 23548)]
[New Thread 0x7ffe39ffb700 (LWP 23549)]
[New Thread 0x7ffcf3fff700 (LWP 23550)]
2021-05-24 19:03:35.463795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 19:03:38.561243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-24 19:03:57.926635: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
Traceback (most recent call last):
"""
Please note: libcublas.so.11 is NOT loaded, and the first library loaded becomes libcublasLt.so.11; then the run fails.
I tried very hard to make sure that, at the shell command level, I'm setting the same env vars in both scenarios.
But why does the Python program called via pyd from D skip loading some dynamic libraries (i.e. libcublas.so.11 in this case)?
Is there something (an env var) I need to set up in the D program when calling pyd?
(Another thing that looks suspicious: there is some thread activity going on before those libraries are loaded. This only happens in the pyd run; not sure if it's related.)
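
For reference, this is roughly how the env vars and the loaded libraries could be compared from inside the Python process in both runs, rather than only at the shell level. It is a minimal sketch, assuming Linux (it reads /proc/self/maps). The list of watched variables and all helper names are my own illustration; they are not part of pyd or TensorFlow.

```python
import os

# Variables that commonly influence shared-library lookup on Linux.
# This list is an assumption chosen for illustration; extend it as needed.
WATCHED_VARS = ("LD_LIBRARY_PATH", "LD_PRELOAD", "CUDA_HOME",
                "CUDA_VISIBLE_DEVICES", "PATH")


def snapshot_env(environ=None):
    """Return the watched variables as seen by this process (None if unset)."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in WATCHED_VARS}


def parse_loaded_libraries(maps_text, needle=""):
    """Extract unique shared-library paths from /proc/<pid>/maps text.

    Only mapped files whose base name contains '.so' and whose full path
    contains `needle` are returned, sorted alphabetically.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if ".so" in os.path.basename(path) and needle in path:
                paths.add(path)
    return sorted(paths)


def loaded_libraries(needle=""):
    """List shared libraries currently mapped into this process (Linux only)."""
    with open("/proc/self/maps") as f:
        return parse_loaded_libraries(f.read(), needle)


def diff_snapshots(a, b):
    """Return {name: (value_in_a, value_in_b)} for every entry that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Printing `snapshot_env()` and `loaded_libraries("cublas")` right before the failing TensorFlow call, once in the stand-alone run and once in the pyd run, should show whether the two processes really see the same environment and which cuBLAS files each one has mapped.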