ariovistus / pyd

Interoperability between Python and D
MIT License

Q: when calling a Python script from D, how to properly set up the env (for loading dynamic libraries .so)? #156

Open mw66 opened 3 years ago

mw66 commented 3 years ago

Hi,

I encountered a strange error: my Python program behaves differently when run stand-alone vs. when called from a D program via pyd. I noticed it could be caused by the dynamic libraries (.so) being loaded differently:

Python stand-alone, run log:
"""
2021-05-24 18:48:06.939476: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 18:48:06.958735: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
2021-05-24 18:48:07.603718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-24 18:48:10.760915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 18:48:10.763691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
"""
Please note: libcublas.so.11 is loaded first, and the run succeeds.

When the same Python script is called via pyd, the run log is:
"""
2021-05-24 19:03:30.943183: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 19:03:31.098807: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
[New Thread 0x7ffe397fa700 (LWP 23546)]
[Thread 0x7ffe397fa700 (LWP 23546) exited]
[New Thread 0x7ffe397fa700 (LWP 23547)]
[New Thread 0x7ffe38ff9700 (LWP 23548)]
[New Thread 0x7ffe39ffb700 (LWP 23549)]
[New Thread 0x7ffcf3fff700 (LWP 23550)]
2021-05-24 19:03:35.463795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 19:03:38.561243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-24 19:03:57.926635: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
Traceback (most recent call last):
"""
Please note: libcublas.so.11 is NOT loaded, so the first library loaded becomes libcublasLt.so.11, and then the run fails.

I tried very hard to make sure that, at the shell command level, I'm setting the same env vars in both scenarios.
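One way to check this from inside the interpreter rather than from the shell is a small diagnostic placed at the top of the Python script, so both runs print what the process actually sees. The following is only a sketch under stated assumptions: the helper names (`env_snapshot`, `loaded_libraries`, `report`) are hypothetical, the list of variables is a guess at what might matter and is not exhaustive, and reading `/proc/self/maps` works on Linux only.

```python
import os
import sys

# Assumed, non-exhaustive list of variables that can influence library lookup.
ENV_KEYS = ("LD_LIBRARY_PATH", "LD_PRELOAD", "PATH", "PYTHONPATH",
            "CUDA_HOME", "CUDA_VISIBLE_DEVICES")


def env_snapshot(keys=ENV_KEYS, environ=None):
    """Return the selected variables as seen by this process."""
    environ = os.environ if environ is None else environ
    return {k: environ.get(k, "<unset>") for k in keys}


def loaded_libraries(needle="", maps_path="/proc/self/maps"):
    """Return sorted paths of mapped files whose basename contains `needle`."""
    libs = set()
    try:
        with open(maps_path) as f:
            for line in f:
                # Fields: address perms offset dev inode [pathname]
                parts = line.split(None, 5)
                if len(parts) == 6:
                    path = parts[5].strip()
                    if path.startswith("/") and needle in os.path.basename(path):
                        libs.add(path)
    except OSError:
        pass  # not on Linux, or /proc is unavailable
    return sorted(libs)


def report(stream=sys.stderr):
    """Print the environment snapshot and the CUDA-related mapped libraries."""
    for key, value in env_snapshot().items():
        print(f"{key}={value}", file=stream)
    for name in ("cublas", "cudnn", "cudart"):
        for path in loaded_libraries(name):
            print(f"mapped: {path}", file=stream)
```

Calling `report()` once before the TensorFlow import and once after the failing call, in both the stand-alone run and the pyd run, gives two outputs that can be diffed. If `libcublas.so.11` shows up as already mapped in the pyd run before TensorFlow tries to open it, or the paths differ, that would narrow things down.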

But why does the Python program, when called by pyd from D, skip loading some dynamic libraries (i.e. libcublas.so.11 in this case)?

Is there something (an env var) I need to set up in the D program when calling pyd?

(Another thing that looks suspicious: there is some thread activity before those libraries are loaded. This only happens in the pyd run; I'm not sure if it's related.)