ZSL98 opened this issue 4 months ago
Ah, it seems that `-lcudnn` is required at the compile stage. However, there are still other errors.
When I run:

```
LD_PRELOAD="/root/orion/src/cuda_capture/libinttemp.so" python3.10 benchmarking/launch_jobs.py --algo orion --config_file /root/orion/artifact_evaluation/example/config.json
```
I get the error:

```
python3.10: intercept_cudnn.cpp:177: cudnnStatus_t cudnnBatchNormalizationForwardInference(cudnnHandle_t, cudnnBatchNormMode_t, const void*, const void*, cudnnTensorDescriptor_t, const void*, cudnnTensorDescriptor_t, void*, cudnnTensorDescriptor_t, const void*, const void*, const void*, const void*, double): Assertion 'cudnn_bnorm_infer_func != NULL' failed.
Aborted (core dumped)
```
It seems that the API capture code only supports CUDA 10.2. Could you please share more on how to apply API capturing with newer CUDA versions? Or could there be another cause?
Hi, yes, the current open-source version supports only CUDA 10.2. Our next version will enable more up-to-date CUDA libraries. See also my comment in #31.
Can you notify me when this issue is resolved?
I face the error:

```
OSError: /root/orion/src/scheduler/scheduler_eval.so: undefined symbol: cudnnSetStream
```

when using CUDA 11.8. How can I deal with that?