NixOS / nixpkgs


onnxruntime: fails to run with CUDA execution provider #310860

Open anpin opened 5 months ago

anpin commented 5 months ago

Describe the bug

onnxruntime with CUDA support is built with both `onnxruntime_USE_CUDA` and `onnxruntime_DISABLE_CONTRIB_OPS` enabled, which effectively disables CUDA and leads to the following error at runtime:

> onnxruntime_test Phi-3-mini-128k-instruct-onnx/cuda/cuda-int4-rtn-block-32/phi3-mini-128k-instruct-cuda-int4-rtn-block-32.onnx
2024-05-11 10:27:29.056205252 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:2103 CreateInferencePybindStateModule] Init provider bridge failed.
Traceback (most recent call last):
  File "/nix/store/3wc2a16gdvms53vgr2jp9f8z2mv55dkw-python3.11-onnxruntime-1.17.3/bin/.onnxruntime_test-wrapped", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/tools/onnxruntime_test.py", line 159, in main
    exit_code, _, _ = run_model(args.model_path, args.num_iters, args.debug, args.profile, args.symbolic_dims)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/tools/onnxruntime_test.py", line 88, in run_model
    sess = onnxrt.InferenceSession(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from Phi-3-mini-128k-instruct-onnx/cuda/cuda-int4-rtn-block-32/phi3-mini-128k-instruct-cuda-int4-rtn-block-32.onnx failed:This is an invalid model. In Node, ("/model/layers.0/input_layernorm/LayerNorm", SimplifiedLayerNormalization, "", -1) : ("/model/embed_tokens/Gather/output_0": tensor(float16),"model.layers.0.input_layernorm.weight": tensor(float16),) -> ("/model/layers.0/input_layernorm/output_0": tensor(float16),) , Error No Op registered for SimplifiedLayerNormalization with domain_version of 14

> onnxruntime_test Phi-3-mini-128k-instruct-onnx/cuda/cuda-fp16/phi3-mini-128k-instruct-cuda-fp16.onnx
2024-05-11 10:27:46.458111088 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:2103 CreateInferencePybindStateModule] Init provider bridge failed.
Traceback (most recent call last):
  File "/nix/store/3wc2a16gdvms53vgr2jp9f8z2mv55dkw-python3.11-onnxruntime-1.17.3/bin/.onnxruntime_test-wrapped", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/tools/onnxruntime_test.py", line 159, in main
    exit_code, _, _ = run_model(args.model_path, args.num_iters, args.debug, args.profile, args.symbolic_dims)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/tools/onnxruntime_test.py", line 88, in run_model
    sess = onnxrt.InferenceSession(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/igdsm7xzsfsbyjfhrvgw23xsxj21fgln-python3-3.11.9-env/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from Phi-3-mini-128k-instruct-onnx/cuda/cuda-fp16/phi3-mini-128k-instruct-cuda-fp16.onnx failed:This is an invalid model. In Node, ("/model/layers.0/input_layernorm/LayerNorm", SimplifiedLayerNormalization, "", -1) : ("/model/embed_tokens/Gather/output_0": tensor(float16),"model.layers.0.input_layernorm.weight": tensor(float16),) -> ("/model/layers.0/input_layernorm/output_0": tensor(float16),) , Error No Op registered for SimplifiedLayerNormalization with domain_version of 14

Related issue mainstream https://github.com/microsoft/onnxruntime/issues/20658

Steps To Reproduce

Steps to reproduce the behavior:

  1. Enter a shell with onnxruntime and CUDA enabled (sorry, I'm not sure how to do this as a one-liner `nix-shell` command). In your config or overlay:
    config = { allowUnfree = true; cudaSupport = true; };

    shell.nix

    { pkgs ? import <nixpkgs> { config = { allowUnfree = true; cudaSupport = true; }; } }:
    pkgs.mkShell {
      packages = [
        pkgs.onnxruntime
        (pkgs.python3.withPackages (python-pkgs: [
          python-pkgs.huggingface-hub
          python-pkgs.numpy
          python-pkgs.onnxruntime
          # genai is not packaged yet, but available as a PR
          # python-pkgs.onnxruntime-genai
        ]))
      ];
    }
  2. Get the model
    huggingface-cli download microsoft/Phi-3-mini-128k-instruct-onnx --include "cuda/cuda-int4-rtn-block-32/*" --local-dir Phi-3-mini-128k-instruct-onnx
  3. Run test
    onnxruntime_test Phi-3-mini-128k-instruct-onnx/cuda/cuda-int4-rtn-block-32/phi3-mini-128k-instruct-cuda-int4-rtn-block-32.onnx
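
Before loading a full model, a quicker sanity check is to ask the runtime which execution providers were compiled in (a sketch; the import guard is only there so the snippet degrades gracefully outside the shell):

```python
import importlib.util

providers = []  # stays empty if onnxruntime is not importable
if importlib.util.find_spec("onnxruntime") is not None:
    import onnxruntime as ort
    # A working CUDA build should list "CUDAExecutionProvider" here.
    providers = ort.get_available_providers()
print(providers)
```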

Expected behavior

onnxruntime starts executing on GPU

Additional context

Removing `onnxruntime_DISABLE_CONTRIB_OPS` allows the model to load:

diff --git a/pkgs/development/libraries/onnxruntime/default.nix b/pkgs/development/libraries/onnxruntime/default.nix
index 85e2c70ba408..2e09b541f1a7 100644
--- a/pkgs/development/libraries/onnxruntime/default.nix
+++ b/pkgs/development/libraries/onnxruntime/default.nix
@@ -187,7 +187,7 @@ effectiveStdenv.mkDerivation rec {
     "-D_SILENCE_ALL_CXX23_DEPRECATION_WARNINGS=1"
     (lib.cmakeBool "onnxruntime_USE_CUDA" cudaSupport)
     (lib.cmakeBool "onnxruntime_USE_NCCL" cudaSupport)
-    (lib.cmakeBool "onnxruntime_DISABLE_CONTRIB_OPS" cudaSupport)
+    # (lib.cmakeBool "onnxruntime_DISABLE_CONTRIB_OPS" cudaSupport)
   ] ++ lib.optionals pythonSupport [
     "-Donnxruntime_ENABLE_PYTHON=ON"
   ] ++ lib.optionals cudaSupport [
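
Until something like this lands in nixpkgs, the same change can be applied locally with an overlay that appends an overriding flag (untested sketch; it relies on CMake letting the last `-D` for a cache variable win):

```nix
final: prev: {
  onnxruntime = prev.onnxruntime.overrideAttrs (old: {
    # Appended flags come after the cmakeBool in the derivation, so this
    # should re-enable contrib ops without patching nixpkgs itself.
    cmakeFlags = (old.cmakeFlags or [ ]) ++ [ "-Donnxruntime_DISABLE_CONTRIB_OPS=OFF" ];
  });
}
```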

Notify maintainers

@jonringer @puffnfresh @ck3d @cbourjau @wexder

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 6.8.9, NixOS, 24.05 (Uakari), 24.05.20240508.8892ecd`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(a): `"nixpkgs"`
 - channels(root): `"nixos"`
 - nixpkgs: `/home/a/.nix-defexpr/channels/nixpkgs`

Add a :+1: reaction to issues you find important.

jonringer commented 4 months ago

Error No Op registered for SimplifiedLayerNormalization with domain_version of 14

This looks like we may just need to update onnxruntime, if the model you're using was generated with newer nodes.
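
To check whether the model really targets a newer opset than the packaged runtime supports, one could inspect its opset imports with the `onnx` Python package (a sketch, assuming `onnx` is available and the model from step 2 has been downloaded; the guards only keep the snippet from failing where it is not):

```python
import importlib.util
import os

MODEL = "Phi-3-mini-128k-instruct-onnx/cuda/cuda-int4-rtn-block-32/phi3-mini-128k-instruct-cuda-int4-rtn-block-32.onnx"

opsets = {}  # domain -> version; stays empty if onnx or the model is missing
if importlib.util.find_spec("onnx") is not None and os.path.exists(MODEL):
    import onnx
    model = onnx.load(MODEL)
    # opset_import lists (domain, version) pairs; an empty domain means
    # the default "ai.onnx" operator set.
    opsets = {op.domain or "ai.onnx": op.version for op in model.opset_import}
print(opsets)
```

Comparing the reported versions against what the packaged onnxruntime supports would confirm whether an update alone is enough, or whether the contrib-ops build flag is the real culprit.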