JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

Mixed eltype contraction failing with CuTensor #2349

Open kmp5VT opened 2 months ago

kmp5VT commented 2 months ago

Describe the bug

Some mixed element-type contractions fail when contracting with cuTENSOR.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA, cuTENSOR

a = CuArray{Float32}(undef, (10,10))
b = CuArray{ComplexF32}(undef, (10, 5))
c = a * b ## works fine regardless of mixed type

CT = CuTensor(a, [1,2]) * CuTensor(b, [2,3])
ERROR: KeyError: key (Float32, ComplexF32, ComplexF32) not found

CT = CuTensor(b, [2,3]) * CuTensor(a, [1,2])
ERROR: KeyError: key (ComplexF32, Float32, ComplexF32) not found 

a = CuArray{Float64}(undef, (10,10))
b = CuArray{ComplexF32}(undef, (10, 5))

CT = CuTensor(a, [1,2]) * CuTensor(b, [2,3])
ERROR: KeyError: key (Float64, ComplexF32, ComplexF64) not found

b = CuArray{ComplexF64}(undef, (10, 5))
CT = CuTensor(a, [1,2]) * CuTensor(b, [2,3]) ## works with no issue

All of these contractions work without issue when the CuArrays are not wrapped in CuTensors.
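For comparison, a minimal check of the unwrapped case (illustrative sketch only; the result eltype should be the promoted complex type, since the plain `*` path handles the promotion itself):

```julia
using CUDA

a = CuArray{Float32}(undef, (10, 10))
b = CuArray{ComplexF32}(undef, (10, 5))

c = a * b    # no CuTensor wrapping; works fine
eltype(c)    # expected: ComplexF32, the promoted element type
```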

Manifest.toml

```
[052768ef] CUDA v5.3.1
[46192b85] GPUArraysCore v0.1.6
[011b41b2] cuTENSOR v2.1.0
```

Version info

Details on Julia:

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.4, artifact installation
CUDA driver 12.4
NVIDIA driver 535.154.5, originally for CUDA 12.2

CUDA libraries: 
- CUBLAS: 12.4.5
- CURAND: 10.3.5
- CUFFT: 11.2.1
- CUSOLVER: 11.6.1
- CUSPARSE: 12.3.1
- CUPTI: 22.0.0
- NVML: 12.0.0+535.154.5

Julia packages: 
- CUDA: 5.3.0
- CUDA_Driver_jll: 0.8.1+0
- CUDA_Runtime_jll: 0.12.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

1 device:
  0: NVIDIA RTX A6000 (sm_86, 45.532 GiB / 47.988 GiB available)
tgymnich commented 2 months ago

The contractions in your example won't work out of the box with cuTENSOR; see the table of supported type combinations: https://docs.nvidia.com/cuda/cutensor/latest/api/cutensor.html#contraction-operations

Should we automatically convert Float32 to ComplexF32?
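Until something like that exists, a possible caller-side workaround (a sketch, not part of cuTENSOR.jl's API) is to promote both operands to a common element type from the supported table before wrapping them in CuTensor:

```julia
using CUDA, cuTENSOR

a = CuArray{Float32}(undef, (10, 10))
b = CuArray{ComplexF32}(undef, (10, 5))

# Promote both operands to a common element type that cuTENSOR
# supports for contractions (ComplexF32 here), then wrap and contract.
T = promote_type(eltype(a), eltype(b))

CT = CuTensor(T.(a), [1, 2]) * CuTensor(T.(b), [2, 3])
```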