OpenVisualCloud / Media-Transport-Library

A real-time media transport (DPDK, AF_XDP, RDMA) stack for both raw and compressed video based on COTS hardware.
BSD 3-Clause "New" or "Revised" License

Runtime error with Nvidia GPU #994

Open · chenzuozhou opened this issue 6 days ago

chenzuozhou commented 6 days ago

Hi! Thank you for submitting the bug. Please provide more details below:

Describe the bug: When running "GpuDirectVideoRxMultiSample", the Level Zero library reports these errors:

Runtime error: zeInit(ZE_INIT_FLAG_GPU_ONLY) returned 2013265921 at ../gpu.c:42
Initialization error: init_level_zero_lib() returned -1 at ../gpu.c:54

GPU info:

~/Media-Transport-Library-1008# nvidia-smi
Tue Oct  8 14:20:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:04:00.0 Off |                  Off |
| 30%   30C    P8             12W /  450W |      29MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:89:00.0 Off |                  Off |
| 30%   27C    P8             11W /  450W |      12MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
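Both GPUs listed above are NVIDIA cards. Intel's Level Zero compute-runtime only drives Intel GPUs, so one quick sanity check is to read each device's PCI vendor ID from sysfs (0x10de is NVIDIA, 0x8086 is Intel). The C sketch below assumes a Linux sysfs layout where `/sys/bus/pci/devices/<domain:bus:dev.fn>/vendor` holds a hex string such as `0x10de`. The helper names `pci_vendor_name` and `read_pci_vendor` are hypothetical and are not part of MTL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Well-known PCI vendor IDs relevant to this issue. */
#define PCI_VENDOR_NVIDIA 0x10de
#define PCI_VENDOR_INTEL 0x8086

/* Map a PCI vendor ID to a short name (only the two vendors of interest). */
static const char *pci_vendor_name(unsigned int vendor) {
  switch (vendor) {
    case PCI_VENDOR_NVIDIA: return "NVIDIA";
    case PCI_VENDOR_INTEL: return "Intel";
    default: return "other";
  }
}

/* Read /sys/bus/pci/devices/<bdf>/vendor (e.g. "0x10de\n") and return the
 * vendor ID, or -1 if the file is missing or its content cannot be parsed. */
static long read_pci_vendor(const char *bdf) {
  char path[256];
  char buf[16];
  assert(bdf != NULL);
  snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/vendor", bdf);
  FILE *f = fopen(path, "r");
  if (!f) return -1;
  if (!fgets(buf, sizeof(buf), f)) {
    fclose(f);
    return -1;
  }
  fclose(f);
  buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
  char *end = NULL;
  long v = strtol(buf, &end, 16);   /* base 16 accepts the "0x" prefix */
  if (end == buf || *end != '\0') return -1;
  return v;
}
```

Note that sysfs uses a 4-digit PCI domain, so the nvidia-smi Bus-Id `00000000:04:00.0` corresponds to `read_pci_vendor("0000:04:00.0")`, which on this machine should report 0x10de ("NVIDIA").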

Logs:

~/Media-Transport-Library-1008# ./build/app/GpuDirectVideoRxMultiSample 192.168.1176 192.168.1.170 20000 --p_port 0000:84:00.0
Runtime error: zeInit(ZE_INIT_FLAG_GPU_ONLY) returned 2013265921 at ../gpu.c:42
Initialization error: init_level_zero_lib() returned -1 at ../gpu.c:54
MTL: 2024-10-08 11:13:26, dev_eal_init(0), port_param: 0000:84:00.0
MTL: 2024-10-08 11:13:26, dev_eal_init, main_lcore: 0
MTL: 2024-10-08 11:13:26, dev_eal_init, wait eal_init_thread done
EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Selected IOVA mode 'VA'
EAL: No free 1048576 kB hugepages reported on node 0
EAL: No free 1048576 kB hugepages reported on node 1
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:84:00.0 (socket 1)
TELEMETRY: No legacy callbacks, legacy socket not created
MTL: 2024-10-08 11:13:27, mtl_init, MTL version: 24.12.0.DEV Thu Oct 3 18:30:33 2024 gcc-7.5.0, dpdk version: DPDK 23.11.0
MTL: 2024-10-08 11:13:27, mtl_init, MTL_HAS_USDT is defined for this build
MTL: 2024-10-08 11:13:27, mtl_init, bind to socket 1, numa_nodes 2
MTL: 2024-10-08 11:13:27, Warn: mt_instance_init, connect to manager fail, assume single instance mode
MTL: 2024-10-08 11:13:27, mtl_init(0), socket_id 1 port 0000:84:00.0
MTL: 2024-10-08 11:13:27, stat_thread, start
MTL: 2024-10-08 11:13:27, mt_stat_init, stat period 10s
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), use mt ptp source
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), user request queues tx 0 rx 1
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), deprecated sessions tx 0 rx 0
MTL: 2024-10-08 11:13:27, Warn: dev_config_port(0), failed to setup all ptype, only 0 supported
MTL: 2024-10-08 11:13:27, dev_config_port(0), tx_q(1 with 512 desc) rx_q (2 with 2048 desc)
MTL: 2024-10-08 11:13:27, mt_mempool_create_by_ops(1), succ at 0x3200cfdf00 size 4.310394m n 2047 d 2048 for T_P0_SYS_0
MTL: 2024-10-08 11:13:27, dev_if_init_tx_queues(0), tx_queues 1 malloc succ
MTL: 2024-10-08 11:13:27, mt_mempool_create_by_ops(1), succ at 0x3200e1da80 size 8.622894m n 4095 d 2048 for R_P0Q0_MBUF_1
MTL: 2024-10-08 11:13:27, mt_mempool_create_by_ops(1), succ at 0x3201efdf00 size 6.623383m n 4095 d 1536 for R_P0Q1_MBUF_2
MTL: 2024-10-08 11:13:27, dev_if_init_rx_queues(0), rx_queues 2 malloc succ
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), port_id 0 port_type 2 drv_type 9
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), dev_capa 0x14, offload 0xd96af:0x19621f queue offload 0x0:0x19601f, rss : 0xf00000000803afbc
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), system_rx_queues_end 1 hdr_split_rx_queues_end 1
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), sip: 192.168.85.80
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), netmask: 255.255.255.0
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), gateway: 0.0.0.0
MTL: 2024-10-08 11:13:27, mt_dev_if_init(0), mac: 0c:42:a1:72:c8:a2
MTL: 2024-10-08 11:13:27, dev_start_port(0), rx_defer 0
MTL: 2024-10-08 11:13:27, mt_eth_link_dump(0), link_speed 100g link_status 1 link_duplex 1 link_autoneg 1
MTL: 2024-10-08 11:13:27, dev_if_init_pacing(0), use tsc as default
MTL: 2024-10-08 11:13:27, mt_dev_create(0), feature 0x70, tx pacing tsc
MTL: 2024-10-08 11:13:27, sch_lcore_shm_init, clear shm as we are the first user
MTL: 2024-10-08 11:13:27, sch_lcore_shm_init, shared memory attached at 0x7fad1d895000 nattch 1 shm_id 2 key 0x15050005
MTL: 2024-10-08 11:13:27, mt_sch_mrg_init, succ with data quota 31068 M
MTL: 2024-10-08 11:13:27, sch_request(0), name sch_0 with 16 tasklets, type 0 socket 1
MTL: 2024-10-08 11:13:27, mt_sch_add_quota(0:0), quota 0 total now 0
MTL: 2024-10-08 11:13:27, mt_dev_get_tx_queue(0), q 0 without rl
MTL: 2024-10-08 11:13:27, mt_mcast_init, report every 10 seconds
MTL: 2024-10-08 11:13:27, mt_dev_get_rx_queue(0), q 0 ip 0.0.0.0 port 0
MTL: 2024-10-08 11:13:27, cni_queues_init(0), rxq 0
MTL: 2024-10-08 11:13:27, cni_traffic_thread, start
MTL: 2024-10-08 11:13:27, admin_thread, start
MTL: 2024-10-08 11:13:27, st_plugins_init, succ
MTL: 2024-10-08 11:13:27, config_parse_json, parse kahawai.json with json-c version: 0.16
MTL: 2024-10-08 11:13:27, st22_decoder_register(0), st22_decoder_sample registered, device 1 cap(0x300000000000000:0x70000402b)
MTL: 2024-10-08 11:13:27, st22_encoder_register(0), st22_encoder_sample registered, device 1 cap(0x70000402b:0x300000000000000)
st_plugin_create, succ with st22 sample plugin
MTL: 2024-10-08 11:13:27, st_plugin_register(0), /usr/local/lib/x86_64-linux-gnu/libst_plugin_st22_sample.so registered, version 1
MTL: 2024-10-08 11:13:27, Warn: st_plugin_register, dlopen /usr/local/lib64/libst_plugin_st22_sample.so fail
MTL: 2024-10-08 11:13:27, mt_ptp_port_id(0), port_number: 0000, clk_id: 0c:42:a1:ff:fe:72:c8:a2
MTL: 2024-10-08 11:13:27, mt_main_create, succ
MTL: 2024-10-08 11:13:28, mt_calibrate_tsc, tscHz 2100013779
MTL: 2024-10-08 11:13:28, mt_sch_get_lcore, succ on shm lcore 8 for lib_sch socket 1
MTL: 2024-10-08 11:13:28, sch_start(0), succ on lcore 8 socket 1
MTL: 2024-10-08 11:13:28, mt_dev_start, succ
MTL: 2024-10-08 11:13:28, _mt_start, succ, avail ports 1
MTL: 2024-10-08 11:13:28, mtl_init, succ, tsc_hz 2100013779
MTL: 2024-10-08 11:13:28, sch_tasklet_func(0), start with 0 tasklets, t_pid 22293
MTL: 2024-10-08 11:13:28, mtl_init, simd level avx2, flags 0x20001
Runtime error: zeInit(ZE_INIT_FLAG_GPU_ONLY) returned 2013265921 at ../gpu.c:42
Initialization error: init_level_zero_lib() returned -1 at ../gpu.c:117
main, app gpu initialization failed -1

chenzuozhou commented 6 days ago

The error code 2013265921 (0x78000001) means the driver is not initialized. I have already installed the driver; its version is "22.24.23453".
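Level Zero returns failures as `ze_result_t` values, but the log prints them in decimal. Converting the value to hex makes it easy to look up in the Level Zero specification, where 0x78000001 is `ZE_RESULT_ERROR_UNINITIALIZED`. The minimal C sketch below copies just two result values as plain numbers so it builds without `<level_zero/ze_api.h>`. The table and the helpers `ze_result_name` and `ze_result_hex` are hypothetical and are not part of MTL or the Level Zero loader.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A small subset of ze_result_t values, copied as plain numbers. */
static const struct {
  uint32_t code;
  const char *name;
} ze_results[] = {
  {0x00000000u, "ZE_RESULT_SUCCESS"},
  {0x78000001u, "ZE_RESULT_ERROR_UNINITIALIZED"},
};

/* Look up the symbolic name of a raw ze_result_t value. */
static const char *ze_result_name(uint32_t code) {
  for (size_t i = 0; i < sizeof(ze_results) / sizeof(ze_results[0]); i++) {
    if (ze_results[i].code == code) return ze_results[i].name;
  }
  return "unknown";
}

/* Format a result value (as printed in decimal by gpu.c) as 8-digit hex. */
static void ze_result_hex(uint32_t code, char *out, size_t len) {
  assert(out != NULL && len > 0);
  snprintf(out, len, "0x%08x", (unsigned int)code);
}
```

For example, passing 2013265921 yields "0x78000001" and "ZE_RESULT_ERROR_UNINITIALIZED". Since the only GPUs in this report are NVIDIA cards, which Intel's compute-runtime does not support, this result is plausible even with the runtime installed.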

chenzuozhou commented 5 days ago

I think I made a mistake earlier: compute-runtime is only for Intel GPUs. For NVIDIA GPUs, we need to use DPC++ to target NVIDIA GPUs. Are there any examples using the SYCL C++ language extension?