Open kraj opened 2 weeks ago
Hi @kraj, which branch are you using: scarthgap or master? Are you using Qt6 or Qt5?
Qt6, with all layers on their master branches.
Thanks. I will try to reproduce the bug and will get back to you ASAP
By the way, I am not using Wayland or X11; the browser is launched using eglfs.
I see. I haven't tried L4T R36.3.0 with eglfs. Maybe @kekiefer has tried that already. @kekiefer, any thoughts about the issue reported here?
The problem looks similar to problems in the past - the power management-related services have to start in a specific order, and before anything else touches the GPU, or you get these kinds of tracebacks.
@kraj Are you using sysvinit or systemd as your init manager?
I don't have a working r36 system with graphics quite yet, but for unrelated reasons, so I can't say whether I would run into this issue or not.
I can say that I didn't run into this on a system with qt and the eglfs gbm backend on r35.
Which qt eglfs backend is being used? For the gbm, since this looks like an allocation problem, you could explore using tegra-udrm-gbm as the rprovider for tegra-gbm-backend: https://github.com/OE4T/meta-tegra/blob/master/recipes-graphics/mesa/tegra-udrm-gbm_1.1.0.bb
Edit: looks like Matt's message crossed with my own, his answer sounds more promising, but I'll leave mine here for posterity
I am using systemd, and I have looked at another issue where you fixed some sequencing of services; those changes are already in master. However, I do see this:
root@jetson-agx-orin-devkit:~# systemctl status nvpower.service
● nvpower.service - NVIDIA power management setup
Loaded: loaded (/usr/lib/systemd/system/nvpower.service; enabled; preset: enabled)
Active: active (exited) since Wed 2024-07-10 16:36:34 UTC; 2h 38min ago
Process: 856 ExecStart=/usr/libexec/nvpower.sh (code=exited, status=0/SUCCESS)
Main PID: 856 (code=exited, status=0/SUCCESS)
CPU: 86ms
Jul 10 16:36:34 jetson-agx-orin-devkit systemd[1]: Starting NVIDIA power management setup...
Jul 10 16:36:34 jetson-agx-orin-devkit nvpower.sh[856]: /usr/libexec/nvpower.sh: line 473: /sys/class/hwmon/hwmon3/in1_label: No such…irectory
Jul 10 16:36:34 jetson-agx-orin-devkit systemd[1]: Finished NVIDIA power management setup.
Hint: Some lines were ellipsized, use -l to show in full.
and
root@jetson-agx-orin-devkit:~# ls -l /sys/class/hwmon/hwmon3/in*
-rw-r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in1_enable
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in1_input
-rw-r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in2_enable
-r--r--r-- 1 root root 4096 Jul 10 16:36 /sys/class/hwmon/hwmon3/in2_input
-r--r--r-- 1 root root 4096 Jul 10 16:36 /sys/class/hwmon/hwmon3/in2_label
-rw-r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in3_enable
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in3_input
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in4_input
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in5_input
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in6_input
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in7_input
-r--r--r-- 1 root root 4096 Jul 10 16:54 /sys/class/hwmon/hwmon3/in7_label
Not sure how /sys/class/hwmon/hwmon3/in1_label is created. It's missing on my device.
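To see at a glance which voltage channels have no label, one option is to walk the inN_input files and look for a matching inN_label. This is only a sketch that relies on the hwmon sysfs naming visible in the listing above; the function name list_unlabeled_channels is made up for illustration and is not part of nvpower.sh or meta-tegra.

```shell
#!/bin/sh
# Hypothetical helper: print every inN_input file in a hwmon directory
# that has no matching inN_label next to it.
list_unlabeled_channels() {
    dir=$1
    for input in "$dir"/in*_input; do
        # Skip the literal, unexpanded glob when nothing matches.
        [ -e "$input" ] || continue
        label="${input%_input}_label"
        [ -e "$label" ] || echo "${input##*/}"
    done
}

# Example call on the target (path taken from the listing above):
#   list_unlabeled_channels /sys/class/hwmon/hwmon3
```

For the listing above, that would report the in1, in3, in4, in5, and in6 inputs, since only in2_label and in7_label exist.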
❯ bitbake-getvar -r qtbase QT_QPA_DEFAULT_EGLFS_INTEGRATION
WARNING: Published ports are discarded when using host network mode
#
# $QT_QPA_DEFAULT_EGLFS_INTEGRATION
# set? /mnt/b/yoe/master/sources/meta-tegra/external/qt6-layer/recipes-qt/qt6/qtbase_%.bbappend:3
# "${@bb.utils.contains('PREFERRED_RPROVIDER_tegra-gbm-backend', 'tegra-libraries-gbm-backend', 'eglfs_kms_egldevice', 'eglfs_kms', d)}"
QT_QPA_DEFAULT_EGLFS_INTEGRATION="eglfs_kms_egldevice"
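The expression in the bbappend derives the Qt default from the tegra-gbm-backend provider. As a rough illustration of that selection logic only, here is a shell sketch; bb.utils.contains checks whether the given value appears among the variable's whitespace-separated words, and pick_eglfs_integration is a hypothetical name, not something the layer provides.

```shell
#!/bin/sh
# Hypothetical re-statement of the bbappend's selection logic.
# $1: the value of PREFERRED_RPROVIDER_tegra-gbm-backend (may be empty).
pick_eglfs_integration() {
    # Unquoted on purpose: split the value into whitespace-separated words.
    for word in $1; do
        if [ "$word" = "tegra-libraries-gbm-backend" ]; then
            echo "eglfs_kms_egldevice"
            return 0
        fi
    done
    echo "eglfs_kms"
}

# Example:
#   pick_eglfs_integration "tegra-udrm-gbm"
```

With tegra-udrm-gbm as the provider the sketch prints eglfs_kms; with tegra-libraries-gbm-backend it prints eglfs_kms_egldevice, the value shown in the bitbake-getvar output.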
Does using tegra-udrm-gbm need Wayland?
No, this is a mesa gbm backend for use with drm/kms
Using tegra-udrm-gbm does not work either.
Jul 10 23:17:45 jetson-agx-orin-devkit yoe-kiosk-browser[1022]: MESA-LOADER: failed to open tegra: /usr/lib/dri/tegra_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/dri, suffix _dri)
Jul 10 23:17:45 jetson-agx-orin-devkit yoe-kiosk-browser[1022]: MESA-LOADER: failed to open kms_swrast: /usr/lib/dri/kms_swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/dri, suffix _dri)
Jul 10 23:17:45 jetson-agx-orin-devkit yoe-kiosk-browser[1022]: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/dri, suffix _dri)
Jul 10 23:17:45 jetson-agx-orin-devkit yoe-kiosk-browser[1022]: Could not create GBM device (No such file or directory)
Jul 10 23:17:45 jetson-agx-orin-devkit yoe-kiosk-browser[1022]: Could not open DRM device
The nvgpu crashes reported above still remain.
One more thought: Weston normally pulls in the nvidia-drm-loadconf recipe. You won't have that here with a kms-only implementation that doesn't include weston, but you'll need it for kms to work properly at the end of the day.
Then I added mesa-megadriver to the image, which left me with enabling the tegra gallium driver in mesa. Once that was added as a PACKAGECONFIG, all the needed prerequisites were available to run:
root@jetson-agx-orin-devkit:~# ls -l /usr/lib/dri/
-rwxr-xr-x 1 root root 15216240 May 8 14:27 kms_swrast_dri.so
-rwxr-xr-x 5 root root 81222560 May 8 14:27 nouveau_dri.so
-rwxr-xr-x 5 root root 81222560 May 8 14:27 swrast_dri.so
-rwxr-xr-x 5 root root 81222560 May 8 14:27 tegra_dri.so
-rwxr-xr-x 5 root root 81222560 May 8 14:27 virtio_gpu_dri.so
-rwxr-xr-x 5 root root 81222560 May 8 14:27 zink_dri.so
Sadly, it gets past the above problem but then fails with:
Jul 10 23:59:56 jetson-agx-orin-devkit yoe-kiosk-browser[1328]: libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4
Jul 10 23:59:56 jetson-agx-orin-devkit yoe-kiosk-browser[1328]: Could not open egl display
Mesa should not be falling back on any DRI driver to initialize drm -- the drm implementation is provided by nvidia's libdrm, and mesa only gets used for buffer allocation in gbm.
Your last message looks suspicious though. Can you verify that the nvidia-drm kernel driver is loaded with the option modeset=1?
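One way to check this on the target is to read the module parameter back from sysfs. The sketch below assumes the loaded module exposes the parameter at /sys/module/nvidia_drm/parameters/modeset (readable as root) and that it reads back as Y when enabled, as kernel boolean parameters normally do; check_modeset is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether nvidia-drm modeset appears enabled.
# $1: optional path to the parameter file (defaults to the assumed sysfs path).
check_modeset() {
    param="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param" ]; then
        # Module not loaded, or the file is not readable by this user.
        echo "unavailable"
    elif [ "$(cat "$param")" = "Y" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example call on the target, as root:
#   check_modeset
```

A result of "unavailable" would suggest the module is not loaded at all, which is a different problem from it being loaded without modeset=1.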
I built nvidia-drm-loadconf and installed the ipk, but it did not help much.
Jul 11 00:16:42 jetson-agx-orin-devkit yoe-kiosk-browser[1157]: libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4
Jul 11 00:16:42 jetson-agx-orin-devkit yoe-kiosk-browser[1157]: Could not open egl display
That recipe is supposed to load nvidia-drm with the option modeset=1. If you haven't restarted, you could do that, or you could try manually unloading and reloading this module with that option.
right, I have rebooted.
root@jetson-agx-orin-devkit:~# cat /etc/modprobe.d/nvidia-drm.conf
options nvidia-drm modeset=1
Do you have a /dev/dri/by-path/platform-13800000.display-card?
yep
root@jetson-agx-orin-devkit:~# ls -l /dev/dri/by-path/platform-13800000.display-card
lrwxrwxrwx 1 root root 8 May 9 08:07 /dev/dri/by-path/platform-13800000.display-card -> ../card1
Set up a file /usr/share/tegra.conf or something like that with these contents:
{
"device": "/dev/dri/by-path/platform-13800000.display-card"
}
Then export these variables before launching your qt application:
export QT_QPA_PLATFORM=eglfs
export QT_QPA_EGLFS_KMS_CONFIG=/usr/share/tegra.conf
It's already doing this in the .service file:
...
Environment=QT_QPA_EGLFS_INTEGRATION=eglfs_kms
Environment=QT_QPA_EGLFS_KMS_CONFIG=/etc/default/eglfs.json
and /etc/default/eglfs.json is
root@jetson-agx-orin-devkit:~# cat /etc/default/eglfs.json
{
"device": "/dev/dri/by-path/platform-13800000.display-card",
"hwcursor": false,
"pbuffers": true,
"outputs": [
{
"name": "LVDS-1",
"mode": "1024x600"
}
]
}
root@jetson-agx-orin-devkit:~#
I guess it's also worth asking if you're still seeing the dma allocation failures from the gpu in the kernel logs, because if you are, maybe this is all just a tangent.
yes I am seeing those messages consistently, as mentioned.
root@jetson-agx-orin-devkit:~# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● nvpmodel.service loaded failed failed NVIDIA power model daemon
● systemd-networkd-wait-online.service loaded failed failed Wait for Network to be Configured
● yoe-kiosk-browser.service loaded failed failed Yoe Kiosk Browser
Here is a snippet of the journal; this is seen whenever yoe-kiosk-browser.service or nvpmodel.service is started or restarted:
Jul 11 00:48:59 jetson-agx-orin-devkit audit: BPF prog-id=47 op=UNLOAD
Jul 11 00:48:59 jetson-agx-orin-devkit audit: BPF prog-id=46 op=UNLOAD
Jul 11 00:48:59 jetson-agx-orin-devkit systemd[1]: yoe-kiosk-browser.service: Scheduled restart job, restart counter is at 4.
Jul 11 00:48:59 jetson-agx-orin-devkit systemd[1]: Started Yoe Kiosk Browser.
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: YOE_KIOSK_BROWSER_URL= "http://localhost:8118"
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: YOE_KIOSK_BROWSER_EXCEPTION_URL= "@EXCEPTION_URL@"
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: YOE_KIOSK_BROWSER_ROTATE= "0"
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: YOE_KIOSK_BROWSER_KEYBOARD_SCALE= "1"
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: YOE_KIOSK_BROWSER_RETRY_INTERVAL= "10"
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: tegra-hsierrrptinj: Callback for 0x0001 already registered
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_cic_mon_init:75 [ERR] Err inj callback registration failed: -22
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 282.985455] tegra-hsierrrptinj: Callback for 0x0001 already registered
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 282.985466] nvgpu: 17000000.gpu nvgpu_cic_mon_init:75 [ERR] Err inj callback registration fai
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu engine_fb_queue_set_element_use_state:144 [ERR] FBQ last received queue element not processed yet
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_engine_fb_queue_push:373 [ERR] fb-queue element in use map is in invalid state
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_engine_fb_queue_push:401 [ERR] falcon id-0, queue id-1, failed
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu pmu_write_cmd:178 [ERR] fail to write cmd to queue 1
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_pmu_rpc_execute:727 [ERR] Failed to execute RPC status=0xffffffea, func=0x3
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu gv100_pmu_lsfm_bootstrap_ls_falcon:100 [ERR] Failed to execute RPC, status=0xffffffea
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_pmu_lsfm_bootstrap_ls_falcon:128 [ERR] LSF Load failed
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_gr_falcon_load_secure_ctxsw_ucode:718 [ERR] Unable to recover GR falcon
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_gr_falcon_init_ctxsw:156 [ERR] fail
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_cic_mon_report_err_safety_services:97 [ERR] Error reporting is not supported in this platform
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu gr_init_ctxsw_falcon_support:857 [ERR] FECS context switch init error
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu nvgpu_finalize_poweron:1095 [ERR] Failed initialization for: g->ops.gr.gr_init_support
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032139] nvgpu: 17000000.gpu engine_fb_queue_set_element_use_state:144 [ERR] FBQ last received queue element not processed yet queue_pos 0
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032152] nvgpu: 17000000.gpu nvgpu_engine_fb_queue_push:373 [ERR] fb-queue element in use map is in invalid state
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032157] nvgpu: 17000000.gpu nvgpu_engine_fb_queue_push:401 [ERR] falcon id-0, queue id-1, failed
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032162] nvgpu: 17000000.gpu pmu_write_cmd:178 [ERR] fail to write cmd to queue 1
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032167] nvgpu: 17000000.gpu nvgpu_pmu_rpc_execute:727 [ERR] Failed to execute RPC status=0xffffffea, func=0x3
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032171] nvgpu: 17000000.gpu gv100_pmu_lsfm_bootstrap_ls_falcon:100 [ERR] Failed to execute RPC, status=0xffffffea
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032174] nvgpu: 17000000.gpu nvgpu_pmu_lsfm_bootstrap_ls_falcon:128 [ERR] LSF Load failed
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032179] nvgpu: 17000000.gpu nvgpu_gr_falcon_load_secure_ctxsw_ucode:718 [ERR] Unable to recover GR falcon
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032182] nvgpu: 17000000.gpu nvgpu_gr_falcon_init_ctxsw:156 [ERR] fail
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032190] nvgpu: 17000000.gpu nvgpu_cic_mon_report_err_safety_services:97 [ERR] Error reporting is not supported in this platform
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032194] nvgpu: 17000000.gpu gr_init_ctxsw_falcon_support:857 [ERR] FECS context switch init error
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.032198] nvgpu: 17000000.gpu nvgpu_finalize_poweron:1095 [ERR] Failed initialization for: g->ops.gr.gr_init_support
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4
Jul 11 00:48:59 jetson-agx-orin-devkit yoe-kiosk-browser[1262]: Could not open egl display
Jul 11 00:48:59 jetson-agx-orin-devkit audit[1262]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1262 comm="yoe-kiosk-brows" exe="/usr/bin/yoe-kiosk-browser" sig=6 res=1
Jul 11 00:48:59 jetson-agx-orin-devkit kernel: nvgpu: 17000000.gpu gk20a_power_write:127 [ERR] power_node_write failed at busy
Jul 11 00:48:59 jetson-agx-orin-devkit kernel[854]: [ 283.056999] nvgpu: 17000000.gpu gk20a_power_write:127 [ERR] power_node_write failed at busy
Jul 11 00:48:59 jetson-agx-orin-devkit audit: BPF prog-id=49 op=LOAD
Jul 11 00:48:59 jetson-agx-orin-devkit audit: BPF prog-id=50 op=LOAD
Jul 11 00:48:59 jetson-agx-orin-devkit audit: BPF prog-id=51 op=LOAD
Jul 11 00:48:59 jetson-agx-orin-devkit systemd[1]: Started Process Core Dump (PID 1265/UID 0).
Jul 11 00:48:59 jetson-agx-orin-devkit systemd-coredump[1266]: elfutils disabled, parsing ELF objects not supported
Jul 11 00:48:59 jetson-agx-orin-devkit systemd-coredump[1266]: [🡕] Process 1262 (yoe-kiosk-brows) of user 0 dumped core.
Jul 11 00:48:59 jetson-agx-orin-devkit systemd[1]: systemd-coredump@9-1265-0.service: Deactivated successfully.
Jul 11 00:48:59 jetson-agx-orin-devkit systemd[1]: yoe-kiosk-browser.service: Main process exited, code=dumped, status=6/ABRT
Jul 11 00:48:59 jetson-agx-orin-devkit systemd[1]: yoe-kiosk-browser.service: Failed with result 'core-dump'.
Jul 11 00:49:00 jetson-agx-orin-devkit audit: BPF prog-id=51 op=UNLOAD
Jul 11 00:49:00 jetson-agx-orin-devkit audit: BPF prog-id=50 op=UNLOAD
Jul 11 00:49:00 jetson-agx-orin-devkit audit: BPF prog-id=49 op=UNLOAD
Jul 11 00:49:00 jetson-agx-orin-devkit systemd[1]: yoe-kiosk-browser.service: Scheduled restart job, restart counter is at 5.
Jul 11 00:49:00 jetson-agx-orin-devkit systemd[1]: yoe-kiosk-browser.service: Start request repeated too quickly.
Jul 11 00:49:00 jetson-agx-orin-devkit systemd[1]: yoe-kiosk-browser.service: Failed with result 'core-dump'.
Jul 11 00:49:00 jetson-agx-orin-devkit systemd[1]: Failed to start Yoe Kiosk Browser.
@ichergui can you check on an orin devkit whether the /sys/class/hwmon/ nodes have a setup like the one @kraj sees? The one it's bailing out on looks like it is expected to be an ina3221, which I don't have on my custom board.
For lack of anything better to try, it looks like the devkit sets up one of the ina3221's like this:
/hardware/nvidia/t23x/nv-public/nv-platform/tegra234-p3701-0000.dtsi
/ {
    bus@0 {
        i2c@c240000 {
            ...
            ina3221@41 {
                compatible = "ti,ina3221";
                reg = <0x41>;
                #address-cells = <1>;
                #size-cells = <0>;
                channel@0 {
                    reg = <0x0>;
                    status = "disabled";
                };
                channel@1 {
                    reg = <0x1>;
                    label = "VDDQ_VDD2_1V8AO";
                    shunt-resistor-micro-ohms = <2000>;
                };
                channel@2 {
                    reg = <0x2>;
                    status = "disabled";
                };
            };
            ...
And that could explain the missing label nodes. You could perhaps try removing this i2c device before nvpower.sh runs? Or patch the devicetree, which is built by the recipe nvidia-kernel-oot.
I tracked down an orin devkit and installed demo-image-egl from our tegra demo distro, with a local modification setting PREFERRED_RPROVIDER_tegra-gbm-backend = "tegra-udrm-gbm".
It looks like the warning from nvpower.sh is "normal"
Next I installed kmscube and nvidia-drm-loadconf, and rebooted. On the next boot, the nvidia-drm module was not loaded automatically, which is a problem.
After modprobe nvidia-drm modeset=1, I get one warning from the kernel:
[ 62.470329] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64 540.3.0 Release Build (@duckhawk) Thu 11 Jul 2024 07:07:12 PM UTC
[ 62.477555] [drm] [nvidia-drm] [GPU ID 0x00020000] Loading driver
[ 62.992520] NVRM nvAssertFailedNoLog: Assertion failed: minRequiredIsoBandwidthKBPS <= clientBwValues[DISPLAY_ICC_BW_CLIENT_EXT].minRequiredIsoBandwidthKBPS @ kern_disp_0402.c:111
[ 62.992532] CPU: 2 PID: 1115 Comm: kworker/u24:6 Tainted: G O 5.15.136-l4t-r36.3-1009.9+g46cdb595bebc #1
[ 62.992537] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS v36.3.0 01/08/2024
[ 62.992539] Workqueue: dce-async-ipc-wq dce_client_async_event_work [tegra_dce]
[ 62.992563] Call trace:
[ 62.992564] dump_backtrace+0x0/0x1e0
[ 62.992577] show_stack+0x34/0x44
[ 62.992582] dump_stack_lvl+0x68/0x84
[ 62.992588] dump_stack+0x18/0x34
[ 62.992590] os_dump_stack+0x1c/0x28 [nvidia]
[ 62.992721] nvAssertFailedBacktrace.part.0+0x80/0xa0 [nvidia]
[ 62.992830] kdispArbAndAllocDisplayBandwidth_v04_02+0x240/0x260 [nvidia]
[ 62.992938] kdispInvokeDisplayModesetCallback_KERNEL+0xa8/0x100 [nvidia]
[ 62.993042] osTegraDceClientIpcCallback+0x84/0xc0 [nvidia]
[ 62.993147] dce_client_async_event_work+0x90/0x18c [tegra_dce]
[ 62.993156] process_one_work+0x208/0x4e0
[ 62.993164] worker_thread+0x74/0x4a0
[ 62.993167] kthread+0x180/0x198
[ 62.993172] ret_from_fork+0x10/0x20
[ 63.116412] [drm] Initialized nvidia-drm 0.0.0 20160202 for 13800000.display on minor 1
But still, after that, running kmscube works. I do not see any of the other aforementioned kernel traces from the gpu.
I noticed that we were looking at the nvpower.service earlier, but you have a failure loading nvpmodel.service. Maybe we should proceed by looking into this.
root@jetson-agx-orin-devkit:~# systemctl status nvpmodel.service
● nvpmodel.service - NVIDIA power model daemon
Loaded: loaded (/usr/lib/systemd/system/nvpmodel.service; enabled; preset: enabled)
Active: active (exited) since Thu 2024-07-11 20:27:25 UTC; 5min ago
Process: 953 ExecStart=/usr/sbin/nvpmodel -f /etc/nvpmodel.conf (code=exited, status=0/SUCCES>
Main PID: 953 (code=exited, status=0/SUCCESS)
CPU: 96ms
Jul 11 20:27:23 jetson-agx-orin-devkit systemd[1]: Starting NVIDIA power model daemon...
Jul 11 20:27:25 jetson-agx-orin-devkit systemd[1]: Finished NVIDIA power model daemon.
Regarding nvidia-drm-loadconf, it turns out it's working fine. I installed a version from the wrong package feed that was missing the modules-load.d entry. Installing the correct version fixed that. The rest of my comment stands.
@kekiefer I tried to disable the i2c device in the DT, but it does not help. The second part is about the failing nvpmodel.service, and here is the service error message:
root@jetson-agx-orin-devkit:~# systemctl status nvpmodel
× nvpmodel.service - NVIDIA power model daemon
Loaded: loaded (/usr/lib/systemd/system/nvpmodel.service; enabled; preset: enabled)
Active: failed (Result: exit-code) since Tue 2024-06-11 21:42:11 UTC; 4 weeks 2 days ago
Invocation: 55eef1780ca04d6b8790bd0d7f694df1
Process: 1049 ExecStart=/usr/sbin/nvpmodel -f /etc/nvpmodel.conf (code=exited, status=255/EXCEPTION)
Main PID: 1049 (code=exited, status=255/EXCEPTION)
Jun 11 21:42:07 jetson-agx-orin-devkit systemd[1]: Starting NVIDIA power model daemon...
Jun 11 21:42:11 jetson-agx-orin-devkit nvpmodel[1049]: NVPM ERROR: Error opening /sys/devices/platform/17000000.gpu/devfreq_dev/available_frequencies: 2
Jun 11 21:42:11 jetson-agx-orin-devkit nvpmodel[1049]: NVPM ERROR: failed to read PARAM GPU: ARG FREQ_TABLE: PATH /sys/devices/platform/17000000.gpu/devfr…frequencies
Jun 11 21:42:11 jetson-agx-orin-devkit nvpmodel[1049]: NVPM ERROR: failed to set power mode!
Jun 11 21:42:11 jetson-agx-orin-devkit nvpmodel[1049]: NVPM ERROR: optMask is 2, no request for power mode
Jun 11 21:42:11 jetson-agx-orin-devkit systemd[1]: nvpmodel.service: Main process exited, code=exited, status=255/EXCEPTION
Jun 11 21:42:11 jetson-agx-orin-devkit systemd[1]: nvpmodel.service: Failed with result 'exit-code'.
Jun 11 21:42:11 jetson-agx-orin-devkit systemd[1]: Failed to start NVIDIA power model daemon.
Not sure why /sys/devices/platform/17000000.gpu/devfreq_dev/available_frequencies is missing; in fact, the devfreq_dev directory itself is missing.
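To tell apart "the file is missing" from "the whole devfreq directory is missing", a small check like the following can help. It uses only the paths quoted in the nvpmodel error above; report_devfreq is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report which part of the GPU devfreq sysfs tree is absent.
# $1: optional GPU platform device directory (defaults to the path from the log).
report_devfreq() {
    gpu_dir="${1:-/sys/devices/platform/17000000.gpu}"
    if [ ! -d "$gpu_dir/devfreq_dev" ]; then
        echo "devfreq_dev directory missing"
    elif [ ! -e "$gpu_dir/devfreq_dev/available_frequencies" ]; then
        echo "available_frequencies missing"
    else
        echo "ok"
    fi
}

# Example call on the target:
#   report_devfreq
```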
yoe-kiosk-browser is failing too, and the following messages, which might be of interest, appear in the journal:
Jul 12 04:26:04 jetson-agx-orin-devkit yoe-kiosk-browser[1321]: libnvrm_gpu.so: NvRmGpuLibOpen failed, error=4
Jul 12 04:26:04 jetson-agx-orin-devkit yoe-kiosk-browser[1321]: eglQueryDevicesEXT could not find any EGL devices
Jul 12 04:26:04 jetson-agx-orin-devkit yoe-kiosk-browser[1321]: Could not set up EGL device!
By the way, I am also seeing that 4 cores are marked offline. Do you see this as well?
Yes, this is part of the default 30W power model, unless changed with nvpmodel userspace or via NVPMODEL_CONFIG_DEFAULT in the build.
gk20a_scale_init in nvgpu/drivers/gpu/nvgpu/os/linux/scale.c from the nvidia-kernel-oot package seems to be what is responsible for setting up that devfreq_governor node. There are quite a few paths through here which can result in it not getting set up, so it's not clear yet what might be falling down.
CONFIG_GK20A_DEVFREQ does not seem to be set in .config. Could that be an issue?
That was my initial thought, but actually CONFIG_GK20A_DEVFREQ is just part of the out-of-tree configuration and should be enabled when the kernel has CONFIG_COMMON_CLK and CONFIG_PM_DEVFREQ.
Are you installing all kernel modules on the target? From both the out-of-tree collection and the kernel?
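Whether those two prerequisites are enabled can be checked against the kernel's .config. The sketch below assumes you have a path to the .config from the kernel build directory; check_kconfig is a hypothetical helper, and it treats both =y and =m as enabled.

```shell
#!/bin/sh
# Hypothetical helper: report, for each listed option, whether a kernel
# .config file enables it (as built-in or module).
# $1: path to the .config; remaining arguments: option names.
check_kconfig() {
    config=$1
    shift
    for opt in "$@"; do
        if grep -q "^${opt}=[ym]\$" "$config"; then
            echo "$opt enabled"
        else
            echo "$opt not set"
        fi
    done
}

# Example call, with the .config path as a placeholder:
#   check_kconfig /path/to/.config CONFIG_COMMON_CLK CONFIG_PM_DEVFREQ
```

CONFIG_GK20A_DEVFREQ itself would not be expected to show up in the kernel's .config, since it belongs to the out-of-tree configuration.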
Seems to be ok. Here is my lsmod:
root@jetson-agx-orin-devkit:~# lsmod
Module Size Used by
nvvrs_pseq_rtc 16384 0
rtk_btusb 77824 0
snd_hda_codec_hdmi 69632 1
mttcan 69632 0
bluetooth 458752 2 rtk_btusb
tegra23x_perf_uncore 24576 0
nvethernet 1179648 0
snd_hda_tegra 16384 0
can_dev 40960 1 mttcan
ecdh_generic 16384 1 bluetooth
ecc 36864 1 ecdh_generic
tegra234_aon 57344 1
tegra_mce 28672 1 tegra23x_perf_uncore
snd_hda_codec 139264 2 snd_hda_codec_hdmi,snd_hda_tegra
nvpmodel_clk_cap 16384 0
thermal_trip_event 16384 0
tegra234_oc_event 16384 0
nvpps 32768 2 mttcan,nvethernet
rtl8822ce 3362816 0
tegra_cactmon_mc_all 16384 0
snd_hda_core 102400 3 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_tegra
nvidia 1626112 0
pwm_tegra_tachometer 16384 0
at24 24576 0
spi_tegra114 28672 0
i2c_nvvrs11 16384 0
pwm_tegra 20480 1
lm90 28672 0
nvidia_vrs_pseq 16384 0
host1x_fence 20480 0
tegra_bpmp_thermal 16384 0
tegra_dce 110592 2 nvidia
mc_hwpm 16384 0
nvhost_isp5 16384 0
nvhost_vi5 20480 0
nvhost_nvcsi_t194 16384 0
tegra_camera 245760 3 nvhost_isp5,nvhost_nvcsi_t194,nvhost_vi5
v4l2_dv_timings 36864 1 tegra_camera
v4l2_fwnode 20480 1 tegra_camera
v4l2_async 24576 2 v4l2_fwnode,tegra_camera
videobuf2_dma_contig 24576 1 tegra_camera
videobuf2_memops 20480 1 videobuf2_dma_contig
nvhost_nvcsi 24576 1 tegra_camera
tegra_camera_platform 24576 4 nvhost_isp5,nvhost_nvcsi_t194,tegra_camera,nvhost_vi5
capture_ivc 28672 1 tegra_camera
cfg80211 856064 1 rtl8822ce
rfkill 36864 4 bluetooth,cfg80211
governor_userspace 16384 0
tegra_camera_rtcpu 229376 2 capture_ivc,tegra_camera
ivc_bus 24576 2 capture_ivc,tegra_camera_rtcpu
hsp_mailbox_client 20480 2 ivc_bus,tegra_camera_rtcpu
ivc_ext 20480 2 ivc_bus,capture_ivc
videobuf2_v4l2 32768 1 tegra_camera
tegra_drm 372736 0
videobuf2_common 65536 4 videobuf2_dma_contig,videobuf2_v4l2,tegra_camera,videobuf2_memops
nvhost_pva 167936 0
nvhost_nvdla 110592 0
tegra_wmark 16384 0
videodev 266240 4 v4l2_async,videobuf2_v4l2,tegra_camera,videobuf2_common
mc 61440 4 videodev,videobuf2_v4l2,tegra_camera,videobuf2_common
nvhost_capture 20480 2 nvhost_isp5,nvhost_vi5
nvhwpm 139264 4 mc_hwpm,tegra_drm,nvhost_nvdla,nvhost_pva
tegra_se 57344 0
cec 57344 1 tegra_drm
crypto_engine 20480 1 tegra_se
tsecriscv 32768 1 nvidia
host1x_nvhost 40960 9 nvhost_isp5,nvhost_nvcsi_t194,nvidia,tegra_camera,nvhost_nvdla,nvhost_capture,nvhost_nvcsi,nvhost_pva,nvhost_vi5
drm_kms_helper 303104 1 tegra_drm
pwm_fan 20480 0
nvgpu 2793472 0
governor_pod_scaling 45056 0
nvmap 237568 1 nvgpu
nvsciipc 24576 1 nvmap
host1x 208896 7 host1x_nvhost,host1x_fence,tegra_se,nvgpu,tegra_drm,nvhost_nvdla,nvhost_pva
mc_utils 16384 3 nvidia,nvgpu,tegra_camera_platform
ina3221 24576 0
drm 630784 4 drm_kms_helper,nvidia,tegra_drm
ipv6 503808 62
nvme 49152 0
nvme_core 106496 1 nvme
tegra_xudc 45056 0
ucsi_ccg 28672 0
typec_ucsi 36864 1 ucsi_ccg
typec 61440 1 typec_ucsi
pcie_tegra194 40960 0
phy_tegra194_p2u 16384 13
Installed modules:
root@jetson-agx-orin-devkit:~# opkg list-installed | grep nvidia-kernel
nvidia-kernel-oot-base - 36.3.0-r0.1.0
nvidia-kernel-oot-cameras - 36.3.0-r0.1.0
nvidia-kernel-oot-canbus - 36.3.0-r0.1.0
nvidia-kernel-oot-display - 36.3.0-r0.1.0
nvidia-kernel-oot-wifi - 36.3.0-r0.1.0
root@jetson-agx-orin-devkit:~# opkg list-installed | grep nv-kernel
nv-kernel-module-ar1335-common-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-arm64-ras-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-bmi088-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cam-cdi-tsc-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cam-fsync-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-camchar-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-capture-ivc-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cdi-dev-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cdi-gpio-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cdi-mgr-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cdi-pwm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cpuidle-debugfs-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-cpuidle-tegra-auto-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-fusb301-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-governor-pod-scaling-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-host1x-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-host1x-fence-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-host1x-nvhost-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-hsp-mailbox-client-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-i2c-nvvrs11-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-isc-dev-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-isc-gpio-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-isc-mgr-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-isc-pwm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-ivc-bus-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-ivc-cdev-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-ivc-ext-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-lt6911uxc-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-max9295-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-max9296-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-max96712-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-maxim-gmsl-dp-serializer-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-maxim-gmsl-hdmi-serializer-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-mc-hwpm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-mc-utils-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-mttcan-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-ar0234-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-hawk-owl-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-imx185-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-imx219-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-imx274-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-imx318-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-imx390-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-imx477-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nv-ov5693-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvethernet-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvgpu-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-capture-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-isp5-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-nvcsi-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-nvcsi-t194-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-nvdla-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-pva-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-vi-tpg-t19x-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhost-vi5-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvhwpm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvidia-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvidia-drm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvidia-modeset-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvidia-vrs-pseq-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvmap-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvpmodel-clk-cap-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvpps-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvsciipc-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-nvvrs-pseq-rtc-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-pca9570-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-pinctrl-tegra194-pexclk-padctrl-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-pinctrl-tegra234-dpaux-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-pwm-tegra-tachometer-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-r8168-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-rtk-btusb-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-rtl8822ce-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-spi-tegra210-quad-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-aon-ivc-echo-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-bpmp-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-cactmon-mc-all-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-camera-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-camera-platform-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-camera-rtcpu-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-dce-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-drm-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-mce-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-se-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-se-nvrng-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra-wmark-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra234-aon-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra234-oc-event-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra23x-perf-uncore-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tegra23x-psc-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-thermal-trip-event-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-tsecriscv-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-ufs-tegra-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-ufs-tegra-provision-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-virtual-i2c-mux-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
nv-kernel-module-watchdog-tegra-t18x-5.15.136-l4t-r36.3-1009.9+g46cdb595bebc - 36.3.0-r0.1.0
The problems loading the power management bits early on still seem likely to be at the root of this, causing later issues dealing with the GPU. I've got the nvidia_modeset and nvidia_drm modules loaded, but it looks like you have these installed, so I can only guess that they're failing to load because of the first problem.
root@jetson-agx-orin-devkit:~# lsmod
Module Size Used by
bridge 266240 0
stp 20480 1 bridge
llc 20480 2 bridge,stp
usb_f_ecm 24576 2
usb_f_acm 16384 2
u_serial 20480 3 usb_f_acm
usb_f_rndis 32768 2
u_ether 28672 2 usb_f_rndis,usb_f_ecm
libcomposite 65536 14 usb_f_rndis,usb_f_ecm,usb_f_acm
rtk_btusb 77824 0
bluetooth 458752 22 rtk_btusb
ecdh_generic 16384 1 bluetooth
ecc 36864 1 ecdh_generic
rtl8822ce 3362816 0
nvethernet 1179648 0
snd_hda_codec_hdmi 69632 1
mttcan 69632 0
tegra_cactmon_mc_all 16384 0
can_dev 40960 1 mttcan
cfg80211 856064 1 rtl8822ce
tegra234_aon 57344 1
at24 24576 0
nvpps 32768 2 mttcan,nvethernet
rfkill 36864 6 bluetooth,cfg80211
snd_hda_tegra 16384 0
snd_hda_codec 139264 2 snd_hda_codec_hdmi,snd_hda_tegra
snd_hda_core 102400 3 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_tegra
host1x_fence 20480 0
pwm_tegra_tachometer 16384 0
spi_tegra114 28672 0
pwm_tegra 20480 1
mc_hwpm 16384 0
nvhost_vi5 20480 0
nvhost_isp5 16384 0
nvhost_nvcsi_t194 16384 0
nvvrs_pseq_rtc 16384 0
tegra_camera 245760 3 nvhost_isp5,nvhost_nvcsi_t194,nvhost_vi5
v4l2_dv_timings 36864 1 tegra_camera
v4l2_fwnode 20480 1 tegra_camera
v4l2_async 24576 2 v4l2_fwnode,tegra_camera
videobuf2_dma_contig 24576 1 tegra_camera
videobuf2_memops 20480 1 videobuf2_dma_contig
nvhost_nvcsi 24576 1 tegra_camera
lm90 28672 0
i2c_nvvrs11 16384 0
nvidia_vrs_pseq 16384 0
snd_soc_tegra_machine_driver 16384 0
tegra_bpmp_thermal 16384 0
capture_ivc 28672 1 tegra_camera
snd_soc_tegra_utils 32768 1 snd_soc_tegra_machine_driver
snd_soc_simple_card_utils 28672 1 snd_soc_tegra_utils
tegra_camera_platform 24576 4 nvhost_isp5,nvhost_nvcsi_t194,tegra_camera,nvhost_vi5
tegra234_oc_event 16384 0
tegra23x_perf_uncore 24576 0
tegra_mce 28672 1 tegra23x_perf_uncore
nvpmodel_clk_cap 16384 0
thermal_trip_event 16384 0
tegra_camera_rtcpu 229376 2 capture_ivc,tegra_camera
ivc_bus 24576 2 capture_ivc,tegra_camera_rtcpu
hsp_mailbox_client 20480 2 ivc_bus,tegra_camera_rtcpu
ivc_ext 20480 2 ivc_bus,capture_ivc
videobuf2_v4l2 32768 1 tegra_camera
pwm_fan 20480 0
videobuf2_common 65536 4 videobuf2_dma_contig,videobuf2_v4l2,tegra_camera,videobuf2_memops
videodev 266240 4 v4l2_async,videobuf2_v4l2,tegra_camera,videobuf2_common
mc 61440 4 videodev,videobuf2_v4l2,tegra_camera,videobuf2_common
tegra_se 57344 0
nvhost_pva 167936 0
crypto_engine 20480 1 tegra_se
nvhost_capture 20480 2 nvhost_isp5,nvhost_vi5
nvhost_nvdla 110592 0
nvidia_drm 90112 0
governor_userspace 16384 0
tegra_drm 372736 0
tegra_wmark 16384 0
nvhwpm 139264 4 mc_hwpm,tegra_drm,nvhost_nvdla,nvhost_pva
cec 57344 1 tegra_drm
nvidia_modeset 1310720 1 nvidia_drm
nvidia 1626112 1 nvidia_modeset
tegra_dce 110592 2 nvidia
tsecriscv 32768 1 nvidia
drm_kms_helper 303104 2 tegra_drm,nvidia_drm
host1x_nvhost 40960 10 nvhost_isp5,nvhost_nvcsi_t194,nvidia,tegra_camera,nvhost_nvdla,nvhost_capture,nvhost_nvcsi,nvhost_pva,nvhost_vi5,nvidia_modeset
nvgpu 2793472 0
governor_pod_scaling 45056 0
nvmap 237568 1 nvgpu
nvsciipc 24576 1 nvmap
host1x 208896 9 host1x_nvhost,host1x_fence,tegra_se,nvgpu,tegra_drm,nvhost_nvdla,nvidia_drm,nvhost_pva,nvidia_modeset
mc_utils 16384 3 nvidia,nvgpu,tegra_camera_platform
ina3221 24576 0
fuse 139264 1
drm 630784 5 drm_kms_helper,nvidia,tegra_drm,nvidia_drm
ipv6 503808 55 bridge
nvme 49152 0
nvme_core 106496 1 nvme
tegra_xudc 45056 0
ucsi_ccg 28672 0
typec_ucsi 36864 1 ucsi_ccg
typec 61440 1 typec_ucsi
pcie_tegra194 40960 0
phy_tegra194_p2u 16384 13
For what it's worth, these are the git hashes I used to build the tegrademo demo-image-egl, where it is working on an orin devkit for me:
meta = "HEAD:0df57c2c739c09f6c128515e03f0c2c8758ef905"
meta-tegra = "master:2c972e80d9715fd22022e1d95c8b4c192b7b1f7a"
meta-oe
meta-python
meta-networking
meta-filesystems = "HEAD:9363162b5147e2ecc21796047aefc7a10e0d999a"
meta-qt6 = "dev:bdc5526f0ea5fc79c05dc26ebb0d6ab4f42b484a"
meta-virtualization = "HEAD:e96da98e4038f5388596b4294ac3d8425b2dacb2"
meta-tegra-community = "HEAD:84ef4249ae938c9065811e2c242655471dcc4bdf"
meta-tegra-support
meta-demo-ci
meta-tegrademo
yoe-kiosk-browser (which is based on qtwebengine) gets a SIGSEGV, and I can now get a readable backtrace:
(gdb) bt
#0 0x0000ffff7d43149c in ?? () from /usr/lib/gbm/tegra_gbm.so
#1 0x0000ffff7d431754 in ?? () from /usr/lib/gbm/tegra_gbm.so
#2 0x0000ffff7d452d8c in backend_create_device (bd=0xaaaab94e7f20, fd=5) at /usr/src/debug/mesa/24.0.7/src/gbm/main/backend.c:105
#3 load_backend (lib=0xaaaab94e8250, fd=fd@entry=5, name=0xaaaab94e3ea0 "tegra") at /usr/src/debug/mesa/24.0.7/src/gbm/main/backend.c:137
#4 0x0000ffff7d452ff0 [PAC] in backend_from_driver_name (fd=5) at /usr/src/debug/mesa/24.0.7/src/gbm/main/backend.c:211
#5 _gbm_create_device (fd=fd@entry=5) at /usr/src/debug/mesa/24.0.7/src/gbm/main/backend.c:226
#6 0x0000ffff7d4530d4 [PAC] in gbm_create_device (fd=5) at /usr/src/debug/mesa/24.0.7/src/gbm/main/gbm.c:138
#7 0x0000ffff7d510fa0 [PAC] in QEglFSKmsGbmDevice::open (this=0xaaaab94d9990) at /usr/src/debug/qtbase/6.7.3/src/plugins/platforms/eglfs/deviceintegration/eglfs_kms/qeglfskmsgbmdevice.cpp:40
#8 0x0000ffff7d48a02c [PAC] in QEglFSKmsIntegration::platformInit (this=0xaaaab94e3e80) at /usr/src/debug/qtbase/6.7.3/src/plugins/platforms/eglfs/deviceintegration/eglfs_kms_support/qeglfskmsintegration.cpp:37
#9 0x0000ffff7dfcc728 [PAC] in QEglFSIntegration::initialize (this=0xaaaab94d2e40) at /usr/src/debug/qtbase/6.7.3/src/plugins/platforms/eglfs/api/qeglfsintegration.cpp:87
#10 0x0000ffff81b9c938 [PAC] in QCoreApplicationPrivate::init (this=this@entry=0xaaaab94d34f0) at /usr/src/debug/qtbase/6.7.3/src/corelib/kernel/qcoreapplication.cpp:914
#11 0x0000ffff822ab68c [PAC] in QGuiApplicationPrivate::init (this=0xaaaab94d34f0) at /usr/src/debug/qtbase/6.7.3/src/gui/kernel/qguiapplication.cpp:1585
#12 0x0000ffff822acd9c [PAC] in QGuiApplication::QGuiApplication (this=this@entry=0xffffedfdd700, argc=@0xffffedfdd6ec: 1, argv=argv@entry=0xffffedfdd9e8) at /usr/src/debug/qtbase/6.7.3/src/gui/kernel/qguiapplication.h:172
#13 0x0000aaaab0f32718 [PAC] in main (argc=<optimized out>, argv=0xffffedfdd9e8) at /usr/src/debug/yoe-kiosk-browser/1.0.0+git/main.cpp:64
(gdb)
I really think you need to solve the prior problems setting up power management for the gpu in the kernel before diving into the details of the graphics stack.
One note though - without nvidia_modeset and nvidia_drm, you won't be able to load a graphics device with gbm.
Yeah, I was putting it here for reference, to see whether the path for an "eglfs"-based image is still OK or whether it is using the wrong libraries, etc.
One note though - without nvidia_modeset and nvidia_drm, you won't be able to load a graphics device with gbm.
Can you share your kernel .config so I can compare it to mine and see any differences?
Regarding nvidia_drm (from the oot modules recipe), it looks like you have it installed, but it wasn't loaded in your printout of lsmod, despite installing nvidia-drm-loadconf. There are no kernel dependencies on this module -- does the module autoload work in your distro, or are you using an older version of this package that didn't install the modules-load.d entry? You can always manually run modprobe nvidia-drm modeset=1.
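A quick way to confirm which of the expected modules are actually loaded is to filter the lsmod listing. This is only a sketch: the function name missing_modules is made up for illustration, and it assumes the usual lsmod layout with the module name in the first column.

```shell
# Print the modules from the argument list that do not appear in
# lsmod-style output read from stdin.
# lsmod reports names with underscores even when the modprobe or package
# name uses dashes (nvidia-drm shows up as nvidia_drm), so dashes are
# normalised before comparing.
missing_modules() (
    # Keep only the first column and drop the "Module Size Used by" header.
    loaded=$(awk '$1 != "Module" { print $1 }')
    for mod in "$@"; do
        name=$(printf '%s' "$mod" | sed 's/-/_/g')
        printf '%s\n' "$loaded" | grep -qxF "$name" || printf '%s\n' "$name"
    done
)

# Example:
#   lsmod | missing_modules nvidia-drm nvidia-modeset nvidia
```

Empty output means every listed module is loaded. Only the first column is matched, so a module that merely appears in another module's "Used by" column does not count as loaded.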
Hmm, the nvidia-drm-loadconf ipk is missing on the rootfs, but I have now built it separately for testing and installed it:
root@jetson-agx-orin-devkit:~# opkg files nvidia-drm-loadconf
Package nvidia-drm-loadconf (1.0-r0.7) is installed on root and has the following files:
/etc/modules-load.d/nvidia-drm.conf
/etc/modprobe.d/nvidia-drm.conf
/etc
/etc/modules-load.d
/etc/modprobe.d
and on reboot I do see these modules
root@jetson-agx-orin-devkit:~# lsmod | grep nvidia_[dm+]
nvidia_drm 90112 0
nvidia_modeset 1310720 1 nvidia_drm
nvidia 1626112 1 nvidia_modeset
drm_kms_helper 303104 2 tegra_drm,nvidia_drm
host1x_nvhost 40960 10 nvhost_isp5,nvhost_nvcsi_t194,nvidia,tegra_camera,nvhost_nvdla,nvhost_capture,nvhost_nvcsi,nvhost_pva,nvhost_vi5,nvidia_modeset
host1x 208896 9 host1x_nvhost,host1x_fence,tegra_se,nvgpu,tegra_drm,nvhost_nvdla,nvidia_drm,nvhost_pva,nvidia_modeset
drm 630784 5 drm_kms_helper,nvidia,tegra_drm,nvidia_drm
The SEGV seen before still occurs.
Comparing the .config files, there are no differences that would matter:
❯ diff .config /tmp/linux-jammy-nvidia-tegra-dot-config.txt -u
--- .config 2024-07-11 23:20:19.665470855 -0700
+++ /tmp/linux-jammy-nvidia-tegra-dot-config.txt 2024-07-12 10:42:05.133743503 -0700
@@ -2,7 +2,7 @@
# Automatically generated file; DO NOT EDIT.
# Linux/arm64 5.15.136 Kernel Configuration
#
-CONFIG_CC_VERSION_TEXT="aarch64-yoe-linux-gcc (GCC) 14.1.0"
+CONFIG_CC_VERSION_TEXT="aarch64-oe4t-linux-gcc (GCC) 14.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=140100
CONFIG_CLANG_VERSION=0
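When comparing two kernel configs like this, it can help to strip the noise first so that only real option differences remain. The following is a rough sketch; the function names normalize_config and config_diff are invented here, and it assumes both files are ordinary kernel .config files.

```shell
# Reduce a kernel .config to its sorted CONFIG_ assignments, dropping
# comments (including "# CONFIG_FOO is not set" lines) and the toolchain
# identification string, which differs between distros.
normalize_config() {
    grep '^CONFIG_' "$1" | grep -v '^CONFIG_CC_VERSION_TEXT=' | sort
}

# Diff two configs after normalisation. Exit status follows diff:
# 0 when no meaningful difference, 1 when options differ.
config_diff() (
    tmp=$(mktemp -d) || exit 2
    normalize_config "$1" > "$tmp/a"
    normalize_config "$2" > "$tmp/b"
    status=0
    diff "$tmp/a" "$tmp/b" || status=$?
    rm -rf "$tmp"
    exit "$status"
)

# Example:
#   config_diff .config /tmp/linux-jammy-nvidia-tegra-dot-config.txt
```

An option that is set in one file and "is not set" in the other still shows up, because its assignment line is present on only one side.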
Here are my dmesg logs: dmesg.txt
It seems that the nvgpu messages are key, as they happen with yoe-kiosk-browser as well as with the nvpmodel service.
Here are a journal and dmesg from a run where I interactively log in and run kmscube. The kernel logs look substantially the same on quick review, up until the errors, so maybe there are some clues in the journal? Does it make a difference if you delay starting yoe-kiosk-browser until much later?
From the dmesg log, it looks like you have a 64GiB module (P3701-0005) in there, rather than the 32GiB one (P3701-0000). You'll need to use MACHINE = "p3737-0000-p3701-0005" for that hardware.
From the dmesg log, it looks like you have a 64GiB module (P3701-0005) in there, rather than the 32GiB one (P3701-0000). You'll need to use MACHINE = "p3737-0000-p3701-0005" for that hardware.
Ha! That could be the root of it all. I must say the machine names are a bit confusing, and I got tripped up. It would be good if there were some way to name them so they are more revealing. Are the SKU numbers readable from the machine in some form, via some NVRAM read etc.?
From the dmesg log, it looks like you have a 64GiB module (P3701-0005) in there, rather than the 32GiB one (P3701-0000). You'll need to use MACHINE = "p3737-0000-p3701-0005" for that hardware.
Thanks a lot @madisongh, this really helped and nailed the problem. A second, minor issue was that I have to use /dev/dri/card1 instead of /dev/dri/card0, which is yoe-kiosk-browser's default. Now I can launch the browser on an EGL surface. On to doing some OpenCV and testing some CUDA acceleration.
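For a Qt application running on the eglfs KMS backend in general, the DRM device node can be chosen with a small JSON file referenced by the QT_QPA_EGLFS_KMS_CONFIG environment variable. The sketch below writes such a file. Whether yoe-kiosk-browser picks its card this way or through its own option is not stated in this thread, so treat it as an assumption; the function name and file path are illustrative.

```shell
# Write a minimal eglfs KMS config that selects a specific DRM device node.
# $1: device node, e.g. /dev/dri/card1
# $2: path of the JSON file to create
write_kms_config() {
    printf '{\n  "device": "%s"\n}\n' "$1" > "$2"
}

# Example (the path is illustrative):
#   write_kms_config /dev/dri/card1 /etc/eglfs-kms.json
#   export QT_QPA_EGLFS_KMS_CONFIG=/etc/eglfs-kms.json
```

The variable must be set in the environment of the process that creates the QGuiApplication, for example in the service unit that launches the browser.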
Are the SKU numbers in some form readable from machine via some NVRAM read etc ?
The full part number is stored in an EEPROM on the module. The setup-nv-boot-control recipe installs a script for reading that info and programming a couple of EFI variables from it. It's also read by the flashing tools/scripts.
I must say the machine names are a bit confusing and I got tripped.
Yep, that's a problem, and it's worse now than with earlier L4T versions due to device trees being different between SKUs in the same family. It's less of a problem for NVIDIA, since everything's pre-built, and their flashing scripts read the module info before constructing the rootfs, so they can get away with using the same config name for all of the variants. That's harder for us, since we have to know some of these differences at build time.
Still, I think there's something we could do to at least catch these mismatches earlier in the process.
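As a rough illustration of what such an early check could look like, here is a sketch that compares a module SKU with the MACHINE an image was built for. It is not an existing meta-tegra tool: the function names are hypothetical, the table lists only the two pairings mentioned in this thread (the P3701-0000 entry is inferred from the discussion), and obtaining the SKU from the module EEPROM is left to the caller.

```shell
# Map a P3701 module SKU to the MACHINE discussed in this thread.
# Only the two SKUs mentioned above are listed; anything else is unknown.
machine_for_sku() {
    case "$1" in
        0000) echo "jetson-agx-orin-devkit" ;;   # 32GiB module
        0005) echo "p3737-0000-p3701-0005" ;;    # 64GiB module
        *) return 1 ;;
    esac
}

# Return 0 when MACHINE matches the module SKU, 1 on a mismatch,
# 2 when the SKU is not in the table.
# $1: module SKU (e.g. 0005), $2: MACHINE the image was built for
check_machine() {
    expected=$(machine_for_sku "$1") || {
        echo "unknown SKU: $1" >&2
        return 2
    }
    if [ "$expected" != "$2" ]; then
        echo "MACHINE=$2 does not match module P3701-$1 (expected $expected)" >&2
        return 1
    fi
}

# Example:
#   check_machine 0005 jetson-agx-orin-devkit   # reports a mismatch
```

A check like this could run on first boot or in the flashing step, wherever the part number is already being read.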
Describe the bug
nvpmodel.service fails to start, and any other services needing OpenGL/EGL also do not start.
To Reproduce
Build QtWebEngine for MACHINE=jetson-agx-orin-devkit
Additional context
crash report as seen on console.