anholt / linux

vc4: RPi3 whole system crash (webgl, or resizing terminal) #21

Open randyoo opened 8 years ago

randyoo commented 8 years ago

Visiting the following page in Chromium (Version 48.0.2564.82 Built on Ubuntu 15.04, running on Raspbian 8.0, with hardware acceleration for WebGL) and setting the renderer option to "WebGL" froze my entire system 2 out of 3 times: http://brm.io/matter-js/demo/

Unfortunately, there's nothing in /var/log/kern.log from the crash itself, although it's full of messages like the following:

Mar 12 10:28:17 pi3 kernel: [ 1157.017293] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 3956736
Mar 12 10:28:17 pi3 kernel: [ 1157.017346] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 3956736
Mar 12 10:28:17 pi3 kernel: [ 1157.018377] [drm] Resetting GPU.
Mar 12 10:28:17 pi3 kernel: [ 1157.022468] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 3956736
Mar 12 10:28:17 pi3 kernel: [ 1157.022817] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 3956736
Mar 12 10:28:17 pi3 kernel: [ 1157.027353] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 3956736
<snip>
Mar 12 11:04:34 pi3 kernel: [  183.699052] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 4857856
Mar 12 11:04:34 pi3 kernel: [  183.699676] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 4857856
Mar 12 11:04:34 pi3 kernel: [  183.699840] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 4857856
Mar 12 11:04:34 pi3 kernel: [  183.699901] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 4857856
Mar 12 11:04:34 pi3 kernel: [  183.699924] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
Mar 12 11:04:34 pi3 kernel: [  183.699928] [drm] num bos allocated: 322
Mar 12 11:04:34 pi3 kernel: [  183.699933] [drm] size bos allocated: 197112kb
Mar 12 11:04:34 pi3 kernel: [  183.699937] [drm] num bos used: 320
Mar 12 11:04:34 pi3 kernel: [  183.699941] [drm] size bos used: 187624kb
Mar 12 11:04:34 pi3 kernel: [  183.699945] [drm] num bos cached: 2
Mar 12 11:04:34 pi3 kernel: [  183.699948] [drm] size bos cached: 9488kb

anholt commented 8 years ago

If you get a GPU hang before the "ERROR Failed to allocate from CMA", it might be useful to get a GPU hang dump from it (https://github.com/anholt/vc4-gpu-tools). If it only hangs after the OOM errors, then we probably need to debug memory usage.
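
A rough way to tell the two cases apart from a saved kern.log is to check which message shows up first. This is only a sketch: `first_event` is a hypothetical helper (it is not part of vc4-gpu-tools), and it matches nothing more than the message strings quoted in this thread.

```python
import re

# Message fragments copied from the kernel log lines quoted in this thread.
RESET = re.compile(r"\[drm\] Resetting GPU")
CMA_OOM = re.compile(r"Failed to allocate from CMA")


def first_event(lines):
    """Return 'hang' if a GPU reset is logged first, 'oom' if the CMA
    allocation error is logged first, or None if neither appears."""
    for line in lines:
        if RESET.search(line):
            return "hang"
        if CMA_OOM.search(line):
            return "oom"
    return None


# Three lines taken from the log excerpt in the first comment.
sample = [
    "kernel: [ 1157.017293] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 3956736",
    "kernel: [ 1157.018377] [drm] Resetting GPU.",
    "kernel: [  183.699924] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:",
]
print(first_event(sample))  # hang
```

Note that the lowercase "failed to allocate buffer" lines are deliberately not treated as the OOM marker here, since those can be followed by a successful retry.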

randyoo commented 8 years ago

It seems to hang only after the "failed to allocate" errors. I actually just had another instance where, on a freshly booted system with >500MB of free RAM, I got similar errors in kern.log just by resizing a Terminal window:

Mar 14 20:40:43 pi3 kernel: [   95.233915] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 1089536
Mar 14 20:40:45 pi3 kernel: [   97.049983] [drm] Resetting GPU.
Mar 14 20:40:47 pi3 kernel: [   99.050061] [drm] Resetting GPU.
Mar 14 20:40:49 pi3 kernel: [  101.050085] [drm] Resetting GPU.
Mar 14 20:40:51 pi3 kernel: [  103.050109] [drm] Resetting GPU.
Mar 14 20:40:53 pi3 kernel: [  105.050122] [drm] Resetting GPU.
Mar 14 20:40:55 pi3 kernel: [  107.058299] [drm] Resetting GPU.
Mar 14 20:41:18 pi3 kernel: [  130.762654] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 1089536
Mar 14 20:41:18 pi3 kernel: [  130.865088] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 1056768
Mar 14 20:41:18 pi3 kernel: [  130.868299] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 1056768
Mar 14 20:41:18 pi3 kernel: [  130.868357] vc4-drm soc:gpu@7e4c0000: failed to allocate buffer with size 1056768
Mar 14 20:41:18 pi3 kernel: [  130.932494] [drm:vc4_validate_bin_cl [vc4]] *ERROR* 0x00000000: packet 112 (VC4_PACKET_TILE_BINNING_MODE_CONFIG)
anholt commented 8 years ago

If the "failed to allocate" wasn't followed by someone else complaining about allocation failure, then usually a cache got cleared and we managed to allocate.

randyoo commented 8 years ago

Sorry, I shouldn't have left out this detail, but in that previous comment, the last line was followed by a complete system crash, including a line of gibberish in the kernel log.

If there's something I need to do to help debug memory use, let me know. Seems really easy to reproduce--just resizing a Terminal window consistently fills the logs with these kinds of errors, causes >10 second system freezes, sometimes followed by a complete crash.
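
One low-effort data point for debugging memory use is how much of the CMA pool is free before and after a resize. On kernels built with CMA support, /proc/meminfo reports `CmaTotal` and `CmaFree`. The snippet below is a minimal sketch for reading them; `cma_usage` is a hypothetical helper name, and it returns None if the fields are not present.

```python
def cma_usage(meminfo_text):
    """Return (CmaTotal_kB, CmaFree_kB) parsed from /proc/meminfo text,
    or None if the kernel does not report the CMA fields."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "CmaFree:   123456 kB"; keep the numeric part.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    if "CmaTotal" not in fields or "CmaFree" not in fields:
        return None
    return fields["CmaTotal"], fields["CmaFree"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(cma_usage(f.read()))
    except OSError:
        # Not on Linux, or /proc is not mounted.
        print(None)
```

Running it once on a fresh boot and again right after the freeze should show whether the pool is actually exhausted or whether the failures come from something else, such as fragmentation.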

ulgena commented 8 years ago

Greetings, I can reproduce this problem using a clean, non-updated image of Raspbian Jessie (2016-05-27, as installed by NOOBS 1.9.2).

firefox/firefox-esr produces the output below, while epiphany-browser doesn't. I don't even set CMA in config.txt.

root@raspberrypi:~# grep -Ev '^#|^$' /boot/config.txt
disable_overscan=1
framebuffer_width=1920
framebuffer_height=1080
dtparam=audio=on
hdmi_force_hotplug=1
dtoverlay=vc4-kms-v3d
gpu_mem=256
root@raspberrypi:~# vcgencmd get_config int
arm_freq=1200
audio_pwm_mode=1
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x36ee80
disable_commandline_tags=2
disable_l2cache=1
disable_splash=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_height=1080
framebuffer_ignore_alpha=1
framebuffer_swap=1
framebuffer_width=1920
gpu_freq=300
hdmi_force_cec_address=65535
hdmi_force_hotplug=1
init_uart_clock=0x2dc6c00
lcd_framerate=60
mask_gpu_interrupt0=1024
mask_gpu_interrupt1=26370
over_voltage_avs=0x19f0a
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
second_boot=1
temp_limit=85
root@raspberrypi:~# vcgencmd get_config str
device_tree=-
root@raspberrypi:~#

dmesg - vc4- and drm-related boot/startup output and kernel command line

root@raspberrypi:~# dmesg | grep -E 'drm|vc'
[    0.000000] Kernel command line: 8250.nr_uarts=0 cma=256M@256M dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1920 bcm2708_fb.fbheight=1080 bcm2709.boardrev=0xa02082 bcm2709.serial=0x3747200a smsc95xx.macaddr=B8:27:EB:47:20:0A bcm2708_fb.fbswap=1 bcm2709.uart_clock=48000000 vc_mem.mem_base=0x3dc00000 vc_mem.mem_size=0x3f000000  dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=/dev/mmcblk0p7 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait quiet acpi=off
[    1.273678] vc-cma: Videocore CMA driver
[    1.273689] vc-cma: vc_cma_base      = 0x00000000
[    1.273699] vc-cma: vc_cma_size      = 0x00000000 (0 MiB)
[    1.273707] vc-cma: vc_cma_initial   = 0x00000000 (0 MiB)
[    1.273931] vc-mem: phys_addr:0x00000000 mem_base=0x3dc00000 mem_size:0x3f000000(1008 MiB)
[    1.298735] vchiq: vchiq_init_state: slot_zero = 0x90400000, is_master = 0
[    1.841870] vc-sm: Videocore shared memory driver
[    1.841884] [vc_sm_connected_init]: start
[    1.842340] [vc_sm_connected_init]: end - returning 0
[    5.466545] [drm] Initialized drm 1.1.0 20060810
[    5.553105] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[    5.558622] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
[    5.558898] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
[    5.559097] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
[    5.559182] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
[    5.560703] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
[    5.565043] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    5.565063] [drm] No driver support for vblank timestamp query.
[    5.665355] vc4-drm soc:gpu: fb0:  frame buffer device
[    9.446963] [drm:drm_edid_block_valid [drm]] *ERROR* EDID checksum is invalid, remainder is 82
[    9.483603] [drm:drm_edid_block_valid [drm]] *ERROR* EDID checksum is invalid, remainder is 25
root@raspberrypi:~#
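
Although config.txt has no CMA setting, the kernel command line above does carry `cma=256M@256M`. As a small sketch of how to pull the pool size and start address out of a command line, the hypothetical helpers below handle only the `size[KMG]` and `size[KMG]@start[KMG]` forms; the kernel accepts more (hex values, a `start-end` range), and this code raises ValueError on those.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a decimal size with an optional K/M/G suffix to bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text, re.IGNORECASE)
    if not m:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2).upper()]


def parse_cma(cmdline):
    """Return (size_bytes, start_bytes or None) for the first cma= option,
    or None if the command line has no cma= option."""
    for token in cmdline.split():
        if token.startswith("cma="):
            size, _, start = token[len("cma="):].partition("@")
            return parse_size(size), (parse_size(start) if start else None)
    return None


# Shortened version of the command line shown in the dmesg output above.
print(parse_cma("8250.nr_uarts=0 cma=256M@256M quiet"))  # (268435456, 268435456)
```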

dmesg - error related output

[  625.909314] vc4-drm soc:gpu: failed to allocate buffer with size 1077248
[  625.911516] [drm:vc4_validate_bin_cl [vc4]] *ERROR* 0x00000000: packet 112 (VC4_PACKET_TILE_BINNING_MODE_CONFIG) failed to validate
[  625.912705] vc4-drm soc:gpu: failed to allocate buffer with size 3317760
[  625.912784] vc4-drm soc:gpu: failed to allocate buffer with size 3317760
.... SAME OUTPUTS
[  626.042469] vc4-drm soc:gpu: failed to allocate buffer with size 1069056
[  626.042762] vc4-drm soc:gpu: failed to allocate buffer with size 1069056
[  626.044928] [drm:vc4_validate_bin_cl [vc4]] *ERROR* 0x00000000: packet 112 (VC4_PACKET_TILE_BINNING_MODE_CONFIG) failed to validate
[  626.046490] vc4-drm soc:gpu: failed to allocate buffer with size 1089536
[  626.046957] vc4-drm soc:gpu: failed to allocate buffer with size 1089536
[  628.001887] [drm] Resetting GPU.
[  630.001909] [drm] Resetting GPU.
.... SAME OUTPUTS
[  709.002437] [drm] Resetting GPU.
[  710.002465] [drm] Resetting GPU.
[  712.142637] [drm:vc4_validate_bin_cl [vc4]] *ERROR* 0x00000000: packet 112 (VC4_PACKET_TILE_BINNING_MODE_CONFIG) failed to validate
[  713.002425] [drm] Resetting GPU.
[  714.002439] [drm] Resetting GPU.
.... AND KEEP GOING UNTIL I RESET RPi3

firefox's stderr

Performance warning: Async animation disabled because frame size (26600, 670) is bigger than the viewport (1620, 911) or the visual rectangle (26600, 670) is larger than the max allowable value (17895698) [ul]
Draw call returned Invalid argument.  Expect corruption.

In another attempt I got the kernel error below, in addition to the same output as above and the same hang:

Message from syslogd@raspberrypi at Oct 12 18:00:38 ...
 kernel:[  174.066702] Internal error: Oops: 5 [#1] SMP ARM

In addition, I noticed the following output in the Xorg log file:

(EE) glamor0: GL error: FBO incomplete: driver marked FBO as incomplete [-1]
(EE) glamor0: GL error: FBO incomplete: driver marked FBO as incomplete [-1]

lromor commented 8 years ago

Hello, I'm also seeing this error in the Xorg logs:

(EE) glamor0: GL error: FBO incomplete: driver marked FBO as incomplete [-1]
(EE) glamor0: GL error: FBO incomplete: driver marked FBO as incomplete [-1]

anholt commented 8 years ago

@lromor That's not an error, please ignore it.

anholt commented 7 years ago

Hopefully https://github.com/raspberrypi/linux/pull/1835 fixes a bunch of instability around CMA OOMs. Could you test if you're still having trouble with that?