Hexxeh / rpi-firmware

Firmware files for the Raspberry Pi
Other
775 stars 208 forks source link

All video decoding randomly stops working. #249

Open Botspot opened 3 years ago

Botspot commented 3 years ago

This has occurred on my Raspberry Pi OS system for about a month now.

Once in a while, something happens that causes the video decoded to stop working. This "trigger" can happen at any time it seems, but is more likely to happen if I do something weird with VLC or Chromium. Like playing a video and closing vlc while it's playing.

I've not found a fool-proof way to activate the "trigger" yet, but messing around with hw-accelerated video players seems to sometimes cause it.

After this trigger is activated, no more videos will play on my Pi until I reboot it.
Youtube videos never load in Chromium. It shows the spinning white circle forever.
Also, no videos play in VLC anymore. VLC forever thinks that the video is loading (with that orange slider bounding left and right), but the video never ever starts playing. I've tried various video codecs on VLC, and all of the have the same result.

And as I said, rebooting "fixes" all these problems, Youtube works again, vlc works again, until the next time that "trigger" is triggered. :)

Next time this "trigger" happens, I'll try to capture error logs from Chromium, vlc, and omxplayer. Is there anything else specific I should try for diagnosing what's causing the issue?

https://github.com/RPi-Distro/vlc/issues/31 may be referring to the same problem I'm experiencing.

pelwell commented 3 years ago

You should run sudo vcdbg log msg to look for messages from the firmware, and vcgencmd version to verify the firmware version number and confirm that the firmware is still running.

Botspot commented 3 years ago
006003.032: bfs_xhci_retrieve 3ee69c80
006003.043: bfs_xhci_retrieve VL805 flags 00000001
006200.119: brfs: File read: /mfs/sd/config.txt
006200.476: brfs: File read: 243 bytes
006289.437: brfs: File read: /mfs/sd/config.txt
006304.446: brfs: File read: 243 bytes
006751.032: gpioman: gpioman_get_pin_num: pin DISPLAY_DSI_PORT not defined
006753.491: *** Restart logging
006794.813: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
006794.829: HDMI0: hdmi_pixel_encoding: 300000000
006794.842: HDMI1: hdmi_pixel_encoding: 300000000
006799.789: dtb_file 'bcm2711-rpi-4-b.dtb'
006807.249: brfs: File read: /mfs/sd/bcm2711-rpi-4-b.dtb
006807.264: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xb99c
006819.378: brfs: File read: 47516 bytes
006827.399: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
006882.009: brfs: File read: 1463 bytes
006883.314: brfs: File read: /mfs/sd/config.txt
006883.364: dtparam: audio=on
006892.618: brfs: File read: 243 bytes
006903.670: brfs: File read: /mfs/sd/overlays/vc4-fkms-v3d.dtbo
006920.334: Loaded overlay 'vc4-fkms-v3d'
006959.216: brfs: File read: 1446 bytes
006960.446: brfs: File read: /mfs/sd/cmdline.txt
006960.492: Read command line from file 'cmdline.txt':
006960.507: 'usb-storage.quirks=174c:1153:u console=serial0,115200 console=tty1 root=PARTUUID=4b002859-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait'
007844.462: brfs: File read: 153 bytes
008906.124: brfs: File read: /mfs/sd/kernel8.img
008906.165: Loading 'kernel8.img' to 0x80000 size 0xf4b200
008906.219: Device tree loaded to 0x2eff4100 (size 0xbefe)
008914.763: bfs_xhci_stop
008914.777: XHCI-STOP
008914.959: xHC ver: 256 HCS: 05000420 fc000031 00e70004 HCC: 002841eb
008917.064: PCI reset
010762.334: vchiq_core: vchiq_init_state: slot_zero = 0xded00000, is_master = 1
010766.058: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
010768.609: TV service:host side not connected, dropping notification 0x00000002, 0x00000001, 0x00000010
014884.346: TV service:host side not connected, dropping notification 0x00000008, 0x00000004, 0x00000000
086361.732: mmal: mmal_ril_set_port_settings: ril.isp:in:0(OPQV): failed to set port definition (4) buffers 0/0/0
086361.759: mmal: mmal_ril_port_send: format not set on port 3f01b660, ril.isp:in:0(OPQV)
224758.195: mmal: mmal_ril_set_port_settings: ril.isp:in:0(OPQV): failed to set port definition (4) buffers 0/0/0
224758.224: mmal: mmal_ril_port_send: format not set on port 3f01b660, ril.isp:in:0(OPQV)
244024.564: mmal: mmal_ril_set_port_settings: ril.isp:in:0(OPQV): failed to set port definition (4) buffers 0/0/0
244024.598: mmal: mmal_ril_port_send: format not set on port 3f01b660, ril.isp:in:0(OPQV)
409384.269: mmal: mmal_ril_set_port_settings: ril.isp:in:0(OPQV): failed to set port definition (4) buffers 0/0/0
409384.332: mmal: mmal_ril_port_send: format not set on port 3f01b660, ril.isp:in:0(OPQV)
942088.303: mmal: mmal_ril_set_port_settings: ril.isp:in:0(OPQV): failed to set port definition (4) buffers 0/0/0
942088.330: mmal: mmal_ril_port_send: format not set on port 3f01b660, ril.isp:in:0(OPQV)
2151378.153: mmal: mmal_ril_set_port_settings: ril.isp:in:0(OPQV): failed to set port definition (4) buffers 0/0/0
2151378.186: mmal: mmal_ril_port_send: format not set on port 3f01b660, ril.isp:in:0(OPQV)
timg236 commented 3 years ago

Please could you also post the results of 'vcgencmd get_config int' plus 'grep Revision /proc/cpuinfo'

Botspot commented 3 years ago
arm_64bit=1
arm_freq=2000
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=500
core_freq_min=200
disable_commandline_tags=2
disable_l2cache=1
disable_overscan=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
enable_gic=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=500
gpu_freq_min=250
init_uart_clock=0x2dc6c00
lcd_framerate=60
mask_gpu_interrupt0=1024
mask_gpu_interrupt1=0x10000
max_framebuffers=2
over_voltage=6
over_voltage_avs=-30000
pause_burst_frames=1
program_serial_random=1
total_mem=4096
hdmi_force_cec_address:0=65535
hdmi_force_cec_address:1=65535
hdmi_pixel_freq_limit:0=0x11e1a300
hdmi_pixel_freq_limit:1=0x11e1a300
Revision    : c03111
timg236 commented 3 years ago

Thanks. I suspect the over-clocking configuration may be the problem. Does it still fail if you reduce the ARM frequency to 1800 MHz?

Ruffio commented 3 years ago

@Botspot have you tested if the overcloking was the issue?

Botspot commented 3 years ago

@Botspot have you tested if the overcloking was the issue?

Yes, I've been keeping an eye on it.

It appears a 1.8GHz overclock greatly reduces the chances of the "trigger", but it still happened. It's happened twice since Dec. 3rd. And that's a great improvement.

ZedeN1 commented 3 years ago

@Botspot try this: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=308168&p=1845168#p1845159

Basically: sudo rpi-update 453e49bdd87325369b462b40e809d5f3187df21d

Solved a lot of playback issues I had since about December especially with 4K content.

dasl- commented 2 years ago

Hi, wondering if there's any update on this issue? I believe I am also encountering this issue. Some things I've noticed: 1) I have 10 raspberry pis (model 4 B). This issue seems to happen much more often on two of them. 2) When this issue randomly starts happening, a restart fixes it, but sometimes the pi gets into a state where the "trigger" for this bug happens much more frequently than normal. So even after a restart, it will quickly fail again. Other times, the issue is much more difficult to "trigger" and things seem very stable. It varies from day to day. 3) If I strace the video playback process, I can prevent this bug from ever triggering. I'm using omxplayer, so if I run: sudo strace -p $(pidof omxplayer.bin) -y -Ttt, I can ensure that the bug never triggers. This is really weird! Sometimes, after stopping the strace process via ctrl+c, the bug will immediately trigger and video decoding will freeze. So running strace seems to protect against this bug triggering.

Here's my logs / configuration. Note that I am not overclocking, but I do have enable_tvout=1, which as I understand actually slows down system clock speed

$ sudo vcdbg log msg
004829.555: arasan: arasan_emmc_open
004980.353: brfs: File read: /mfs/sd/config.txt
004981.204: brfs: File read: 1911 bytes
005031.612: HDMI0:EDID error reading EDID block 0 attempt 0
005037.635: HDMI0:EDID error reading EDID block 0 attempt 1
005043.660: HDMI0:EDID error reading EDID block 0 attempt 2
005049.682: HDMI0:EDID error reading EDID block 0 attempt 3
005055.707: HDMI0:EDID error reading EDID block 0 attempt 4
005061.730: HDMI0:EDID error reading EDID block 0 attempt 5
005067.755: HDMI0:EDID error reading EDID block 0 attempt 6
005073.777: HDMI0:EDID error reading EDID block 0 attempt 7
005079.802: HDMI0:EDID error reading EDID block 0 attempt 8
005085.825: HDMI0:EDID error reading EDID block 0 attempt 9
005086.842: HDMI0:EDID giving up on reading EDID block 0
005107.126: HDMI1:EDID error reading EDID block 0 attempt 0
005113.148: HDMI1:EDID error reading EDID block 0 attempt 1
005119.173: HDMI1:EDID error reading EDID block 0 attempt 2
005125.196: HDMI1:EDID error reading EDID block 0 attempt 3
005131.221: HDMI1:EDID error reading EDID block 0 attempt 4
005137.243: HDMI1:EDID error reading EDID block 0 attempt 5
005143.268: HDMI1:EDID error reading EDID block 0 attempt 6
005149.291: HDMI1:EDID error reading EDID block 0 attempt 7
005155.316: HDMI1:EDID error reading EDID block 0 attempt 8
005161.338: HDMI1:EDID error reading EDID block 0 attempt 9
005162.354: HDMI1:EDID giving up on reading EDID block 0
005164.205: brfs: File read: /mfs/sd/config.txt
006271.329: gpioman: gpioman_get_pin_num: pin DISPLAY_DSI_PORT not defined
006273.832: *** Restart logging
006273.853: brfs: File read: 1911 bytes
006274.546: vec: vec_middleware_power_on: vec_base: 0x7ec13000 rev-id 0x00002708 @ vec: 0x7ec13100 @ 0x00000420 enc: 0x7ec13060 @ 0x00000220 cgmsae: 0x7ec1305c @ 0x00000000
006279.571: dtb_file 'bcm2711-rpi-4-b.dtb'
006285.332: brfs: File read: /mfs/sd/bcm2711-rpi-4-b.dtb
006285.349: Loading 'bcm2711-rpi-4-b.dtb' to 0x100 size 0xc2a9
006298.046: brfs: File read: 49833 bytes
006311.437: brfs: File read: /mfs/sd/overlays/overlay_map.dtb
006375.105: brfs: File read: 1667 bytes
006377.608: brfs: File read: /mfs/sd/config.txt
006378.087: dtparam: audio=on
006388.148: brfs: File read: 1911 bytes
006402.590: brfs: File read: /mfs/sd/overlays/vc4-fkms-v3d-pi4.dtbo
006419.790: Loaded overlay 'vc4-fkms-v3d'
006454.354: brfs: File read: 1446 bytes
006455.441: brfs: File read: /mfs/sd/overlays/vc4-fkms-v3d-pi4.dtbo
006472.747: Loaded overlay 'vc4-fkms-v3d'
006507.321: brfs: File read: 1446 bytes
006510.971: brfs: File read: /mfs/sd/overlays/disable-wifi.dtbo
006519.506: Loaded overlay 'disable-wifi'
006543.746: brfs: File read: 387 bytes
006545.682: brfs: File read: /mfs/sd/cmdline.txt
006545.719: Read command line from file 'cmdline.txt':
006545.730: 'console=serial0,115200 console=tty1 root=PARTUUID=f8aedb56-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait'
006844.008: brfs: File read: 121 bytes
007414.928: brfs: File read: /mfs/sd/kernel7l.img
007414.947: Loading 'kernel7l.img' to 0x8000 size 0x6773e0
007414.975: Device tree loaded to 0x2eff3600 (size 0xc932)
007420.598: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
009997.162: vchiq_core: vchiq_init_state: slot_zero = 0xdad80000, is_master = 1
013325.037: vec: vec_middleware_power_on: vec_base: 0x7ec13000 rev-id 0x00002708 @ vec: 0x7ec13100 @ 0x00000420 enc: 0x7ec13060 @ 0x00000220 cgmsae: 0x7ec1305c @ 0x00000000
024841.425: video_decode:1:RIL: nothing changing
506829.077: video_decode:8:RIL: colourspace changing
$ vcgencmd version
Nov 18 2021 16:16:49
Copyright (c) 2012 Broadcom
version d9b293558b4cef6aabedcc53c178e7604de90788 (clean) (release) (start)
$ vcgencmd get_config int
arm_freq=1500
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=500
core_freq_min=200
disable_commandline_tags=2
disable_l2cache=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
dvfs=3
enable_gic=1
enable_tvout=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=500
gpu_freq_min=250
init_uart_clock=0x2dc6c00
lcd_framerate=60
mask_gpu_interrupt0=1024
mask_gpu_interrupt1=0x10000
max_framebuffers=2
over_voltage_avs=-17500
pause_burst_frames=1
program_serial_random=1
total_mem=2048
hdmi_force_cec_address:0=65535
hdmi_force_cec_address:1=65535
hdmi_pixel_freq_limit:0=0x11e1a300
hdmi_pixel_freq_limit:1=0x11e1a300
% grep Revision /proc/cpuinfo
Revision    : b03112
popcornmix commented 2 years ago

omxplayer is no longer supported. Updating to bullseye and using vlc would be recommended.

The strace avoiding the issue is interesting, as it probably causes the arm to remain busy and so not lower the clock. Try running with force_turbo=1 in config.txt and see if that similarly avoids the issue.

dasl- commented 2 years ago

Thanks for the suggestion @popcornmix .

I am aware that omxplayer is no longer supported -- I am only using it because I require hardware accelerated cropping features. Last I checked, VLC did software cropping. Perhaps I should file an issue on VLC to implement software cropping.

I tried running with force_turbo=1. This did not help unfortunately. For good measure, I also tried running with the below settings, which also did not help:

force_turbo=1
gpu_freq=500
gpu_freq_min=500
core_freq=500
core_freq_min=500
h264_freq=500
h264_freq_min=500
isp_freq=500
isp_freq_min=500
v3d_freq=500
v3d_freq_min=500
hevc_freq=500
hevc_freq_min=500
arm_freq=1500
arm_freq_min=1500
sdram_freq=3200
sdram_freq_min=3200

https://www.raspberrypi.com/documentation/computers/config_txt.html#overclocking

dasl- commented 2 years ago

@popcornmix your suggestion inspired another thing for me to try. I tried maxing out a CPU via:

$ /usr/bin/env bash -c 'while true ; do : ; done'

This pegs a CPU at 100% busy. When I do this at the same time as playing a video, I seem to be able to avoid "triggering" the bug. Given this finding, it's strange to me that setting force_turbo=1 does not also avoid triggering the bug. I can see via vcgencmd measure_clock arm that both the bash busy looping and setting force_turbo=1 have the same effect of ensuring the arm frequency stays high.

Ideally there would be a better solution than pegging a CPU (which increases power consumption and operating temperatures) for solving this problem though 😅 . All ears if anyone has any more ideas!

popcornmix commented 2 years ago

Can you remove any overclocking/force_turbo and try over_voltage=2?

dasl- commented 2 years ago

Can you remove any overclocking/force_turbo and try over_voltage=2?

Holy shit, @popcornmix , I think this has fixed it! So far I'm unable to reproduce the bug with over_voltage=2!! I've spent so many hours over the past few months and replaced several raspberry pis trying to figure out what was happening, so I'm very glad to have what looks like a solution. I should let it run for a few days to confirm that the bug does not recur though. At the very least, I think this will surely reduce the frequency of the bug's occurrence.

I'm very curious if you could explain what is going on and why this fixes the issue?

Thank you so much.

popcornmix commented 2 years ago

We have seen occasional cases where the chip specific voltage chosen by avs is borderline for the h.264 decoder. It tends to largely work but may fail after longer time periods. A small addition to the voltage tends to avoid the issue. You may well find over_voltage=1 is sufficient.

Botspot commented 2 years ago

We have seen occasional cases where the chip specific voltage chosen by avs is borderline for the h.264 decoder. It tends to largely work but may fail after longer time periods. A small addition to the voltage tends to avoid the issue. You may well find over_voltage=1 is sufficient.

That probably explains why I've never encountered this issue on my 8GB Pi. Any objections to closing this issue?

popcornmix commented 2 years ago

Any objections to closing this issue?

Fine by me

dasl- commented 2 years ago

I wonder if raspberry pis should default to using a higher voltage? Not sure what the downsides of that are. It is a bit frustrating that the default settings on some pis are subject to this buggy behavior.

popcornmix commented 2 years ago

It's something we are aware of. If we can find a test that identifies the boards that need it, we may be able to just apply it to those boards (but as it can take hours or days to occur, it's not trivial, but it may be possible to accelerate by testing at increased clock or lower voltage). Applying higher voltage to all boards is undesirable as it will result in a small increase in temperature. As we've only seen a handful of reports of this issue, we need to weigh up the pros and cons.

dasl- commented 2 years ago

I see, thanks for explaining. One more thing I was curious about: why does pegging a CPU at 100% busy also seem to prevent the bug from occurring? Does this affect the voltage somehow?

Botspot commented 2 years ago

It's something we are aware of. If we can find a test that identifies the boards that need it, we may be able to just apply it to those boards (but as it can take hours or days to occur, it's not trivial, but it may be possible to accelerate by testing at increased clock or lower voltage). Applying higher voltage to all boards is undesirable as it will result in a small increase in temperature. As we've only seen a handful of reports of this issue, we need to weigh up the pros and cons.

While your reasons seem logical @popcornmix, I am fairly confident that the number of people being affected is greater than it appears. For an issue like this, I think very few people would be able to correctly describe it. The rest would just say things like "My pi4 seems unreliable...", or "YouTube/VLC sometimes doesn't work" On the Pi forums, I've observed a general opinion by some members that the Pi4 is generally less reliable than the Pi3. Perhaps this is playing a part.

dasl- commented 2 years ago

Regarding the prevalence of this issue, I have 11 raspberry pi 4 machines that I use to play videos 24/7. I have observed this issue to occur on 4 of the 11 machines fairly regularly (it happens daily to weekly). Although I believe I have a particular use case that might make the "trigger" more likely to happen: I issue commands to omxplayer via dbus constantly, and I stream video over the network. Still, 4 out of 11 is a large proportion.