Frequent system crashes with BC-H08240A

greyghost509 commented 8 years ago

I have a BC-H08240A running under Linux kernel 3.18.7. Getting video frames from a single channel using ffmpeg or mplayer works fine. However, when I increase the number of channels being read, dmesg begins reporting warnings like this:

[ 1961.100513] WARNING: CPU: 5 PID: 3894 at drivers/media/v4l2-core/videobuf2-core.c:2135 vb2_queue_cancel+0x116/0x180 [videobuf2_core]()
[ 1961.100514] Modules linked in: solo6x10 x86_pkg_temp_thermal videobuf2_dma_contig videobuf2_dma_sg videobuf2_memops videobuf2_core
[ 1961.100518] CPU: 5 PID: 3894 Comm: ffmpeg Tainted: G W 3.18.7-gentoo #2
[ 1961.100519] Hardware name: Supermicro X10SRi-F/X10SRi-F, BIOS 1.0a 08/27/2014
[ 1961.100520] 0000000000000009 ffff880459943bc8 ffffffff818fca15 0000000000000000
[ 1961.100522] 0000000000000000 ffff880459943c08 ffffffff81048dec ffff88046ce888d0
[ 1961.100523] 0000000000000000 0000000000000001 0000000000000000 ffff880468c60550
[ 1961.100524] Call Trace:
[ 1961.100531] [] dump_stack+0x46/0x58
[ 1961.100534] [] warn_slowpath_common+0x7c/0xa0
[ 1961.100536] [] warn_slowpath_null+0x15/0x20
[ 1961.100538] [] vb2_queue_cancel+0x116/0x180 [videobuf2_core]
[ 1961.100539] [] vb2_internal_streamoff+0x35/0xd0 [videobuf2_core]
[ 1961.100541] [] vb2_streamoff+0x25/0x50 [videobuf2_core]
[ 1961.100542] [] vb2_ioctl_streamoff+0x40/0x50 [videobuf2_core]
[ 1961.100546] [] v4l_streamoff+0x15/0x20
[ 1961.100547] [] video_do_ioctl+0x274/0x2f0
[ 1961.100549] [] video_usercopy+0x20e/0x580
[ 1961.100551] [] ? v4l_dqevent+0x20/0x20
[ 1961.100554] [] ? __free_pages+0x27/0x30
[ 1961.100556] [] ? slab_free+0xfc/0x2ae
[ 1961.100559] [] ? tlb_finish_mmu+0x2d/0x40
[ 1961.100562] [] ? unmap_region+0xce/0x110
[ 1961.100563] [] video_ioctl2+0x10/0x20
[ 1961.100564] [] v4l2_ioctl+0x11b/0x150
[ 1961.100567] [] do_vfs_ioctl+0x2c8/0x4a0
[ 1961.100569] [] ? do_munmap+0x29f/0x3b0
[ 1961.100570] [] SyS_ioctl+0x3c/0x80
[ 1961.100574] [] system_call_fastpath+0x12/0x17
[ 1961.100575] ---[ end trace 50eb834316a63f8b ]---
[ 2269.850563] kworker/dying (1846) used greatest stack depth: 12040 bytes left

Sooner or later, the system will crash. When grabbing more than 4 channels at the same time, the system reliably crashes within seconds (regardless of whether I use mplayer or ffmpeg). Interestingly, this only appears to happen when there are signals connected to the inputs. Without any signal connected, I have had 8 channels being grabbed in parallel just fine for many hours.
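When reproducing a problem like this, it can help to pull the WARNING headers out of dmesg automatically while the number of channels is increased. The following is a minimal sketch, not part of the original report. The regular expression is an assumption derived from the single warning line quoted above, and the function name find_warnings is hypothetical.

```python
import re

# Pattern for the WARNING header line as it appears in the log above.
# Assumption: other kernel versions may format this line differently.
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_warnings(dmesg_text):
    """Return one dict per WARNING header found in the given dmesg text."""
    found = []
    for match in WARNING_RE.finditer(dmesg_text):
        fields = match.groupdict()
        found.append({
            "timestamp": float(fields["ts"]),
            "cpu": int(fields["cpu"]),
            "pid": int(fields["pid"]),
            "source": fields["file"] + ":" + fields["line"],
            "function": fields["func"],
            "module": fields["module"],  # None if the symbol is built in
        })
    return found
```

The text to scan can come from a saved log, for example find_warnings(open("dmesg.txt").read()), where dmesg.txt is a hypothetical file holding the output of dmesg.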

curtishall commented 8 years ago

What FPS are you capturing at (total) and what resolution?

greyghost509 commented 8 years ago

This is an example of the ffmpeg line I use:

ffmpeg -loglevel error -an -f v4l2 -input_format h264 -s 352x240 -i /dev/video2

I think I read in the ffmpeg documentation that the frame rate should not be set/limited for a live stream coming from a capture device (I pass the stream on to ffserver). When using mplayer, I also did not set any rate limit. Is that necessary? Which rate limit is recommended/required?
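The multi-channel load described in this thread could be reproduced with a small launcher such as the sketch below. The input options are taken from the command quoted above. The output part ("-c copy -f null -", which copies the packets and discards them) is an assumption added for testing, because the quoted command shows no output. The device list in the usage note is also an assumption, since only /dev/video2 is named in the thread. All function names are hypothetical.

```python
import subprocess


def build_capture_cmd(device, size="352x240"):
    """Build an ffmpeg argument list mirroring the command quoted above.

    The trailing "-c copy -f null -" is an assumption for testing only:
    it copies the H.264 packets and throws them away.
    """
    return [
        "ffmpeg", "-loglevel", "error", "-an",
        "-f", "v4l2", "-input_format", "h264",
        "-s", size, "-i", device,
        "-c", "copy", "-f", "null", "-",
    ]


def start_captures(devices):
    """Start one ffmpeg process per device and return the Popen objects."""
    return [subprocess.Popen(build_capture_cmd(dev)) for dev in devices]


def stop_captures(procs, timeout=5.0):
    """Ask every capture process to terminate, killing any that hang."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
```

A test run might call start_captures(["/dev/video2", "/dev/video3"]), adding one device at a time while watching dmesg, and then stop_captures on the returned list. The second device path is hypothetical and has to be adjusted to the nodes the card actually exposes.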

curtishall commented 8 years ago

The only limitation is you can't exceed 150FPS (total) at 704x480. You can do real-time recording at 352x240, so it doesn't appear you are exceeding the limit.

greyghost509 commented 8 years ago

So are there any other ideas about what might be wrong here? The crashes look really bad: segfaults in arbitrary running programs. It looks as if random memory is being overwritten all over the place.

andrey-utkin commented 8 years ago

Hello, sorry for the inconvenience. @gerritkuehn Could you please try running a self-built kernel from the "master" branch of our repo https://github.com/bluecherrydvr/linux ? We can also build deb packages of it for you, but these packages are not yet built.

greyghost509 commented 8 years ago

I'll check out the repo and try. Don't worry about deb packages, they won't help here anyway:

pt@pt-video ~ $ uname -r
3.18.7-gentoo

greyghost509 commented 8 years ago

Your kernel appears to work much better. I won't declare this a full victory yet (it needs some more time and tests; it has only been running with 8 channels for a couple of minutes now, and I also have a second card to plug into the system for another set of 8 channels), but it definitely does not crash as fast as before.

The major difference I can see is the [solo6x10_ring] process: With the old kernel, one of these processes was created for every ffmpeg instance I was running. With the new kernel I see only one instance of [solo6x10_ring], no matter how many times I start ffmpeg.
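The observation about [solo6x10_ring] can be checked without eyeballing ps output by counting the matching entries under /proc. This is a minimal sketch that assumes the kernel thread's comm name is exactly "solo6x10_ring", as the bracketed name above suggests. The helper names are hypothetical.

```python
import os


def read_comm_names(proc_root="/proc"):
    """Yield the comm name of every process listed under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                yield handle.read().strip()
        except OSError:
            continue  # the process exited while we were scanning


def count_named(names, wanted="solo6x10_ring"):
    """Count how many of the given comm names equal the wanted name."""
    return sum(1 for name in names if name == wanted)
```

Calling count_named(read_comm_names()) before and after starting each additional ffmpeg instance shows whether the count grows with the number of instances, as reported for the old kernel, or stays at one.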

What is the difference? I would rather go back to a kernel officially distributed by Gentoo. Would updating to 4.8 suffice, maybe with a recent checkout of your solo driver repository, or are there changes to the kernel itself that have not yet made it back into mainline?

andrey-utkin commented 8 years ago

Yes, there is one that may be important if you have a solo6010 card: https://github.com/bluecherrydvr/linux/commit/79ee588e62e76e2e312073d2987c3ee2a1074d49

greyghost509 commented 8 years ago

This fixes a regression that was introduced in June 2015 as far as I can see. However, my old kernel was 3.18.7, with sources downloaded in March 2015, so it cannot contain the offending code under suspicion?! In any case, the fix you mention is rather new and has made it back into neither the -stable nor the mainline kernel (at least I cannot find it there).

The good news from my side is that your development kernel worked perfectly over the whole weekend here with ffmpeg running on all 8 channels. So I'll try upgrading to the latest kernel offered by the distribution and use it with the recent driver from your (this) repo.

greyghost509 commented 8 years ago

I'm using kernel 4.8.2 from Gentoo now, and it appears to run stably (even without using the most recent solo6x10 driver from the repo).

bluecherrydvr / solo6x10

Frequent system crashes with BC-H08240A #79