b-rad-NDi / Ubuntu-media-tree-kernel-builder

Slip stream the latest LinuxTV.org media drivers into an installable Ubuntu kernel package

QuadHD PCIe often loses connection #135

Closed KaeTuuN closed 2 years ago

KaeTuuN commented 2 years ago

Hi,

OS: Ubuntu 20.04 LTS HWE (Kernel 5.13)
CPU: AMD Ryzen 5 3600
Board: ASRock Rack X470D4U

For about a week I have been seeing sporadic stuttering along with the following messages:

[94983.290484] cx23885: cx23885[1]: mpeg risc op code error
[94983.290759] cx23885: cx23885[1]: TS1 B - dma channel status dump
[94983.290762] cx23885: cx23885[1]:   cmds: init risc lo   : 0xffc1c000
[94983.290767] cx23885: cx23885[1]:   cmds: init risc hi   : 0x00000000
[94983.290770] cx23885: cx23885[1]:   cmds: cdt base       : 0x00010870
[94983.290773] cx23885: cx23885[1]:   cmds: cdt size       : 0x0000000a
[94983.290776] cx23885: cx23885[1]:   cmds: iq base        : 0x00010630
[94983.290779] cx23885: cx23885[1]:   cmds: iq size        : 0x00000010
[94983.290782] cx23885: cx23885[1]:   cmds: risc pc lo     : 0xffc1c018
[94983.290786] cx23885: cx23885[1]:   cmds: risc pc hi     : 0x00000000
[94983.290789] cx23885: cx23885[1]:   cmds: iq wr ptr      : 0x00004192
[94983.290792] cx23885: cx23885[1]:   cmds: iq rd ptr      : 0x0000418c
[94983.290795] cx23885: cx23885[1]:   cmds: cdt current    : 0x000108b8
[94983.290798] cx23885: cx23885[1]:   cmds: pci target lo  : 0xfe249a70
[94983.290801] cx23885: cx23885[1]:   cmds: pci target hi  : 0x00000000
[94983.290804] cx23885: cx23885[1]:   cmds: line / byte    : 0x03890000
[94983.290807] cx23885: cx23885[1]:   risc0: 
[94983.290815] cx23885: cx23885[1]:   risc1: 
[94983.290937] cx23885: cx23885[1]:   risc2: 
[94983.290941] cx23885: cx23885[1]:   risc3: 
[94983.290946] cx23885: cx23885[1]:   (0x00010630) iq 0: 
[94983.290951] cx23885: cx23885[1]:   iq 1: 0xfe249d60 [ arg #1 ]
[94983.290955] cx23885: cx23885[1]:   iq 2: 0x00000000 [ arg #2 ]
[94983.290958] cx23885: cx23885[1]:   (0x0001063c) iq 3: 
[94983.290963] cx23885: cx23885[1]:   iq 4: 0xfe24a050 [ arg #1 ]
[94983.290966] cx23885: cx23885[1]:   iq 5: 0x00000000 [ arg #2 ]
[94983.290969] cx23885: cx23885[1]:   (0x00010648) iq 6: 
[94983.290976] cx23885: cx23885[1]:   (0x0001064c) iq 7: 
[94983.290980] cx23885: cx23885[1]:   (0x00010650) iq 8: 
[94983.290984] cx23885: cx23885[1]:   (0x00010654) iq 9: 
[94983.290989] cx23885: cx23885[1]:   iq a: 0xfe248eb0 [ arg #1 ]
[94983.290992] cx23885: cx23885[1]:   iq b: 0x00000000 [ arg #2 ]
[94983.290995] cx23885: cx23885[1]:   (0x00010660) iq c: 
[94983.291000] cx23885: cx23885[1]:   iq d: 0xfe2491a0 [ arg #1 ]
[94983.291003] cx23885: cx23885[1]:   iq e: 0x00000000 [ arg #2 ]
[94983.291006] cx23885: cx23885[1]:   (0x0001066c) iq f: 
[94983.291011] cx23885: cx23885[1]:   iq 10: 0x1c0002f0 [ arg #1 ]
[94983.291014] cx23885: cx23885[1]:   iq 11: 0xff218eb0 [ arg #2 ]
[94983.291015] cx23885: cx23885[1]: fifo: 0x00005000 -> 0x6000
[94983.291017] cx23885: cx23885[1]: ctrl: 0x00010630 -> 0x10690
[94983.291020] cx23885: cx23885[1]:   ptr1_reg: 0x00005250
[94983.291023] cx23885: cx23885[1]:   ptr2_reg: 0x00010878
[94983.291026] cx23885: cx23885[1]:   cnt1_reg: 0x00000025
[94983.291028] cx23885: cx23885[1]:   cnt2_reg: 0x00000009

TVHeadend also reports timeouts:

Jan 19 21:33:02 <servername> tvheadend[5508]: message repeated 13 times: [ linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT]
Jan 19 21:33:09 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:34:49 <servername> tvheadend[5508]: message repeated 13 times: [ linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT]
Jan 19 21:34:57 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:35:04 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:35:12 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:35:20 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:36:29 <servername> tvheadend[5508]: message repeated 9 times: [ linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT]
Jan 19 21:36:36 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:38:16 <servername> tvheadend[5508]: message repeated 13 times: [ linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT]
Jan 19 21:38:23 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:38:54 <servername> tvheadend[5508]: message repeated 4 times: [ linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT]
Jan 19 21:39:02 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT
Jan 19 21:39:55 <servername> tvheadend[5508]: message repeated 7 times: [ linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT]
Jan 19 21:40:03 <servername> tvheadend[5508]: linuxdvb: Silicon Labs Si2168 #3 : DVB-C #0 - poll TIMEOUT

My question now is: what can I do? If you need any further information, please let me know!

b-rad-NDi commented 2 years ago

Hello. This is the same issue that is reported intermittently. You don't say what sort of hardware you're using, but this is often motherboard related. On AMD and Xeon platforms, and recently some other Intel platforms, you have to set a kernel parameter that applies a DMA fix. On lots of other platforms you have to go into the BIOS and force the slot with the QuadHD to Gen1 mode. You might read through some of the other issues that describe people's experiences with this.
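Whether a slot actually negotiated Gen1 (2.5 GT/s) can be checked from Linux before and after changing the BIOS setting. The following is a minimal sketch, assuming the kernel exposes the current_link_speed attribute in sysfs for the device; the helper name is_gen1_link is hypothetical.

```shell
# Succeeds (exit 0) if the PCI device directory given as $1 reports a Gen1 link.
# Gen1 corresponds to 2.5 GT/s; the exact string suffix varies by kernel,
# so only the prefix is compared.
is_gen1_link() {
    speed_file="$1/current_link_speed"
    [ -r "$speed_file" ] || return 2          # attribute missing or unreadable
    case "$(cat "$speed_file")" in
        "2.5 GT/s"*) return 0 ;;              # Gen1
        *)           return 1 ;;              # any faster link speed
    esac
}
```

For example: is_gen1_link /sys/bus/pci/devices/0000:01:00.0 && echo Gen1, where 0000:01:00.0 is only a placeholder for the address that lspci shows for the card or its upstream bridge.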

The PCIe driver in question was written long ago. I've asked Conexant in the past for tips on solving this, but they said the issue is probably in the Linux PCIe stack, architecture related, and not in their driver.

KaeTuuN commented 2 years ago

Hi @b-rad-NDi , thanks for the reply. I did some research in the meantime and found out the following:

So from my perspective it looks like something is different with the Hauppauge update this time. Nevertheless, I will try downgrading the PCIe version of this slot to Gen 1 and setting the kernel parameter. Can you point me to the parameter you meant? Is it the DMA workaround mentioned in #51, or more specifically in this comment: https://github.com/b-rad-NDi/Ubuntu-media-tree-kernel-builder/issues/51#issuecomment-498923660 ?

Greetings KaeTuuN

PS: I added CPU and Board to my initial Post.

shspvr commented 2 years ago

That post suggests trying the setting "options cx23885 dma_reset_workaround=2".
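A line of that form normally goes into a file under /etc/modprobe.d so that it is applied whenever the module loads. The following is a minimal sketch; the file name cx23885.conf and both helper names are illustrative choices, and reading the value back assumes the driver exposes the parameter under /sys/module.

```shell
# Write the cx23885 module option into a modprobe.d-style directory.
# $1: target directory (defaults to /etc/modprobe.d, which needs root)
# $2: parameter value (defaults to 2, the value suggested in this thread)
write_cx23885_workaround() {
    conf_dir="${1:-/etc/modprobe.d}"
    value="${2:-2}"
    printf 'options cx23885 dma_reset_workaround=%s\n' "$value" \
        > "$conf_dir/cx23885.conf"
}

# Print the value the loaded module is using, if the parameter is readable.
show_cx23885_workaround() {
    param=/sys/module/cx23885/parameters/dma_reset_workaround
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo "cx23885 not loaded or parameter not exposed" >&2
        return 1
    fi
}
```

After running write_cx23885_workaround as root, the module has to be reloaded (or the machine rebooted) before show_cx23885_workaround can report the new value.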

b-rad-NDi commented 2 years ago

Ryzen 5, yup, you will most likely need the DMA workaround parameter. I'll also need your board's IOMMU ID so I can submit it upstream as part of the list of affected boards.

KaeTuuN commented 2 years ago

@shspvr thx! :-)

@b-rad-NDi It is 1481

lspci -v -nn -k | grep IOMMU
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
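To pull just the vendor:device pair out of output like the above, something along these lines could be used. It is a minimal sketch that assumes the lspci -nn line format shown here; the helper name iommu_ids is hypothetical.

```shell
# Read `lspci -nn` style output on stdin and print the vendor:device ID
# of every IOMMU device line, skipping the indented Subsystem lines.
iommu_ids() {
    grep 'IOMMU' \
        | grep -v 'Subsystem' \
        | sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p'
}
```

Typical use would be lspci -nn | iommu_ids; the second half of the pair is the ID asked for in this thread.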

Thanks for the help, it seems to work now! :+1:

b-rad-NDi commented 2 years ago

Glad to hear

KaeTuuN commented 2 years ago

Well, I was a little quick to close this issue. The error is still there... At the moment I am trying to figure out whether it works better when the server is under some load and not using power-saving features. I will give you some feedback tomorrow.

KaeTuuN commented 2 years ago

@b-rad-NDi I stressed my CPU with stress-ng --cpu 4 --io 4 --vm 4 --vm-bytes 1024M, but it had no effect on the problem. I also tried to generate I/O traffic with shred -v -n1 /dev/sdX, but again nothing changed. Do you have any further advice? Downgrading my PCIe link to Gen 1 does not seem to be possible in my UEFI.
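One way to judge whether a test like this changes anything is to count how often the error shows up in the kernel log per card before and after. The following is a minimal sketch that matches the "mpeg risc op code error" line quoted at the top of this issue; the helper name count_risc_errors is hypothetical.

```shell
# Read kernel log text on stdin and print "<count> <device>" for each
# cx23885 instance that logged an "mpeg risc op code error".
count_risc_errors() {
    grep 'mpeg risc op code error' \
        | sed -n 's/.*\(cx23885\[[0-9]*\]\): mpeg risc op code error.*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}'
}
```

It can be fed from dmesg | count_risc_errors or journalctl -k | count_risc_errors, and the counts compared across runs.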

shspvr commented 2 years ago

Start by disabling C6 mode, then also try toggling CPB mode, or the other way around. You don't need SVM; I disable it anyway, as it is only for virtual machines. The other thing would be to get hold of ASRock, KaeTuuN.

KaeTuuN commented 2 years ago

Thanks for the reply, but I found a solution. I moved the card from slot PCIe 4 to slot PCIe 6, and after that no more errors appeared. I did not make the change hoping to get rid of the error; it was needed for another hardware change. The fix was a positive side effect.

Btw.: Disabling SVM is not an Option for me. I have 6 VMs running on that Machine. ;) The new card (SAS Extension) works great in Slot PCIe 4.