frank-w / BPI-Router-Linux

Linux kernel 4.14+ for BPI-R2, 5.4+ for R64, 6.1+ for R2Pro and R3

[Solved] About 10x performance degradation with 5.15-main #89

black-ish closed this issue 2 years ago

black-ish commented 2 years ago

Hello! First, I want to thank Frank-W for the wonderful work you do with these systems! Thank you very much!

Second, it seems I found something interesting, but I'm not knowledgeable enough to debug it further. I upgraded from 5.10-main.52 to 5.15-main.26 (I compiled both myself with the imported config and then added some options I needed) and noticed that my internet was suddenly somewhat slow.

I did some performance and speed testing, and throughput seemed to cap at around 3MB/s. Next I ran some iperf3 tests, and those cap out at: "[ 4] 13.00-14.00 sec 18.2 MBytes 153 Mbits/sec".

Since I had also changed my NIC around the time of the kernel upgrade, I tested with a different NIC, but it caps out at the same rate as above.

Next I compiled the 5.10-main.110 kernel and it was back to normal: "[ 4] 7.00-8.00 sec 113 MBytes 949 Mbits/sec"

I also checked back with the older 5.10-main.52 and it was the same: "[ 4] 2.00-3.00 sec 113 MBytes 948 Mbits/sec"
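As a sanity check on the numbers quoted above, here is a small sketch of the arithmetic. It assumes iperf3 reports transfer in binary megabytes (1 MByte = 1024 * 1024 bytes) and bitrate in decimal megabits; the helper name is made up for illustration.

```python
MIB = 1024 * 1024  # assumed size of one iperf3 "MByte" in bytes


def mbytes_per_sec_to_mbits(mbytes_per_sec: float) -> float:
    """Convert binary MBytes per second to decimal Mbits per second."""
    return mbytes_per_sec * MIB * 8 / 1_000_000


slow = mbytes_per_sec_to_mbits(18.2)  # 5.15-main.26 sample
fast = mbytes_per_sec_to_mbits(113)   # 5.10-main.110 sample
print(f"{slow:.0f} vs {fast:.0f} Mbits/sec, ratio {fast / slow:.1f}x")
```

Under that assumption the one-second samples work out to roughly 153 and 948 Mbits/sec, which matches the iperf3 output within rounding.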

Edit: In addition, any kind of connection, such as ssh, seems rather sluggish, as if there were a scheduler problem or something.

So yes, for some reason the 5.15-main kernel seems to be the cause. I have no idea why, and no idea how to check it, so I thought I would report my finding here.

Edit2: I use a BPi-R2

frank-w commented 2 years ago

You should check the versions in between (5.11-main till 5.14-main) to find the last working version and the first failing one. Then compare the switch driver (drivers/net/dsa/mt7530.c) and the gmac driver (drivers/net/ethernet/mediatek/mtk_eth_soc.c) in git.
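The search for the last working and first failing version can be done as a binary search rather than building every branch. A minimal sketch, assuming the branches are ordered oldest to newest and that every branch after the first bad one is also bad. `first_bad` and `simulated_is_good` are hypothetical names; in practice the check would be a manual boot plus iperf3 run.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str],
              is_good: Callable[[str], bool]) -> Optional[str]:
    """Return the first version for which is_good() is False, or None."""
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid + 1   # regression is in a later version
        else:
            hi = mid       # this version or an earlier one is the first bad
    return versions[lo] if lo < len(versions) else None


# Branch names follow the ones mentioned in this thread.
branches = ["5.10-main", "5.11-main", "5.12-main",
            "5.13-main", "5.14-main", "5.15-main"]


# Made-up outcome for illustration only: pretend the slowdown
# first shows up in 5.13-main. This is NOT a measured result.
def simulated_is_good(branch: str) -> bool:
    return branches.index(branch) < branches.index("5.13-main")


print(first_bad(branches, simulated_is_good))
```

With six branches this needs at most three builds instead of six.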

There is also a temperature issue which causes CPU slowdown. Try reading your temperature and cpu_freq values.
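One way to read both values without extra tools is through sysfs. A minimal sketch, assuming the CPU sensor is `thermal_zone0`, that cpufreq is enabled for `cpu0`, and that the kernel exposes the usual units (millidegrees Celsius and kHz). The zone number may differ on your board, and `cpu_status` is a hypothetical helper name.

```python
from pathlib import Path


def read_int(path: Path) -> int:
    """Read a sysfs file that contains a single integer."""
    return int(path.read_text().strip())


def cpu_status(sysfs: Path = Path("/sys")) -> dict:
    """Return the current temperature (deg C) and cpu0 frequency (MHz)."""
    # Assumption: thermal_zone0 is the CPU thermal zone.
    temp_mc = read_int(sysfs / "class/thermal/thermal_zone0/temp")
    freq_khz = read_int(
        sysfs / "devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")
    return {"temp_c": temp_mc / 1000, "freq_mhz": freq_khz / 1000}
```

Calling `print(cpu_status())` on the board while iperf3 is running should show whether the frequency drops as the temperature climbs.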

You can disable the kernel config option (I think it is disabled in 5.10) or drop the lower 2 temperature trips

https://github.com/openwrt/openwrt/issues/9396#issuecomment-1101884199

black-ish commented 2 years ago

Alright, when I have the time for that commitment I will compile the kernels and check, so please keep this issue open.

I never had a temperature problem with it; even when stress testing it (pegging every core at 100% for an extended period of time), it never went above 74°C. I did want to modify the case to fit a fan anyway, so I will get some cooling blocks and put those plus a fan in it.

frank-w commented 2 years ago

ok, when it reaches 74°C the cpu-slowdown is not the problem

https://forum.banana-pi.org/t/bpi-r64-only-10-cpu-speed-at-already-48-degrees-celcius-speed-not-increasing-anymore/12262/37?u=frank-w

it happens in lower versions at 47 °C and after my mainline-patch at 57°C (have increased the lower 2 trips a bit, but sometimes this is not enough, e.g. when using gpu)

so keep going on with the kernel-versions between 5.10 and 5.15 and then compare network-drivers

black-ish commented 2 years ago

Ohh I just noticed I never mentioned what I'm using did I? :sweat_smile: I use a BPi-R2.

Edit: Edited the first post to mention the device.

black-ish commented 2 years ago

OK, I think it was the thermal issue you linked. I had to research quite a bit to find out how to check the CPU frequency without tools. Sure enough, it dropped down to 98MHz once the temperature passed 57°C. For some reason I never noticed that until upgrading to the newest kernel.

I'll close this issue for the time being as I think I solved it.

frank-w commented 2 years ago

how exactly have you solved it? just removed the first 2 trips (passive,active)?

black-ish commented 2 years ago

Yes I removed them.

frank-w commented 2 years ago

added patches for it in my 5.18-rc tree (one for mt7622 and one for mt7623)

can you verify it's the way that works for you?

black-ish commented 2 years ago

I suggest removing the passive block completely, simply because once it trips, the clocks never go back to normal, even when the temperature falls back below the trip point. At least in my tests they didn't.
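The "never goes back to normal" behaviour can be checked against a log of readings. A minimal sketch, assuming you have collected (temperature, frequency) samples in time order; it ignores trip hysteresis, and `stuck_after_cooldown` is a hypothetical helper. In the example, only 57°C and 98 MHz come from this thread; the other numbers are made up for illustration.

```python
from typing import List, Tuple


def stuck_after_cooldown(samples: List[Tuple[float, float]],
                         trip_c: float, max_mhz: float) -> bool:
    """True if the trip was reached at some point, the last sample is back
    below the trip temperature, and the clock is still below max_mhz."""
    if not samples:
        return False
    tripped = any(temp >= trip_c for temp, _ in samples)
    last_temp, last_freq = samples[-1]
    return tripped and last_temp < trip_c and last_freq < max_mhz


# Illustrative (temp in deg C, freq in MHz) readings, not measured data.
readings = [(50.0, 1300.0), (58.0, 98.0), (52.0, 98.0)]
print(stuck_after_cooldown(readings, trip_c=57.0, max_mhz=1300.0))
```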

frank-w commented 2 years ago

Have you reached such a temperature? I've only seen 58°C on my R2, but I'm not using the GPU.

If I remove them, the first trip is hot at 87 degrees.

black-ish commented 2 years ago

Not under idle conditions, but such temperatures can be reached under heavy load, especially with the metal/carbon case, which has no airflow at all. And 87°C is not much of a problem for most chips, so even at that point there is still a good amount of headroom, which is why the critical point is set at 107°C.
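To confirm which trips a running kernel actually has, the trip points can be listed from sysfs. A minimal sketch, assuming the standard `trip_point_N_temp` and `trip_point_N_type` files are present in the thermal zone directory and report millidegrees Celsius; `trip_points` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Tuple


def trip_points(zone: Path) -> List[Tuple[int, str, float]]:
    """Return (index, type, temperature in deg C) for each trip in a zone."""
    trips = []
    for temp_file in zone.glob("trip_point_*_temp"):
        index = int(temp_file.name.split("_")[2])
        trip_type = (zone / f"trip_point_{index}_type").read_text().strip()
        temp_c = int(temp_file.read_text().strip()) / 1000  # millidegrees
        trips.append((index, trip_type, temp_c))
    return sorted(trips)  # order by trip index
```

On the board, `trip_points(Path("/sys/class/thermal/thermal_zone0"))` should, going by the values quoted in this thread, list only the hot (87°C) and critical (107°C) trips once the lower two are removed.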