MarvellEmbeddedProcessors / linux-marvell

Marvell Linux kernel

cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant #20

Open erdoukki opened 3 years ago

erdoukki commented 3 years ago

https://github.com/openwrt/openwrt/commit/f407b2f43c27a7b35c4f96d3efcee2cc440f8efe

How can we contact Marvell to get the needed information?

pali commented 3 years ago

@kostapr: Hi! Could you look at this bug report?

kostapr commented 3 years ago

I have no problem with this patch. BTW, Ken Ma, Igal Liberman and Victor Gu are no longer with Marvell. For future Armada-related patches, please add Stefan (stefanc@marvell.com) and Nadav (nadavh@marvell.com) to the CC list.

Reviewed-by: Konstantin Porotchkin <kostap@marvell.com>

robimarko commented 3 years ago

@kostapr But that patch is not a solution; it's just a hotfix to get the devices booting and stop them from constantly crashing due to voltage issues. The solution can't be just to disable cpufreq and force the lowest frequency.

kostapr commented 3 years ago

@robimarko I cannot comment on the problem. Personally, I think that not all 37xx dies are capable of working stably at 1.2 GHz. However, this should be confirmed by the HW design or production team at Marvell. Hopefully @haklai can add more on this matter.

robimarko commented 3 years ago

@kostapr Well, that is normal, but the thing is that the ones marked and sold as 1.2 GHz are having issues. Personally, I have a lot of those in the field and they all currently crash if you allow them to scale. I know that @pali has been trying to get this solved for a while now.

pali commented 3 years ago

There is a part order number 88F3720-xx-–BVB2C120-P123 of the A37xx SoC which is designed for 1.2 GHz. This SoC has the marking C120 (speed code) below its Marvell logo.

So @robimarko, could you confirm that you have the right 37xx die, the one designed for 1.2 GHz? In that case, @kostapr or @haklai, could you get more information from the HW design / production team about where the issue is?

robimarko commented 3 years ago

@pali I opened one of the ESPRESSObin Ultras I have and the SoC PN is: 88F3720-A0 C120. Even the stock ATF/WTMI and U-Boot see it as a 1.2 GHz model. I am attaching the image as well: https://imgur.com/A02jhaw

pali commented 3 years ago

@kostapr so @robimarko's SoC above is definitely designed for 1.2 GHz.

stefanchulski commented 3 years ago

@pali If you disable DFS feature and boot with 1.2GHz frequency only, do you see any crashes?

pali commented 3 years ago

@stefanchulski currently I do not have 1.2GHz variant of A3720 SoC.

@robimarko and @erdoukki, could you please run the required tests for @stefanchulski?

erdoukki commented 3 years ago

Sure, with pleasure, as usual... Just give me the needed patch file or binary, please. I will also check the CPU reference of my Ultra.

robimarko commented 3 years ago

@stefanchulski If I am seeing it correctly, it's using 1200MHz by default after booting as the kernel is not scaling it anymore.

[    2.305272] Unsupported CPU frequency 1200 MHz
root@OpenWrt:/sys/devices/system/cpu# cat /sys/kernel/debug/clk/cpu/clk_rate 
1200000000

I need to properly stress test it before claiming that it's stable with the WTMI-set VDD, which in my case is: SVC REV: 5, CPU VDD voltage: 1.213V

But I have seen samples that use 1.26V as well, and I don't think that CPUFreq has a way to know this, so it uses too low a voltage for most boards.

UPDATE: Even a couple of seconds of stress testing will crash it, so it's not stable at all:

root@OpenWrt:/# stress --cpu 2 --io 2 --timeout 1h
stress: info: [2444] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd
[   55.174519] ------------[ cut here ]------------
[   55.179312] Kernel BUG at do_undefinstr+0x27c/0x290 [verbose debug info unavailable]
[   55.187300] Internal error: Oops - BUG: 0 [#1] SMP
[   55.192237] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table nf_conntrack ipt_REJECT xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc rtc_pcf8563 nf_reject_ipv4 nf_log_g
[   55.250897] CPU: 1 PID: 771 Comm: loop0 Not tainted 5.10.64 #0
[   55.256909] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)
[   55.264088] pstate: 00400005 (nzcv daif +PAN -UAO -TCO BTYPE=--)
[   55.270283] pc : do_undefinstr+0x27c/0x290
[   55.274503] lr : do_undefinstr+0x58/0x290
[   55.278633] sp : ffffffc0110839d0
[   55.282045] x29: ffffffc0110839d0 x28: ffffff8001777300 
[   55.287522] x27: 0000000000000000 x26: 00000000000000e0 
[   55.292999] x25: 0000000000000000 x24: ffffffc010b3f000 
[   55.298476] x23: 0000000060400005 x22: ffffffc0107b68e0 
[   55.303952] x21: 00000000009b4000 x20: 0000000000000000 
[   55.309429] x19: ffffffc011083a40 x18: 0000000000000000 
[   55.314905] x17: 0000000000000000 x16: 0000000000000000 
[   55.320382] x15: 0000000000000000 x14: 0000000000000000 
[   55.325858] x13: 00000000000001f9 x12: 0000000000000040 
[   55.331335] x11: ffffff8001000000 x10: 0000000000000005 
[   55.336812] x9 : 0000000000000001 x8 : 00000000830b65da 
[   55.342288] x7 : 0000000000000005 x6 : ffffffc011083a10 
[   55.347765] x5 : 0000000000000000 x4 : ffffffc010a3fd20 
[   55.353242] x3 : 00000000d5300000 x2 : 0000000000000000 
[   55.358719] x1 : ffffffc010b0b0e8 x0 : 0000000060400005 
[   55.364196] Call trace:
[   55.366716]  do_undefinstr+0x27c/0x290
[   55.370581]  el1_undef+0x2c/0x4c
[   55.373904]  el1_sync_handler+0x8c/0xd0
[   55.377855]  el1_sync+0x88/0x140
[   55.381183]  __hyp_text_end+0x5c/0x77c
[   55.385044]  __wait_for_common+0xe4/0x1e4
[   55.389175]  wait_for_completion_io+0x20/0x30
[   55.393667]  submit_bio_wait+0x4c/0x64
[   55.397532]  blkdev_issue_flush+0x74/0x94
[   55.401664]  blkdev_fsync+0x2c/0x4c
[   55.405258]  vfs_fsync+0x3c/0x7c
[   55.408587]  loop_queue_work+0x368/0x97c
[   55.412632]  kthread_worker_fn+0x100/0x1d0
[   55.416852]  loop_kthread_worker_fn+0x20/0x30
[   55.421341]  kthread+0x124/0x12c
[   55.424665]  ret_from_fork+0x10/0x3c
[   55.428352] Code: d5033fdf d51b4220 17ffffcf a9025bf5 (d4210000) 
[   55.434635] ---[ end trace 8fab771838008e64 ]---
[   55.439393] Kernel panic - not syncing: Oops - BUG: Fatal exception
[   55.445855] SMP: stopping secondary CPUs
[   55.449900] Kernel Offset: disabled
[   55.453494] CPU features: 0x0000002,00002008
[   55.457891] Memory Limit: none
[   55.461037] Rebooting in 3 seconds..

pali commented 3 years ago

My guess is that the WTMI firmware is missing some init sequence related to CPU voltage configuration. See the function init_avs(): https://github.com/MarvellEmbeddedProcessors/A3700-utils-marvell/blob/master/wtmi/sys_init/avs.c

There is an array otp_data[] filled with OTP values from the SoC itself, but some of its bits are not used. They are non-zero, so they carry some value, but there is no documentation of what they mean, e.g. the low 8 bits of otp_data[OTP_DATA_SVC_REV_ID]. Relevant header file: https://github.com/MarvellEmbeddedProcessors/A3700-utils-marvell/blob/master/wtmi/sys_init/avs.h

pali commented 3 years ago

But I have seen samples that use 1.26V as well, and I don't think that CPUFreq has a way to know this, so it uses too low a voltage for most boards.

The CPUFreq driver armada-37xx-cpufreq.c does know this: it takes this value from OTP, but indirectly (it reads it from a register which the WTMI code fills from OTP). The driver uses the following Marvell algorithm:

The divisors for max_freq 1200 MHz are: div1=2, div2=4, div3=6; for 1000 MHz: div1=2, div2=4, div3=5; and for 800 MHz: div1=2, div2=3, div3=4.
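The divisor lists above can be turned into per-level frequencies with a few lines of arithmetic. The Python sketch below assumes, as stated later in this thread, that L0 runs without a divisor and that L1, L2 and L3 use div1, div2 and div3. The table and function names are illustrative and are not taken from the driver.

```python
# Divisors per maximum CPU frequency (MHz), as listed in the comment above.
# Index 0 is L0, which runs undivided; indices 1..3 are div1..div3 (L1..L3).
DIVIDERS = {
    1200: (1, 2, 4, 6),
    1000: (1, 2, 4, 5),
    800: (1, 2, 3, 4),
}


def load_level_freqs(max_mhz: int) -> list:
    """Nominal CPU frequency in MHz for load levels L0..L3."""
    # Integer division: 800 / 3 is not exact, so L2 on the 800 MHz part rounds down.
    return [max_mhz // div for div in DIVIDERS[max_mhz]]


for max_mhz in (1200, 1000, 800):
    print(f"{max_mhz} MHz part: L0..L3 = {load_level_freqs(max_mhz)} MHz")
```

For all three parts, the lowest level comes out at 200 MHz. This matches the "CPU min MHz: 200.0000" that lscpu reports further down in the thread.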

But I do not know the source of the above Marvell algorithm and its constants (especially the 100mV and 150mV subtracted for div1/2/3). I was not able to find it documented in either the Armada 3720 Functional Specification or the Hardware Specification.

I also suspect that these 100mV and 150mV constants are incorrect, because for the CPU with max_freq=1GHz I had to make a small adjustment in the cpufreq driver.
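For illustration only, here is a Python sketch of how per-level voltages could be derived from the L0 value using the 100 mV and 150 mV offsets mentioned above. Several parts of it are assumptions and should be checked against the actual driver:

- L1 is L0 minus 100 mV, and L2/L3 are L0 minus 150 mV.
- The result has a floor of 1000 mV.
- Each target is rounded up to the next entry of the AVS voltage table, which is assumed to mirror avs_map[] in armada-37xx-cpufreq.c.

`level_avs()` and `avs_index_at_least()` are hypothetical helpers.

```python
# AVS index -> voltage in mV; assumed to mirror avs_map[] in
# drivers/cpufreq/armada-37xx-cpufreq.c (check it against your kernel tree).
AVS_MAP_MV = [
    747, 758, 770, 782, 793, 805, 817, 828, 840, 852, 863, 875, 887, 898,
    910, 922, 933, 945, 957, 968, 980, 992, 1003, 1015, 1027, 1038, 1050,
    1062, 1073, 1085, 1097, 1108, 1120, 1132, 1143, 1155, 1167, 1178, 1190,
    1202, 1213, 1225, 1237, 1248, 1260, 1272, 1283, 1295, 1307, 1318, 1330,
    1342,
]

MIN_VOLT_MV = 1000  # assumed floor for the derived levels


def avs_index_at_least(target_mv: int) -> int:
    """Smallest AVS index whose voltage is >= target_mv, clamped to the top."""
    for idx, mv in enumerate(AVS_MAP_MV):
        if mv >= target_mv:
            return idx
    return len(AVS_MAP_MV) - 1


def level_avs(l0_idx: int, l1_drop_mv: int = 100, l23_drop_mv: int = 150) -> list:
    """Hypothetical AVS indices for L0..L3, derived from the L0 (OTP/WTMI) index."""
    l0_mv = AVS_MAP_MV[l0_idx]
    l1 = avs_index_at_least(max(l0_mv - l1_drop_mv, MIN_VOLT_MV))
    l23 = avs_index_at_least(max(l0_mv - l23_drop_mv, MIN_VOLT_MV))
    return [l0_idx, l1, l23, l23]


levels = level_avs(0x28)  # 0x28 = 1.213 V, as WTMI reported on one board
print([AVS_MAP_MV[i] for i in levels])
```

Take the 1.213 V value that WTMI reported on one of the boards above. Under these assumptions it yields 1213 mV for L0, 1120 mV for L1 and 1073 mV for L2/L3, which shows how far the derived levels drop below L0.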

I was told that Marvell reproduced this issue on their 3720 development board last year and was preparing a fix for it, including a documentation/errata update. But I have not seen anything.

So somebody at Marvell must have been aware of this issue and should know more details about it (or that person is no longer with Marvell, as @kostapr wrote).

Also look at the Armada 3720 Errata document: an issue related to the 1.2GHz mode has been documented there for a long time.

pali commented 3 years ago

@stefanchulski: Do you need more tests? Or is the crash log from @robimarko above enough confirmation?

stefanchulski commented 3 years ago

@pali So is the issue related to cpufreq, as described in the patch, or do you have an issue with 1.2GHz itself?

pali commented 3 years ago

@stefanchulski it seems to be both. There is an issue related to cpufreq, as described on the mailing list. And @robimarko has problems with 1.2GHz, as described in this post: https://github.com/MarvellEmbeddedProcessors/linux-marvell/issues/20#issuecomment-925018939

stefanchulski commented 3 years ago

@pali Are all other frequencies stable? Is it specific to one board, or does it occur on many boards?

pali commented 3 years ago

It occurs on many boards. The problem occurs either when running at the L0 load level (i.e. without a divisor) or when switching from the L1 load level (which uses div1) to L0.

pali commented 3 years ago

After a lot of experiments, we somehow worked around this crash on the 1GHz variant of A3720 with this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d118ac2062b5b8331c8768ac81e016617e0996ee

But this fix/workaround does not work for the 1.2GHz variant of A3720; as @robimarko wrote, it still crashes.

stefanchulski commented 3 years ago

I am not familiar with all these AVS configurations on A37XX. But hardcoded values look strange; should you take into account chip skew and calculate AVS from SVC?

robimarko commented 3 years ago

Yes, I only have 1.2GHz A3720 models, and all of the boards I tried are crashing. They tend to be stable only if I crank the minimum voltage up a lot, but that is far too specific to each silicon sample.

pali commented 3 years ago

But hardcoded values look strange

Yes, but we have absolutely no idea what is happening here. And if you look at the change referenced from the above commit, https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/dc33b62c90696afb6adc7dbcc4ebbd48bedec269, those hardcoded values were introduced by Marvell developers...

should you take into account chip skew and calculate AVS from SVC?

Probably, but we have no idea how... Documentation on this topic is missing, and I have not seen any SVC documentation. So this is probably something that exists only internally at Marvell.

erdoukki commented 3 years ago

@stefanchulski If I am seeing it correctly, it's using 1200MHz by default after booting as the kernel is not scaling it anymore.

Same for me...

SVC REV: 5, CPU VDD voltage: 1.237V
Model: gti cellular cpe board
       CPU     1200 [MHz]
       L2      1200 [MHz]
       NB AXI  300 [MHz]
       SB AXI  250 [MHz]
       DDR     750 [MHz]
[    2.155142] Unsupported CPU frequency 1200 MHz
root@OpenWrt:/# uname -ar
Linux OpenWrt 5.10.64 #0 SMP Sun Sep 26 07:10:17 2021 aarch64 GNU/Linux
root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate
1200000000

This one crashed quickly... I will redo the stress test and post the results!

erdoukki commented 3 years ago

Another ULTRA:

SVC REV: 5, CPU VDD voltage: 1.225V
[    2.228075] Unsupported CPU frequency 1200 MHz
root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate
1200000000
root@OpenWrt:~# uname -ar
Linux OpenWrt 5.4.143 #0 SMP Tue Aug 31 22:20:08 2021 aarch64 GNU/Linux
OPENWRT_RELEASE="OpenWrt 21.02.0 r16279-5cc0535800"
root@OpenWrt:/# stress --cpu 2 --io 2 --timeout 1h
stress: info: [2800] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd
stress: info: [2800] successful run completed in 3600s

UPDATE: OK

erdoukki commented 3 years ago

More from a third ULTRA board:

root@ultra:~# uname -ar
Linux ultra 5.4.124 #0 SMP Sun Jun 13 22:02:19 2021 aarch64 GNU/Linux
TIM-1.0
mv_ddr-devel-g80be893d2b-d DDR4 16b 1GB 1CS
WTMI-devel-18.12.1-2efdb10f
WTMI: system early-init
SVC REV: 5, CPU VDD voltage: 1.097V
Setting clocks: CPU 1000 MHz, DDR 800 MHz
CZ.NIC's Armada 3720 Secure Firmware v2021.04.09 (Aug  8 2021 14:26:28)
Running on ESPRESSObin Ultra
OPENWRT_RELEASE="OpenWrt 21.02.0-rc3 r16172-2aba3e9784"

From lscpu:

CPU max MHz:                     1000.0000
CPU min MHz:                     200.0000

Pretty stable:

root@ultra:~# stress --cpu 2 --io 2 --timeout 1h
stress: info: [14832] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd

UPDATE: OK

erdoukki commented 3 years ago

More from my fourth ULTRA board:

root@ULTRA-5G:~# uname -ar
Linux ULTRA-5G 5.4.137 #0 SMP Sat Jul 31 17:21:01 2021 aarch64 GNU/Linux
OPENWRT_RELEASE="OpenWrt 21.02.0-rc4 r16256-2d5ee43dc6"
SVC REV: 5, CPU VDD voltage: 1.202V

From lscpu:

CPU max MHz:                     1200.0000
CPU min MHz:                     200.0000

Pretty stable:

root@ULTRA-5G:~# stress --cpu 2 --io 2 --timeout 1h
stress: info: [22073] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd
stress: info: [22073] successful run completed in 3600s

UPDATE: OK

pali commented 3 years ago

Due to bugs in the A37xx CPU driver, the CPU frequency reported (e.g. by lscpu) can be incorrect. So the best check of the (maximal) CPU frequency is the mhz userspace tool from https://github.com/wtarreau/mhz, which reports the correct value even when the kernel reports it incorrectly.

robimarko commented 3 years ago

@pali It's running at 1200MHz, as that is what WTMI sets, and since CPUFreq is blacklisted for the 1200MHz model, the kernel won't touch it.

root@OpenWrt:/# mhz
count=516515 us50=21364 us250=106829 diff=85465 cpu_MHz=1208.717

@erdoukki That's the issue: depending on the exact board you test, some are stable with the WTMI-set voltages while others are not. For me, most of them will crash. And if CPUFreq is enabled, they crash much faster, usually during boot itself, so it's a bug for sure.

pali commented 3 years ago

@robimarko thank you for the confirmation. Now it is up to Marvell and @stefanchulski to look at this and try to fix the issue.

robimarko commented 3 years ago

@stefanchulski Any updates?

stefanchulski commented 3 years ago
  1. On a board that reproduces this issue, could you please dump the register with "md 0xd0011500" in both U-Boot and Linux? Please use Linux with DFS disabled.
  2. Did the issue occur on an ESPRESSObin board?

robimarko commented 3 years ago

Sure, here it is from U-Boot.

Marvell>> md 0xd0011500
d0011500: 5a28ffff 02000257 00008000 800001e1    ..(ZW...........

Linux:

root@OpenWrt:/# devmem 0xd0011500
0x5A28FFFF

So they are the same.

Yes, I am encountering it on the ESPRESSObin Ultra boards; those are the only 3720 boards I have.

erdoukki commented 3 years ago

@pali If you disable DFS feature and boot with 1.2GHz frequency only, do you see any crashes?

@robimarko @pali how can I disable the DFS feature?

pali commented 3 years ago

On A3720, DFS (AVS) is disabled when the armada-37xx-cpufreq driver is not initialized. And for the 1.2 GHz mode it is already disabled if you see the "Unsupported CPU frequency" message.

robimarko commented 3 years ago

It's already disabled in OpenWrt, if you are running it, as they have backported the 5.14 patch for it; but as @pali said, just check the boot log for that message.

stefanchulski commented 3 years ago

Could you please set it in U-Boot with "mw 0xd0011500 0x5CE8FFFF", check in Linux that this wasn't overwritten, and run the stress tests? Thanks.

robimarko commented 3 years ago

It's really unstable:

root@OpenWrt:/# stress-ng --matrix 0 -t 1m
stress-ng: info:  [2485] setting to a 60 second run per stressor
stress-ng: info:  [2485] dispatching hogs: 2 matrix
[  103.145154] ------------[ cut here ]------------
[  103.149947] Kernel BUG at do_undefinstr+0x27c/0x290 [verbose debug info unavailable]
[  103.157936] Internal error: Oops - BUG: 0 [#1] SMP
[  103.162875] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table nf_conntrack ipt_REJECT xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc rtc_pcf8563 nf_reject_ipv4 nf_log_g
[  103.223248] CPU: 0 PID: 2487 Comm: stress-ng Not tainted 5.10.64 #0
[  103.229708] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)
[  103.236889] pstate: 00400085 (nzcv daIf +PAN -UAO -TCO BTYPE=--)
[  103.243086] pc : do_undefinstr+0x27c/0x290
[  103.247308] lr : do_undefinstr+0x58/0x290
[  103.251438] sp : ffffffc010b5bb80
[  103.254850] x29: ffffffc010b5bb80 x28: ffffff8000f16780 
[  103.260329] x27: 0000000000000060 x26: 0000001801dd263b 
[  103.265807] x25: ffffff803fdcc860 x24: ffffffc010a389a8 
[  103.271283] x23: 0000000080400085 x22: ffffffc010054b00 
[  103.276760] x21: 00000000009b4000 x20: 00000000b9418460 
[  103.282237] x19: ffffffc010b5bbf0 x18: 0000000000000000 
[  103.287713] x17: 0000000000000000 x16: 0000000000000000 
[  103.293190] x15: 0000000000000000 x14: 0000000000000000 
[  103.298668] x13: 000000000000033a x12: 0000000000000040 
[  103.304145] x11: ffffff8001100248 x10: 0000000000000037 
[  103.309622] x9 : 0000000000000001 x8 : 0000000002e94e50 
[  103.315098] x7 : 0000000000000037 x6 : ffffffc010b5bbc0 
[  103.320574] x5 : 0000000000000000 x4 : ffffffc010a3fd20 
[  103.326052] x3 : 00000000d5300000 x2 : 00000000b9400000 
[  103.331529] x1 : ffffffc010b0b0e8 x0 : 0000000080400085 
[  103.337007] Call trace:
[  103.339529]  do_undefinstr+0x27c/0x290
[  103.343395]  el1_undef+0x2c/0x4c
[  103.346718]  el1_sync_handler+0x8c/0xd0
[  103.350669]  el1_sync+0x88/0x140
[  103.353997]  check_preempt_wakeup+0x30/0x1c0
[  103.358396]  check_preempt_curr+0x7c/0x8c
[  103.362527]  ttwu_do_wakeup.constprop.0+0x1c/0x90
[  103.367374]  ttwu_do_activate.isra.0+0xac/0xc0
[  103.371952]  try_to_wake_up+0x1e4/0x390
[  103.375903]  wake_up_process+0x18/0x24
[  103.379766]  hrtimer_wakeup+0x20/0x40
[  103.383538]  __hrtimer_run_queues+0x11c/0x234
[  103.388027]  hrtimer_interrupt+0x114/0x2f0
[  103.392252]  arch_timer_handler_phys+0x34/0x44
[  103.396831]  handle_percpu_devid_irq+0x84/0x150
[  103.401501]  __handle_domain_irq+0x7c/0xe0
[  103.405725]  gic_handle_irq+0x88/0x150
[  103.409587]  el0_irq_naked+0x4c/0x54
[  103.413278] Code: d5033fdf d51b4220 17ffffcf a9025bf5 (d4210000) 
[  103.419562] ---[ end trace bfb2c4d1d1f2da4b ]---
[  103.424322] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[  103.431951] SMP: stopping secondary CPUs
[  103.435998] Kernel Offset: disabled
[  103.439592] CPU features: 0x0000002,00002008
[  103.443990] Memory Limit: none
[  103.447138] Rebooting in 3 seconds..

It took about 20-30 seconds to crash.

stefanchulski commented 3 years ago

Can you try 0x5CF3FFFF?

pali commented 3 years ago

According to https://github.com/MarvellEmbeddedProcessors/A3700-utils-marvell/blob/master/wtmi/sys_init/avs.c, register 0xd0011500 controls the AVS voltage. Bits [21:16] encode the high VDD limit and bits [27:22] encode the low VDD limit. The avs.c code above sets both the high and low limits to the same value.

The new value 0x5CE8FFFF sets 0x33 (1.342V) as the low limit and 0x28 (1.213V) as the high limit. Is this value correct? It looks strange for the high limit to be lower than the low limit.
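The bit layout described above can be checked with a small decoder. This Python sketch rests on a few assumptions:

- The two 6-bit fields index the same AVS voltage table as the kernel's armada-37xx-cpufreq.c (avs_map[], transcribed here; verify it against your tree).
- An index past the end of that table decodes to None.

The function names are hypothetical.

```python
# AVS index -> voltage in mV; assumed to mirror avs_map[] in
# drivers/cpufreq/armada-37xx-cpufreq.c (check it against your kernel tree).
AVS_MAP_MV = [
    747, 758, 770, 782, 793, 805, 817, 828, 840, 852, 863, 875, 887, 898,
    910, 922, 933, 945, 957, 968, 980, 992, 1003, 1015, 1027, 1038, 1050,
    1062, 1073, 1085, 1097, 1108, 1120, 1132, 1143, 1155, 1167, 1178, 1190,
    1202, 1213, 1225, 1237, 1248, 1260, 1272, 1283, 1295, 1307, 1318, 1330,
    1342,
]

LOW_SHIFT = 22    # bits [27:22]: low VDD limit
HIGH_SHIFT = 16   # bits [21:16]: high VDD limit
FIELD_MASK = 0x3F


def avs_mv(index: int):
    """Voltage in mV for an AVS index, or None if it is outside the table."""
    return AVS_MAP_MV[index] if index < len(AVS_MAP_MV) else None


def decode_avs_reg(value: int) -> dict:
    """Split a 0xd0011500 value into (index, mV) pairs for both limits."""
    low = (value >> LOW_SHIFT) & FIELD_MASK
    high = (value >> HIGH_SHIFT) & FIELD_MASK
    return {"low": (low, avs_mv(low)), "high": (high, avs_mv(high))}


for reg in (0x5A28FFFF, 0x5CE8FFFF, 0x5CF3FFFF, 0x5BAEFFFF):
    d = decode_avs_reg(reg)
    print(f"0x{reg:08X}: low 0x{d['low'][0]:02X} ({d['low'][1]} mV), "
          f"high 0x{d['high'][0]:02X} ({d['high'][1]} mV)")
```

Here is what the decoder reports for the values posted in this thread:

- 0x5A28FFFF: 1213 mV for both limits, consistent with the "CPU VDD voltage: 1.213V" that WTMI printed on that board.
- 0x5CE8FFFF: low 0x33 (1342 mV), high 0x28 (1213 mV).
- 0x5CF3FFFF: 1342 mV for both limits.
- 0x5BAEFFFF: 1283 mV for both limits.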

erdoukki commented 3 years ago

ESPRESSObin Ultra with 1.2 GHz:

SVC REV: 5, CPU VDD voltage: 1.237V

Default value of 0xd0011500 = 5aaaffff

DFS disabled:

root@OpenWrt:/# dmesg | grep CPU
[    2.116404] Unsupported CPU frequency 1200 MHz

Using OpenWrt 21.02.0

root@OpenWrt:/# uname -ar
Linux OpenWrt 5.4.143 #0 SMP Tue Aug 31 22:20:08 2021 aarch64 GNU/Linux

It crashes in less than 1 minute with stress-ng --matrix 0 -t 1m when setting these values in 0xd0011500:

UPDATE: added mhz information

root@OpenWrt:/mhz# ./mhz
count=516515 us50=21529 us250=107777 diff=86248 cpu_MHz=1197.744

stefanchulski commented 3 years ago

Can we try one more: 0x5BAEFFFF To make sure: 1 GHz is stable?

erdoukki commented 3 years ago

Marvell>> mw 0xd0011500 0x5BAEFFFF
Marvell>> md 0xd0011500
d0011500: 5baeffff 02000257 00008000 800001e1    ...[W...........
root@OpenWrt:/# stress-ng --matrix 0 -t 1m
stress-ng: info:  [2070] dispatching hogs: 2 matrix
stress-ng: info:  [2070] successful run completed in 60.00s (1 min, 0.00 secs)

Can we try one more: 0x5BAEFFFF To make sure: 1 GHz is stable?

The stress test gives no crash!

UPDATE: a longer test also crashes!

erdoukki commented 3 years ago

I have another ESPRESSObin Ultra board which looks more stable at 1.2GHz!

SVC REV: 5, CPU VDD voltage: 1.225V
Marvell>> md 0xd0011500
d0011500: 5a69ffff 02000257 00008000 800001e1    ..iZW...........
root@OpenWrt:/# dmesg | grep CPU
[    2.116404] Unsupported CPU frequency 1200 MHz

OpenWrt 21.02.0, r16279-5cc0535800

root@OpenWrt:/# uname -ar
Linux OpenWrt 5.4.143 #0 SMP Tue Aug 31 22:20:08 2021 aarch64 GNU/Linux
root@OpenWrt:/# devmem 0xd0011500
0x5A69FFFF
root@OpenWrt:/# stress-ng --matrix 0 -t 1m
stress-ng: info:  [3131] dispatching hogs: 2 matrix
stress-ng: info:  [3131] successful run completed in 60.00s (1 min, 0.00 secs)
Marvell>> mw 0xd0011500 0x5CE8FFFF
root@OpenWrt:/# devmem 0xd0011500
0x5CE8FFFF
root@OpenWrt:/# stress-ng --matrix 0 -t 1m
stress-ng: info:  [2291] dispatching hogs: 2 matrix
stress-ng: info:  [2291] successful run completed in 257325.64s (2 days, 23 hou)

UPDATE: stressing with a longer timeout...

root@OpenWrt:/# stress-ng --matrix 0 -t 10m
stress-ng: info:  [2441] dispatching hogs: 2 matrix
stress-ng: info:  [2441] successful run completed in 600.00s (10 mins, 0.00 sec)

UPDATE: adding mhz information...

root@OpenWrt:/mhz# ./mhz
count=516515 us50=21531 us250=107688 diff=86157 cpu_MHz=1199.009

UPDATE: added devmem output from Linux!

robimarko commented 3 years ago

@erdoukki Leave it running for longer, this is just a 1-minute test. I will give it a go soon.

As far as 1GHz models go, I don't have any.

erdoukki commented 3 years ago

@erdoukki Leave it running for longer, this is just a 1-minute test. I will give it a go soon.

As far as 1GHz models go, I don't have any.

You were right...

Can we try one more: 0x5BAEFFFF To make sure: 1 GHz is stable?

With a 10-minute test, it crashes in less than 2 minutes!!!

robimarko commented 3 years ago

@stefanchulski Anything else we can try as I really need this to be resolved, even if I have to update the firmware on all of the deployed boards

erdoukki commented 3 years ago

@stefanchulski Anything else we can try as I really need this to be resolved, even if I have to update the firmware on all of the deployed boards

Same here: I have a project that is on hold because of this particular issue!

erdoukki commented 3 years ago

@marvell! I just made a new attempt to contact support; here is the message I sent:

Hello,

Still no news regarding my support request!

I hope you'll take a look at: https://github.com/MarvellEmbeddedProcessors/linux-marvell/issues/20

And read some of the comments at: https://www.cnx-software.com/2021/10/03/mochabin-5g-openwrt-ubuntu-sbc-10gbe-wifi-6-5g/

The lack of an answer has made Marvell look like a real "bad guy" in the geek/hacker community! Such good products, and such bad support that even simple bugs turn into "never buy again" advice all around...

I hope you will treat this with the importance it deserves!

From a longtime fan in the Open Source community, have a nice day. Best regards, Gérald Kerma CyberMind.FR GANDALF(at)GK2(dot)NET

I think this will go unanswered, again!

UPDATE: sorry for making some noise on this issue, but it is very much on the subject of this still unresolved bug!

erdoukki commented 3 years ago

@pali @stefanchulski @robimarko

According to https://github.com/MarvellEmbeddedProcessors/A3700-utils-marvell/blob/master/wtmi/sys_init/avs.c, register 0xd0011500 controls the AVS voltage. Bits [21:16] encode the high VDD limit and bits [27:22] encode the low VDD limit. The avs.c code above sets both the high and low limits to the same value.

How can I calculate some values to test in register 0xd0011500?

I do not find the same min/max values in the Armada 3720 Hardware Specification!

Thanks all in advance...
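One way to compute candidate values is to invert the bit layout quoted above:

1. Round each requested voltage up to the next entry of the AVS table.
2. Place the resulting indices into bits [27:22] (low) and [21:16] (high).
3. Keep every other bit of the value read with md or devmem.

The Python sketch below does this under the same assumptions as the quoted comment, with low and high normally set to the same value as avs.c does. The AVS table is assumed to mirror avs_map[] in armada-37xx-cpufreq.c. The sketch does not check the result against any limit in the Hardware Specification, and the helper names are hypothetical.

```python
# AVS index -> voltage in mV; assumed to mirror avs_map[] in
# drivers/cpufreq/armada-37xx-cpufreq.c (check it against your kernel tree).
AVS_MAP_MV = [
    747, 758, 770, 782, 793, 805, 817, 828, 840, 852, 863, 875, 887, 898,
    910, 922, 933, 945, 957, 968, 980, 992, 1003, 1015, 1027, 1038, 1050,
    1062, 1073, 1085, 1097, 1108, 1120, 1132, 1143, 1155, 1167, 1178, 1190,
    1202, 1213, 1225, 1237, 1248, 1260, 1272, 1283, 1295, 1307, 1318, 1330,
    1342,
]
LOW_SHIFT, HIGH_SHIFT, FIELD_MASK = 22, 16, 0x3F  # bits [27:22] and [21:16]


def avs_index_for(target_mv: int) -> int:
    """Smallest AVS index whose voltage is >= target_mv."""
    for idx, mv in enumerate(AVS_MAP_MV):
        if mv >= target_mv:
            return idx
    raise ValueError(f"{target_mv} mV exceeds the table maximum of {AVS_MAP_MV[-1]} mV")


def avs_reg_value(current: int, low_mv: int, high_mv: int) -> int:
    """Return current with the low/high VDD fields replaced; other bits unchanged."""
    fields = (FIELD_MASK << LOW_SHIFT) | (FIELD_MASK << HIGH_SHIFT)
    low = avs_index_for(low_mv)
    high = avs_index_for(high_mv)
    return (current & ~fields) | (low << LOW_SHIFT) | (high << HIGH_SHIFT)


current = 0x5A28FFFF  # as read back with `md 0xd0011500` on one board above
for target_mv in (1283, 1342):
    value = avs_reg_value(current, target_mv, target_mv)
    print(f"{target_mv} mV -> mw 0xd0011500 0x{value:08X}")
```

For example, start from 0x5A28FFFF. Requesting 1.283 V for both limits reproduces 0x5BAEFFFF, and requesting 1.342 V reproduces 0x5CF3FFFF; these are the two values suggested earlier in the thread. Rounding up rather than to the nearest entry means the register never ends up below the requested voltage.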