foss-for-synopsys-dwc-arc-processors / linux

Helpful resources for users & developers of Linux kernel for ARC
22 stars 13 forks source link

perf statistical profiling (perf record/report) broken on upstream v5.9 kernel (HSDK-4xD) #30

Closed vineetgarc closed 3 years ago

vineetgarc commented 3 years ago

This came up in recent benchmarking activity. In mainline kernels statistical profiling or perf record is broken

/home/root>perf record ./hackbench
Couldn't synthesize bpf events.
Couldn't synthesize cgroup events.
Running with 10 groups 400 process
Time: 0.977
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data ]

/home/root>perf report -n --stdio
Error:
The perf.data data has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#

At first glance it seemed like my recent commits to enable perf on HSDK-4xD merged upstream were missing, specifically 2020-07-09 fe81d927b78c ARC: HSDK: wireup perf irq
2020-07-26 feb92d7d3813 ARC: perf: don't bail setup if pct irq missing in device-tree

However this is v5.9 which already has my changes. Initial investigation reveals pct IRQ not getting enabled hence the issue with statistical profiling

/home/root>uname -a
Linux hsdk 5.9.0-00002-g0d2dadadb615-dirty #90 SMP PREEMPT Wed Oct 21 16:38:10 PDT 2020 arc  arc arc GNU/Linux

/home/root>cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  6:        325        325        325        325  MCIP IDU Intc   6  ttyS0
 10:          3          3          2          2  MCIP IDU Intc  10  eth0
 12:       1202       1202       1201       1201  MCIP IDU Intc  12  dw-mci
 15:          6          6          5          5  MCIP IDU Intc  15  ehci_hcd:usb1, ohci_hcd:usb2
 16:      20850      37391       3050      11074  ARCv2 core Intc  16  Timer0 (per-cpu-tick)
 17:         33         33         33         32  MCIP IDU Intc  16  f0020000.spi
 19:      28147      10479      15783      12815  ARCv2 core Intc  19  IPI Interrupt
 21:          0          0          0          0  ARCv2 core Intc  21  IPI Interrupt
 88:          0          0          0          0  MCIP IDU Intc  28  f0090000.gpu
 89:          0          0          0          0  MCIP IDU Intc  27  dw_axi_dmac_platform

Added a diagnostic patch (slated for mainline after merge window) 66f825cc25fbfc8, "(ARC: perf: check if pct irq request fails )" which at least shows the root cause.

/home/root>perf record -c 1000
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

But I don't know why this is and thus needs to be debugged.

vineetgarc commented 3 years ago

This turned out to be much more mundane and silly than expected.

First of all it was a regression caused by the prior fix for exact same issue, merged in v4.9-rc4, commit feb92d7d3813456c11dce21 "(ARC: perf: don't bail setup if pct irq missing in device-tree)"

Second it was the most basic "C" coding bug almost a newbie to "C" would code. The assignment and comparison in an if statement were not bracketed correctly, leaving the order of evaluation undefined.

 if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
                          ^^^                          ^^^^

And given such a chance, the compiler will bite you hard, fully entitled to generating this piece of beauty:


 bl @platform_get_irq  <-- irq returned in r0
 setge r2, r0, 0       <-- r2 is bool 1 or 0 depending on irq >= 0 true/false
 brlt.d r0, 0, @.L114

 st_s   r2,[sp]     <-- irq saved is bool 1 or 0, not actual return val [NOK]
 st 1,[r3,160]      # arc_pmu.18_29->irq <-- drops bool and assumes 1 [NOK]

 bl.d @__request_percpu_irq;
 mov_s  r0,1       <-- drops even bool and assumes 1 which fails [NOK]

With the snafu fixed, everything is as expected.

 bl @platform_get_irq   <-- returns irq in r0
 mov_s  r2,r0
 brlt.d r2, 0, @.L112
 st_s   r0,[sp]         <-- irq isaved is actual return value above
 st r0,[r13,160]    #arc_pmu.18_27->irq

 bl.d @__request_percpu_irq <-- r0 unchanged so actual irq returned
 add r4,r4,r12  #, tmp363, __ptr

Fix effectively is

- if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
+ if (has_interrupts && ((irq = platform_get_irq(pdev, 0)) >= 0)) {

But the real commit reworks the code a bit more to simplify setting PERF_PMU_CAP_NO_INTERRUPT

Pull request with fix for 5.10-rc1 sent to Linus

vineetgarc commented 3 years ago

Fix merged upstream: 8c42a5c02bec6c ARC: perf: redo the pct irq missing in device-tree handling