Closed vineetgarc closed 2 years ago
It seems this can happen on ARCv2 too, if rarely: at least one of my HSDK run logs had this FAIL'ing. On ARCv2 FPGA it could be easily triggered when ran in a loop.
# while true; do fp-raise-arcv2; done
FE_INEXACT not raised #5564 direct
exceptions not all cleared #7821 direct
FE_INEXACT not raised #8086 pthread
So the issue is how FPU_STATUS is programed to raise/clear exceptions. Explicit setting of Exception Flag bits [0] thru [4] requires setting FWE [31]. This is what the glibc macros do
/* [ 0] -> IV: flag invalid operation.
[ 1] -> DZ: flag division by zero.
[ 2] -> OV: flag Overflow operation.
[ 3] -> UV: flag Underflow operation.
[ 4] -> IX: flag Inexact operation.
[31] -> FWE: Flag Write Enable.
If 1, above flags writable explicitly (clearing),
else IoW and only writable indirectly via bits [12:7]. */
# define _FPU_SETS(cw) \
do { \
unsigned int __tmp = 0x80000000 | (cw); \ <-----
__asm__ volatile ("sr %0, [0x301] \r\n" \
: : "r" (__tmp)); \
} while (0)
int
__feraiseexcept (int excepts)
{
unsigned int fpsr;
_FPU_GETS (fpsr); fpsr |= excepts; _FPU_SETS (fpsr);
return 0;
}
However in case of a task switch Linux kernel also needs to switch the FPU_STATUS to incoming task's bits. That code doesn't wiggle the FWE bit like above and thus can fail to effect the update. Here's the excerpt from nSIM trace.
# kernel init FPU_STATUS with FWE
[0x818df868] 0x26ab704c 0x80000000 AD K Z sr 80000000,0x301: aux[0x301] <= 0x80000000 *
# glibc/userspace reading FPU_STATUS not observing FWE
[0x200287a4] 0x22aa004c AD U Z C lr r2,[0x301] : (w0) r2 <= 0x00000000: aux[0x301] => 0x00 *
[0x200287a8] 0x225007c2 AD U Z C bclr r2,r2,0x1f : (w0) r2 <= 0x00000000 *
[0x200287ac] 0x7845 AD U Z C or_s r0,r0,r2 : (w0) r0 <= 0x00000004 *
# glibc preventively OR'ing FWE before every write to FPU_STATUS
[0x200287ae] 0xb89f AD U Z C bset_s r0,r0,0x1f : (w0) r0 <= 0x80000004 *
[0x200287b0] 0x20ab004c AD U Z C sr r0,0x301: aux[0x301] <= 0x80000004 *
# kernel context switch code save/restoring FPU_STATUS (failing to OR FWE: root-cause
[0x818df87c] 0x22aa004c AD K Z lr r2,[0x301] : (w0) r2 <= 0x00000004: aux[0x301] => 0x04 *
[0x818df880] 0x18ff0098 AD K Z st.as r2,[r0,0xff] : sw [0x9f03fbfc] <= 0x00000004 *
[0x818df88c] 0x11ff0602 AD K Z ld.as r2,[r1,0xff] : lw [0x9f19a3fc] => 0x00000000 : (w1) r2 <= 0x00000000 *
[0x818df890] 0x22ab004c AD K Z sr r2,0x301: aux[0x301] <= 0x00 *
# BUG manifests: user-space not observing switched-to task's FPU_STATUS (expected 0x00, but sees stale 0x04)
[0x200287d4] 0x22aa004c AD U Z lr r2,[0x301] : (w0) r2 <= 0x00000004: aux[0x301] => 0x04 *
When kernel starts a task fpu_init_task() sets the FWE bit once assuming that it is retained. glibc cautiously sets FWE everytime it updates FPU_STATUS so kernel need not do anything special on context switch. However it seems FWE is clear-on-write thus kernel needs to set it even on context switch code.
void fpu_save_restore(struct task_struct *prev, struct task_struct *next)
{
struct arc_fpu *save = &prev->thread.fpu;
struct arc_fpu *restore = &next->thread.fpu;
+ const unsigned int fwe = 0x80000000;
save->ctrl = read_aux_reg(ARC_REG_FPU_CTRL);
save->status = read_aux_reg(ARC_REG_FPU_STATUS);
write_aux_reg(ARC_REG_FPU_CTRL, restore->ctrl);
- write_aux_reg(ARC_REG_FPU_STATUS, restore->status);
+ write_aux_reg(ARC_REG_FPU_STATUS, (fwe | restore->status));
Local testing confirms the fix. Waiting on confirmation from hw-folks before sending patch out to lkml.
Fix posted to lkml: http://lists.infradead.org/pipermail/linux-snps-arc/2021-July/005247.html Ensuing glibc optmization: http://lists.infradead.org/pipermail/linux-snps-arc/2021-July/005252.html
Fix merged in mainline 2021-07-08 3a715e80400f ARC: fp: set FPU_STATUS.FWE to enable FPU_STATUS update on context switch
When testing arc64 glibc hf support (on nSIM/FPGA) I observed sporadic failure of math/test-fenv-tls
It seems the issue was always FP exceptions related (test also checks rounding modes etc which works fine). So created a simple test.
This test would always passes.
Next as in original test, adding a thread doing this (in addition to main thread) and was able to hit the problem.
The issue turned out to be FPU_STATUS register which is poked by feraiseexcept / feclearexcept / fetestexcept was not retaining the right value across Linux task switch. This is despite CONFIG_ARC_FPU_SAVE_RESTORE=y in kernel.