foss-for-synopsys-dwc-arc-processors / linux

Helpful resources for users & developers of Linux kernel for ARC
22 stars 13 forks source link

glibc/hard-float: math/test-fenv-tls sporadic failure #54

Closed vineetgarc closed 2 years ago

vineetgarc commented 3 years ago

When testing arc64 glibc hf support (on nSIM/FPGA) I observed sporadic failure of math/test-fenv-tls

cat build/glibc-arc64/build/math/test-fenv-tls.out
FE_INVALID not raised
exceptions not all cleared
FE_INEXACT not raised

It seems the issue was always FP exceptions related (test also checks rounding modes etc which works fine). So created a simple test.

#define TEST_ONE_RAISE(EX)              \
  do                            \
    {                           \
      if (feraiseexcept (EX) == 0)          \
    if (fetestexcept (EX) != EX)            \
      {                     \
        printf (#EX " not raised #%d %s\n", i, p);      \
        ret = 1;                    \
      }                     \
      if (feclearexcept (FE_ALL_EXCEPT) == 0)       \
    if (fetestexcept (FE_ALL_EXCEPT) != 0)      \
      {                     \
        printf ("exceptions not all cleared #%d %s\n", i, p);   \
        ret = 1;                    \
      }                     \
    }                           \
  while (0)

static void * test_raise (void *arg)
{
  intptr_t ret = 0;
  char *p = arg;

  for (int i = 0; i < 10000; i++)
    {
      TEST_ONE_RAISE (FE_INEXACT);
    }
  return (void *) ret;
}

int main(void)
{
   int ret = 0;
   void *vret;
   vret = test_raise("direct");
   ret |= (intptr_t) vret;
   return ret;
}

This test would always passes.

Next as in original test, adding a thread doing this (in addition to main thread) and was able to hit the problem.

int main(void)
{
   int ret = 0;
   void *vret;
+ int pret;
+ pthread_t thread_id;
+ pret = pthread_create (&thread_id, NULL, test_raise, "pthread");
   vret = test_raise("direct");
   ret |= (intptr_t) vret;
+ pret = pthread_join (thread_id, &vret);
   return ret;
}

The issue turned out to be FPU_STATUS register which is poked by feraiseexcept / feclearexcept / fetestexcept was not retaining the right value across Linux task switch. This is despite CONFIG_ARC_FPU_SAVE_RESTORE=y in kernel.

vineetgarc commented 3 years ago

It seems this can happen on ARCv2 too, if rarely: at least one of my HSDK run logs had this FAIL'ing. On ARCv2 FPGA it could be easily triggered when ran in a loop.

# while true; do fp-raise-arcv2; done
FE_INEXACT not raised #5564 direct
exceptions not all cleared #7821 direct
FE_INEXACT not raised #8086 pthread
vineetgarc commented 3 years ago

So the issue is how FPU_STATUS is programed to raise/clear exceptions. Explicit setting of Exception Flag bits [0] thru [4] requires setting FWE [31]. This is what the glibc macros do

/* [ 0] -> IV: flag invalid operation.
   [ 1] -> DZ: flag division by zero.
   [ 2] -> OV: flag Overflow operation.
   [ 3] -> UV: flag Underflow operation.
   [ 4] -> IX: flag Inexact operation.
   [31] -> FWE: Flag Write Enable.
           If 1, above flags writable explicitly (clearing),
           else IoW and only writable indirectly via bits [12:7]. */

#  define _FPU_SETS(cw)             \
    do {                    \
      unsigned int __tmp = 0x80000000 | (cw);   \       <-----
      __asm__ volatile ("sr  %0, [0x301] \r\n"  \
                        : : "r" (__tmp));   \
    } while (0)

int
__feraiseexcept (int excepts)
{
  unsigned int fpsr;
  _FPU_GETS (fpsr);  fpsr |= excepts;  _FPU_SETS (fpsr);
  return 0;
}

However in case of a task switch Linux kernel also needs to switch the FPU_STATUS to incoming task's bits. That code doesn't wiggle the FWE bit like above and thus can fail to effect the update. Here's the excerpt from nSIM trace.

# kernel init FPU_STATUS with FWE
[0x818df868] 0x26ab704c 0x80000000   AD K  Z    sr             80000000,0x301: aux[0x301] <= 0x80000000 *

# glibc/userspace reading FPU_STATUS not observing FWE
[0x200287a4] 0x22aa004c              AD U  Z C  lr             r2,[0x301] : (w0) r2 <= 0x00000000: aux[0x301] => 0x00 *
[0x200287a8] 0x225007c2              AD U  Z C  bclr           r2,r2,0x1f : (w0) r2 <= 0x00000000 *
[0x200287ac] 0x7845                  AD U  Z C  or_s           r0,r0,r2 : (w0) r0 <= 0x00000004 *

# glibc preventively OR'ing FWE before every write to FPU_STATUS
[0x200287ae] 0xb89f                  AD U  Z C  bset_s         r0,r0,0x1f : (w0) r0 <= 0x80000004 *
[0x200287b0] 0x20ab004c              AD U  Z C  sr             r0,0x301: aux[0x301] <= 0x80000004 *

# kernel context switch code save/restoring FPU_STATUS (failing to OR FWE: root-cause
[0x818df87c] 0x22aa004c              AD K  Z    lr             r2,[0x301] : (w0) r2 <= 0x00000004: aux[0x301] => 0x04 *
[0x818df880] 0x18ff0098              AD K  Z    st.as          r2,[r0,0xff] : sw [0x9f03fbfc] <= 0x00000004 *
[0x818df88c] 0x11ff0602              AD K  Z    ld.as          r2,[r1,0xff] : lw [0x9f19a3fc] => 0x00000000 : (w1) r2 <= 0x00000000 *
[0x818df890] 0x22ab004c              AD K  Z    sr             r2,0x301: aux[0x301] <= 0x00 *

# BUG manifests: user-space not observing switched-to task's FPU_STATUS (expected 0x00, but sees stale 0x04)
[0x200287d4] 0x22aa004c              AD U  Z    lr             r2,[0x301] : (w0) r2 <= 0x00000004: aux[0x301] => 0x04 *
vineetgarc commented 3 years ago

When kernel starts a task fpu_init_task() sets the FWE bit once assuming that it is retained. glibc cautiously sets FWE everytime it updates FPU_STATUS so kernel need not do anything special on context switch. However it seems FWE is clear-on-write thus kernel needs to set it even on context switch code.

void fpu_save_restore(struct task_struct *prev, struct task_struct *next)
 {
        struct arc_fpu *save = &prev->thread.fpu;
        struct arc_fpu *restore = &next->thread.fpu;
+       const unsigned int fwe  = 0x80000000;

        save->ctrl = read_aux_reg(ARC_REG_FPU_CTRL);
        save->status = read_aux_reg(ARC_REG_FPU_STATUS);

        write_aux_reg(ARC_REG_FPU_CTRL, restore->ctrl);
-       write_aux_reg(ARC_REG_FPU_STATUS, restore->status);
+       write_aux_reg(ARC_REG_FPU_STATUS, (fwe | restore->status));
vineetgarc commented 3 years ago

Local testing confirms the fix. Waiting on confirmation from hw-folks before sending patch out to lkml.

vineetgarc commented 3 years ago

Fix posted to lkml: http://lists.infradead.org/pipermail/linux-snps-arc/2021-July/005247.html Ensuing glibc optmization: http://lists.infradead.org/pipermail/linux-snps-arc/2021-July/005252.html

vineetgarc commented 2 years ago

Fix merged in mainline 2021-07-08 3a715e80400f ARC: fp: set FPU_STATUS.FWE to enable FPU_STATUS update on context switch