apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.79k stars 1.16k forks source link

RISC-V:the FPU case in ostest failed due to "lazy" FPU save/restore? #6172

Open hotislandn opened 2 years ago

hotislandn commented 2 years ago

We tried to port NuttX on a new SoC, and found an issue related to the FPU test.

In the source files: https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_exception_common.S https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_macros.S

We can see that when the FPU context needs to be saved, it checks the FS field, and stores the FPU regs ONLY when it is dirty, then mark it as clean and updates "mstatus" in the full context.

In the FPU test: https://github.com/apache/incubator-nuttx-apps/blob/master/testing/ostest/fpu.c

  1. when up_saveusercontext is called for the first time, the FS=3 after float OPs in this thread, so we got the FPU context stored in sp, and copied to "fpu->save1", when done, FS=2 in this thread
  2. when up_saveusercontext is called for the second time, just after the first call, the "LAZY" FPU save logic will skip FPU context since FS = 2(CLEAN) != 3(DIRTY). But since the FPU context is still in the stack of the current thread, so we still get the right float regs in fpu->save2, and pass the first up_fpucmp
  3. when up_saveusercontext is called for the third time, the FS is still 2 since we did not touch the FPU during sched_unlock and usleep. However, these two did mess up the stack --- the original FPU context is gone. But due to the "LAZY" FPU save logic, it still skip the FPU context since FS=2. So we will get the un-expected data in fpu->save2 this time, and leads to fail in the second up_fpucmp.

Did we missed something in our port? Thanks in advance.

xiaoxiang781216 commented 2 years ago

We tried to port NuttX on a new SoC, and found an issue related to the FPU test.

In the source files: https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_exception_common.S https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_macros.S

We can see that when the FPU context needs to be saved, it checks the FS field, and stores the FPU regs ONLY when it is dirty, then mark it as clean and updates "mstatus" in the full context.

In the FPU test: https://github.com/apache/incubator-nuttx-apps/blob/master/testing/ostest/fpu.c

  1. when up_saveusercontext is called for the first time, the FS=3 after float OPs in this thread, so we got the FPU context stored in sp, and copied to "fpu->save1", when done, FS=2 in this thread
  2. when up_saveusercontext is called for the second time, just after the first call, the "LAZY" FPU save logic will skip FPU context since FS = 2(CLEAN) != 3(DIRTY). But since the FPU context is still in the stack of the current thread, so we still get the right float regs in fpu->save2, and pass the first up_fpucmp
  3. when up_saveusercontext is called for the third time, the FS is still 2 since we did not touch the FPU during sched_unlock and usleep. However, these two did mess up the stack --- the original FPU context is gone. But due to the "LAZY" FPU save logic, it still skip the FPU context since FS=2. So we will get the un-expected data in fpu->save2 this time, and leads to fail in the second up_fpucmp.

Did we missed something in our port? Thanks in advance.

One method is always save all FPU register in up_saveusercontext, another method is do some read only float operation between each test to overcome the lazy fpu saving optimization. The later approach looks simple and less invasive.

hotislandn commented 2 years ago

Yes, the 2nd way seems enough for this case. e.g., do a float to int conversion, or store a float element to mem, after checking the code generation and evaluating the impaction to all platforms. Another option is to add a new API say up_savefullcontext?

hotislandn commented 2 years ago

we will give it a try to move things like "fpu->sp1 = sp1" to the new positon: after usleep().

masayuki2009 commented 2 years ago

@hotislandn

I'm not sure if your issue relates to the following issue. https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109173828

Ouss4 commented 2 years ago

Since the FPU state is stored in the interrupt stack frame, isn't the issue more generic with lazy context switching?

hotislandn commented 2 years ago

Since the FPU state is stored in the interrupt stack frame, isn't the issue more generic with lazy context switching?

In fact, the context switch during the usleep() call does NOT save the FPU regs in TCB since FS is 2 at that time.

hotislandn commented 2 years ago

@hotislandn

I'm not sure if your issue relates to the following issue. #6134 (comment)

The root cause is different. However, the suggestion of a new API to get context should work for this. BTW, on RISC-V, we may meet a similar issue after we introduce the RVV into Nuttx.

Ouss4 commented 2 years ago

In fact, the context switch during the usleep() call does NOT save the FPU regs in TCB since FS is 2 at that time.

Now we only have a reference in the TCB for the save area in the stack.
When a task resumes, SP will point to its original pre-interrupt location making the whole interrupt stack frame available.

hotislandn commented 2 years ago

In fact, the context switch during the usleep() call does NOT save the FPU regs in TCB since FS is 2 at that time.

Now we only have a reference in the TCB for the save area in the stack. When a task resumes, SP will point to its original pre-interrupt location making the whole interrupt stack frame available.

Yes, that's the normal case, tcb -> xcp -> regs points to the right context.

However, in this FPU test, on RISC-V, the lazy FPU optimization causes FS to change to 2 in up_saveusercontext(). When the real context switch happens in usleep(), FPU regs save will be skipped.

Ouss4 commented 2 years ago

I was talking about the current layout in general and not particularly about the FPU test. I believe that since we are lazily saving/restoring FPU state, it should not be part of the interrupt stack frame and rather have a different, fixed, area, like in the TCB or at the bottom of the stack along side the TLS region and arguments.

xiaoxiang781216 commented 2 years ago

I was talking about the current layout in general and not particularly about the FPU test. I believe that since we are lazily saving/restoring FPU state, it should not be part of the interrupt stack frame

FPU state isn't saved to the interrupt stack, but the current thread stack.

and rather have a different, fixed, area, like in the TCB or at the bottom of the stack along side the TLS region and arguments.

It is no real different to save FPU state in TCB, bottom of the stack or the near of the integer register context. But, the current layout could be more:

  1. Simple to resolve the context space in case of signal dispatch
  2. Generate more efficient save/restore code(hardware normally has better support for SP related load/save)
  3. It's hard to access TCB field or the bottom of the stack in assembly code without the help from some c utility function
Ouss4 commented 2 years ago

I was talking about the current layout in general and not particularly about the FPU test. I believe that since we are lazily saving/restoring FPU state, it should not be part of the interrupt stack frame

FPU state isn't saved to the interrupt stack, but the current thread stack.

Yes. I didn't mean interrupt stack, I meant interrupt saving area (or registers saving area) in the current thread's stack.