apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.83k stars 1.17k forks source link

arm64 smp hangs on the sched_unlock call in nx_start function #11915

Closed github-xiaodong closed 6 months ago

github-xiaodong commented 7 months ago

For arm64 smp, in nx_start function, after nx_smp_start and nx_bringup execute, the cpu will hang on up_testset when they call sched_unlock.

CONFIG_SMP=y
CONFIG_SMP_NCPUS=2

In nx_start function, if change g_idletcb[i].cmn.affinity = (cpu_set_t)(CONFIG_SMP_DEFAULT_CPUSET & SCHED_ALL_CPUS) to g_idletcb[i].cmn.affinity = (cpu_set_t)(1<< i); then smp can run normally.

I want to ask whether the value of g_idletcb[i].cmn.affinity should be the bit of the current core or the bit of all cores. When the comment says No, this IDLE thread can only run on its assigned CPU.

#ifdef CONFIG_SMP
      g_idletcb[i].cmn.flags = (TCB_FLAG_TTYPE_KERNEL | TCB_FLAG_CPU_LOCKED);
      g_idletcb[i].cmn.cpu   = i;

      /* Set the affinity mask to allow the thread to run on all CPUs.  No,
       * this IDLE thread can only run on its assigned CPU.  That is
       * enforced by the TCB_FLAG_CPU_LOCKED which overrides the affinity
       * mask.  This is essential because all tasks inherit the affinity
       * mask from their parent and, ultimately, the parent of all tasks is
       * the IDLE task.
       */

      g_idletcb[i].cmn.affinity =
        (cpu_set_t)(CONFIG_SMP_DEFAULT_CPUSET & SCHED_ALL_CPUS);
#else
      g_idletcb[i].cmn.flags = TCB_FLAG_TTYPE_KERNEL;
#endif

The arm64 up is normal. If enable smp and choose CONFIG_SMP_NCPUS =1 , is normal too.

acassis commented 7 months ago

@hnwxd since this code is generic for all SMP chips, I think it is fine. What board are you using?

@xiaoxiang781216 @anchao did you find this issue on your side?

github-xiaodong commented 7 months ago
  1. We are adapting our own A53 quad-core chip, and at present, imx8 and rk3399 under arm64 do not support smp. Is there any other board that actually runs smp under arm64 ?

  2. When I made the changes described above, all four cores started properly, and the smp call test in the ostest application passed correctly.

  3. By the way, I am developing on version releases/12.4

微信截图_20240315092718 微信截图_20240315092736 微信截图_20240315092817

xiaoxiang781216 commented 7 months ago
  1. We are adapting our own A53 quad-core chip, and at present, imx8 and rk3399 under arm64 do not support smp. Is there any other board that actually runs smp under arm64 ?

qemu and fvp support arm64, you can compare the difference with your hardware. With your change, all threads but idle will pin to CPU0, which may the reason why you pass the test.

github-xiaodong commented 7 months ago
  1. We are adapting our own A53 quad-core chip, and at present, imx8 and rk3399 under arm64 do not support smp. Is there any other board that actually runs smp under arm64 ?

qemu and fvp support arm64, you can compare the difference with your hardware. With your change, all threads but idle will pin to CPU0, which may the reason why you pass the test.

You're right. It is not right for all the results debug to be CPU0 at the end of smp call test in previous picture.

It's my problem, because the implementation of arm64_gic_send_sgi function does not match our chip, because we need to use Group 1.

sgi

I found this problem when I used smp call test yesterday, because core 0 failed to wake up core 1, after changing to Group 1(ICC_SGI1R), core 1 successfully woke up.

The g_idletcb[i].cmn.affinity = (cpu_set_t)(CONFIG_SMP_DEFAULT_CPUSET & SCHED_ALL_CPUS); in nx_start function is no problem, it's been restored to the original code.

The smp call test and coremark is ok.

call coremark

Thank you for your help! @acassis @xiaoxiang781216

acassis commented 7 months ago

@hnwxd I'm happy you should it! Please consider submitting your work on imx8 and rk3399, there are many people willing to use it with NuttX ;-)

github-xiaodong commented 7 months ago

@hnwxd I'm happy you should it! Please consider submitting your work on imx8 and rk3399, there are many people willing to use it with NuttX ;-)

I just changed the arm64_gic_send_sgi function. The original implementation is,

  ARM64_DSB();

  regval = getreg32(IGROUPR(base, 0));

  if (regval & BIT(sgi_id))
    {
      write_sysreg(sgi_val, ICC_SGI1R);     /* Group 1 */
    }
  else
    {
      write_sysreg(sgi_val, ICC_SGI0R_EL1); /* Group 0 */
    }

  ARM64_ISB();

I delete this judgment and use write_sysreg(sgi_val, ICC_SGI1R) directly.

  ARM64_DSB();

  write_sysreg(sgi_val, ICC_SGI0R_EL1); /* Group 0 */

  ARM64_ISB();

Just like the implementation in the arm_gic_send_sgi function.

  ARM_DSB();
  CP15_SET64(ICC_SGI1R, sgi_val);
  ARM_ISB();

Why is the arm_gic_send_sgi function written ICC_SGI1R directly, but the arm64_gic_send_sgi function needs to make this judgment ? I'm trying to find out why.

github-xiaodong commented 7 months ago

The pr "Add support for FIQ interrupts" made this change.

https://github.com/apache/nuttx/pull/10888