f9micro / f9-kernel

An efficient and secure microkernel built for ARM Cortex-M cores, inspired by L4

Abnormal statistics when FPU support is enabled #45

Open jserv opened 10 years ago

jserv commented 10 years ago

Commit 27b9fb2d41905266a85d4d0776862cc1816eed81 added FPU support to the F9 microkernel. As a side effect, however, the sampling statistics now look abnormal:

-------TOP------
 4209 [ schedule_select          ]
 1548 [ softirq_execute          ]
 1544 [ svc_handler              ]
 1113 [ thread_current           ]
  867 [ thread_isrunnable        ]
  440 [ kernel_thread            ]
  436 [ __svc_handler            ]
   72 [ L4_Ipc                   ]

Clearly, the symbol L4_Ipc should not fall to the bottom of the ranking.

georgekang commented 10 years ago

The following patch resolves it. However, I still don't have a good explanation for why it works.

diff --git a/include/platform/irq.h b/include/platform/irq.h
index 792f36b..53e7f4c 100644
--- a/include/platform/irq.h
+++ b/include/platform/irq.h
@@ -171,7 +171,8 @@ static inline int irq_number(void)
        {                                                               \
                irq_enter();                                            \
                sub();                                                  \
-               request_schedule();                                     \
+               if(NO_PREEMPTED_IRQ)                            \
+                       request_schedule();                             \
                irq_return();                                           \
        }

jserv commented 10 years ago

This implies that the above change reverts all of the PendSV utilization introduced by @arcbbb.

georgekang commented 10 years ago

I think it might be a timing issue.
It might also mean that the cost of a context switch is high.

Here is my sampling result:

 7034 [ no_fp                    ]
 1010 [ schedule_select          ]
  621 [ softirq_execute          ]
  504 [ L4_Ipc                   ]
  386 [ syscall_handler          ]
...
----------------

According to the sampling result on my board, the most frequently sampled address is 0x80018e0, which is the return instruction of the context switch. We can see that as soon as IRQs are re-enabled, PendSV is preempted immediately.

0800189a <no_fp>:
 800189a:   4610        mov r0, r2
 800189c:   f002 faf4   bl  8003e88 <thread_switch>
 80018a0:   682b        ldr r3, [r5, #0]
 80018a2:   695a        ldr r2, [r3, #20]
 80018a4:   4696        mov lr, r2
 80018a6:   691a        ldr r2, [r3, #16]
 80018a8:   4610        mov r0, r2
 80018aa:   699a        ldr r2, [r3, #24]
 80018ac:   4612        mov r2, r2
 80018ae:   f00e 040f   and.w   r4, lr, #15
 80018b2:   f094 0f09   teq r4, #9
 80018b6:   bf0c        ite eq
 80018b8:   f380 8808   msreq   MSP, r0
 80018bc:   f380 8809   msrne   PSP, r0
 80018c0:   f103 021c   add.w   r2, r3, #28
 80018c4:   4610        mov r0, r2
 80018c6:   e890 0ff0   ldmia.w r0, {r4, r5, r6, r7, r8, r9, sl, fp}
 80018ca:   f382 8814   msr CONTROL, r2
 80018ce:   f8d3 2080   ldr.w   r2, [r3, #128]  ; 0x80
 80018d2:   b122        cbz r2, 80018de <no_fp+0x44>
 80018d4:   f103 0340   add.w   r3, r3, #64 ; 0x40
 80018d8:   4618        mov r0, r3
 80018da:   ec90 8b10   vldmia  r0, {d8-d15}
 80018de:   b662        cpsie   i
 80018e0:   4770        bx  lr
 80018e2:   f85d eb04   ldr.w   lr, [sp], #4
 80018e6:   4770        bx  lr
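
For readers following along, attributing a sampled PC such as 0x80018e0 to its enclosing symbol can be sketched as a binary search over a symbol table sorted by address. This is only an illustration and not the KDB implementation: the `ksym` struct and the function names are hypothetical, and the table holds just the two addresses visible in the disassembly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint32_t addr;      /* start address of the symbol */
    const char *name;
};

/* Hypothetical table sorted by address. Only these two entries are taken
 * from the disassembly; a real table would be generated from the ELF. */
const struct ksym ksyms[] = {
    { 0x0800189au, "no_fp" },
    { 0x08003e88u, "thread_switch" },
};

#define NSYMS (sizeof(ksyms) / sizeof(ksyms[0]))

/* Sanity check: the lookup relies on the table being sorted by address. */
void ksym_check_sorted(void)
{
    for (size_t i = 1; i < NSYMS; i++)
        assert(ksyms[i - 1].addr < ksyms[i].addr);
}

/* Return the last symbol whose start address is <= pc, or NULL if pc lies
 * below the first symbol. */
const struct ksym *ksym_lookup(uint32_t pc)
{
    size_t lo = 0, hi = NSYMS;
    const struct ksym *best = NULL;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ksyms[mid].addr <= pc) {
            best = &ksyms[mid];     /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

With this table, `ksym_lookup(0x080018e0)` resolves to the `no_fp` entry, which matches the top line of the sampling result.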

I think the root cause of this issue is the same as that of issue #40. With the FPU support patch applied, the cost of a context switch exceeds one tick, so execution is preempted and sampled by Kprobe (ktimer) immediately after IRQs are re-enabled.
So, to solve it, we should improve context switch performance.
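
The effect described here can be reproduced with a toy model on a host machine. The sketch below is not F9 code and all of its names and numbers are assumptions: it runs one instruction per cycle in a 100-instruction loop, masks IRQs for the first `masked_len` instructions, and raises a timer tick every `tick_period` cycles. A tick raised while IRQs are masked stays pending and is taken at the first unmasked instruction, which is where the sampler records the PC.

```c
#include <assert.h>

#define LOOP_LEN 100  /* instructions per iteration of the simulated loop */

/*
 * Toy model: IRQs are masked while pc is in [0, masked_len).
 * Returns the number of samples recorded exactly at pc == masked_len
 * (the instruction right after unmasking); *total receives the number
 * of samples taken overall.
 */
int samples_after_unmask(int masked_len, int tick_period,
                         int total_cycles, int *total)
{
    int pending = 0, hits = 0;

    assert(masked_len >= 0 && masked_len < LOOP_LEN && tick_period > 0);
    *total = 0;

    for (int cycle = 1; cycle <= total_cycles; cycle++) {
        int pc = cycle % LOOP_LEN;

        if (cycle % tick_period == 0)
            pending = 1;            /* ticks merge while one is pending */

        if (pending && pc >= masked_len) {
            if (pc == masked_len)
                hits++;             /* sampled right after unmasking */
            (*total)++;
            pending = 0;
        }
    }
    return hits;
}
```

In this model, a masked region longer than the tick (for example `masked_len = 60` with `tick_period = 50`) puts every sample on the instruction right after unmasking, while a short masked region (`masked_len = 10`, `tick_period = 37`) spreads the samples across the loop. That is the same skew seen at 0x80018e0, the `bx lr` following `cpsie i`.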

georgekang commented 10 years ago

Here is a workaround.
However, this patch has one drawback: it breaks the encapsulation of mempool.
Besides, disabling IRQs during the context switch (6f51800839880eda1be6f5e6936cce5837b02727) is still necessary.

diff --git a/include/memory.h b/include/memory.h
index 43b313d..c274e4f 100644
--- a/include/memory.h
+++ b/include/memory.h
@@ -111,7 +111,13 @@ void memory_init(void);

 memptr_t mempool_align(int mpid, memptr_t addr);
 int mempool_search(memptr_t base, size_t size);
-mempool_t *mempool_getbyid(int mpid);
+
+extern mempool_t memmap[];
+inline mempool_t *mempool_getbyid(int mpid)
+{
+       return (mpid != -1)?(memmap + mpid):NULL;
+}
+

 int map_area(as_t *src, as_t *dst, memptr_t base, size_t size,
                map_action_t action, int is_priviliged);
diff --git a/kernel/memory.c b/kernel/memory.c
index 5d826c7..74f4055 100644
--- a/kernel/memory.c
+++ b/kernel/memory.c
@@ -44,7 +44,7 @@
  * Memory map of MPU.
  * Translated into memdesc array in KIP by memory_init
  */
-static mempool_t memmap[] = {
+mempool_t memmap[] = {
        DECLARE_MEMPOOL_2("KTEXT", kernel_text,
                MP_KR | MP_KX | MP_NO_FPAGE, MPT_KERNEL_TEXT),
        DECLARE_MEMPOOL_2("UTEXT", user_text,
@@ -129,14 +129,6 @@ int mempool_search(memptr_t base, size_t size)
        return -1;
 }

-mempool_t *mempool_getbyid(int mpid)
-{
-       if (mpid == -1)
-               return NULL;
-
-       return memmap + mpid;
-}
-
 void memory_init()
 {
        int i = 0, j = 0;

arcbbb commented 10 years ago

As @georgekang mentioned, dynamic probing on ktimer is expensive. For PC sampling, I think we can use a static probe instead. I have set up an experiment with a static probe on ktimer at https://github.com/arcbbb/f9-kernel/tree/test-sampling and the result seems normal:

## KDB ##
-------TOP------
 3672 [ L4_Ipc                   ]
 1373 [ kernel_thread            ]
 1224 [ softirq_execute          ]
 1069 [ __svc_handler            ]
  765 [ schedule_select          ]
  610 [ syscall_handler          ]
  304 [ thread_map_search        ]
  154 [ thread_current           ]
  153 [ __ping_thread            ]
  153 [ dbg_printf               ]
  153 [ pendsv_handler           ]
  153 [ do_ipc                   ]
  152 [ sched_slot_dispatch      ]
  152 [ sys_ipc                  ]
  152 [ ipc_read_mr              ]
    1 [ __pong_thread            ]
----------------

However, I haven't yet come up with a good way to calculate the stack pointer flexibly, so I just hard-coded it. It also needs some work to create a static probe framework like trace events in Linux.
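
One possible direction for the stack pointer question, sketched here under assumptions and not taken from the test-sampling branch: on ARMv7-M, bit 2 of EXC_RETURN tells which stack received the exception frame, and the interrupted PC sits at word offset 6 of that frame (the basic eight words keep their position even when the extended FPU frame is stacked). The function name and parameters below are hypothetical, and the caller is assumed to capture MSP and PSP before the handler prologue pushes anything.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets in the basic ARMv7-M exception frame:
 * r0, r1, r2, r3, r12, lr, return address, xPSR. */
enum { FRAME_LR = 5, FRAME_PC = 6, FRAME_XPSR = 7 };

#define EXC_RETURN_SPSEL (1u << 2)  /* set: the frame was pushed on PSP */

/*
 * Select the stack that holds the exception frame and read the
 * interrupted PC from it. msp and psp are assumed to be the stack
 * pointer values at exception entry.
 */
uint32_t sampled_pc(uint32_t exc_return,
                    const uint32_t *msp, const uint32_t *psp)
{
    const uint32_t *frame = (exc_return & EXC_RETURN_SPSEL) ? psp : msp;

    assert(frame != 0);
    return frame[FRAME_PC];
}
```

On a host this can be exercised with two fake frames: an EXC_RETURN of 0xFFFFFFFD selects the PSP frame, while 0xFFFFFFF9 and its FPU variant 0xFFFFFFE9 select the MSP frame.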