Open jserv opened 10 years ago
It can be worked around with the following patch. However, I still don't have a good explanation for why it helps.
diff --git a/include/platform/irq.h b/include/platform/irq.h
index 792f36b..53e7f4c 100644
--- a/include/platform/irq.h
+++ b/include/platform/irq.h
@@ -171,7 +171,8 @@ static inline int irq_number(void)
{ \
irq_enter(); \
sub(); \
- request_schedule(); \
+ if(NO_PREEMPTED_IRQ) \
+ request_schedule(); \
irq_return(); \
}
This implies that the above change reverts all of the PendSV utilization introduced by @arcbbb.
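For readers unfamiliar with the guard in the patch, here is a minimal host-side sketch of what a check like NO_PREEMPTED_IRQ could look like. It assumes the macro tests the RETTOBASE bit of the ARMv7-M ICSR register, which is set when the running handler is the only active exception. The actual definition in the F9 tree is not shown in this thread, so no_preempted_irq, guarded_epilogue and schedule_requests are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M ICSR fields: RETTOBASE is bit 11, VECTACTIVE is bits [8:0]. */
#define ICSR_RETTOBASE       (UINT32_C(1) << 11)
#define ICSR_VECTACTIVE_MASK UINT32_C(0x1FF)

/*
 * Hypothetical stand-in for NO_PREEMPTED_IRQ. The ICSR value is passed in
 * as an argument so the logic can run on a host instead of reading the
 * memory-mapped register.
 */
static int no_preempted_irq(uint32_t icsr)
{
    /* RETTOBASE is only meaningful in Handler mode (VECTACTIVE != 0). */
    assert((icsr & ICSR_VECTACTIVE_MASK) != 0);
    return (icsr & ICSR_RETTOBASE) != 0;
}

/* Counts how often the guarded epilogue would have requested a schedule. */
static int schedule_requests;

/* Mirrors the patched handler epilogue: schedule only on the outermost IRQ. */
static void guarded_epilogue(uint32_t icsr)
{
    if (no_preempted_irq(icsr))
        schedule_requests++;    /* stands in for request_schedule() */
}
```

With this guard, a nested IRQ (RETTOBASE clear) returns without requesting a schedule, and only the outermost handler triggers one.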
I think it might be a timing issue. It might also mean that the cost of a context switch is too high.
The following is my sampling result.
7034 [ no_fp ]
1010 [ schedule_select ]
621 [ softirq_execute ]
504 [ L4_Ipc ]
386 [ syscall_handler ]
...
----------------
According to the sampling results from my board, the most frequently sampled address is 0x80018e0. It is the return instruction (bx lr) of the context switch, right after cpsie i. We can see that as soon as IRQs are re-enabled, PendSV is preempted.
0800189a <no_fp>:
800189a: 4610 mov r0, r2
800189c: f002 faf4 bl 8003e88 <thread_switch>
80018a0: 682b ldr r3, [r5, #0]
80018a2: 695a ldr r2, [r3, #20]
80018a4: 4696 mov lr, r2
80018a6: 691a ldr r2, [r3, #16]
80018a8: 4610 mov r0, r2
80018aa: 699a ldr r2, [r3, #24]
80018ac: 4612 mov r2, r2
80018ae: f00e 040f and.w r4, lr, #15
80018b2: f094 0f09 teq r4, #9
80018b6: bf0c ite eq
80018b8: f380 8808 msreq MSP, r0
80018bc: f380 8809 msrne PSP, r0
80018c0: f103 021c add.w r2, r3, #28
80018c4: 4610 mov r0, r2
80018c6: e890 0ff0 ldmia.w r0, {r4, r5, r6, r7, r8, r9, sl, fp}
80018ca: f382 8814 msr CONTROL, r2
80018ce: f8d3 2080 ldr.w r2, [r3, #128] ; 0x80
80018d2: b122 cbz r2, 80018de <no_fp+0x44>
80018d4: f103 0340 add.w r3, r3, #64 ; 0x40
80018d8: 4618 mov r0, r3
80018da: ec90 8b10 vldmia r0, {d8-d15}
80018de: b662 cpsie i
80018e0: 4770 bx lr
80018e2: f85d eb04 ldr.w lr, [sp], #4
80018e6: 4770 bx lr
I think the root cause of this issue is the same as in issue #40. With the FPU support patch applied, the cost of a context switch can exceed one tick, so the switch is preempted and sampled by Kprobe (ktimer) immediately after IRQs are re-enabled.
So, to solve it, we should improve context switch performance.
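The reasoning above can be illustrated with a small host-side model. It is only a sketch under stated assumptions: ticks fire at exact multiples of a fixed period, IRQs are masked for the whole context switch, and a tick that fires while masked is taken right after cpsie i. The cycle counts are hypothetical and were not measured on the board; tick_pends_while_masked and sampled_pc are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the instruction that follows "cpsie i" in the listing above. */
#define PC_AFTER_CPSIE UINT32_C(0x080018e0)

/*
 * Returns 1 if a periodic tick (at cycles 0, period, 2*period, ...) fires
 * while IRQs are masked during [start, start + masked_cycles).
 */
static int tick_pends_while_masked(uint64_t start, uint64_t masked_cycles,
                                   uint64_t period)
{
    assert(period > 0);
    /* First tick at or after the start of the masked region. */
    uint64_t next_tick = ((start + period - 1) / period) * period;
    return next_tick < start + masked_cycles;
}

/*
 * PC that a tick-driven sampler would record for this context switch.
 * Returns 0 when no tick lands inside the masked region.
 */
static uint32_t sampled_pc(uint64_t start, uint64_t masked_cycles,
                           uint64_t period)
{
    return tick_pends_while_masked(start, masked_cycles, period)
               ? PC_AFTER_CPSIE
               : 0;
}
```

In this model, once the masked region is at least one period long, every context switch ends with a pending tick, so the sampler keeps recording the same address. That matches the skew toward no_fp in the ranking.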
Here is a workaround.
However, this patch has one drawback: it breaks the encapsulation of mempool.
Besides, disabling IRQs during the context switch (6f51800839880eda1be6f5e6936cce5837b02727) is still necessary.
diff --git a/include/memory.h b/include/memory.h
index 43b313d..c274e4f 100644
--- a/include/memory.h
+++ b/include/memory.h
@@ -111,7 +111,13 @@ void memory_init(void);
memptr_t mempool_align(int mpid, memptr_t addr);
int mempool_search(memptr_t base, size_t size);
-mempool_t *mempool_getbyid(int mpid);
+
+extern mempool_t memmap[];
+inline mempool_t *mempool_getbyid(int mpid)
+{
+ return (mpid != -1)?(memmap + mpid):NULL;
+}
+
int map_area(as_t *src, as_t *dst, memptr_t base, size_t size,
map_action_t action, int is_priviliged);
diff --git a/kernel/memory.c b/kernel/memory.c
index 5d826c7..74f4055 100644
--- a/kernel/memory.c
+++ b/kernel/memory.c
@@ -44,7 +44,7 @@
* Memory map of MPU.
* Translated into memdesc array in KIP by memory_init
*/
-static mempool_t memmap[] = {
+mempool_t memmap[] = {
DECLARE_MEMPOOL_2("KTEXT", kernel_text,
MP_KR | MP_KX | MP_NO_FPAGE, MPT_KERNEL_TEXT),
DECLARE_MEMPOOL_2("UTEXT", user_text,
@@ -129,14 +129,6 @@ int mempool_search(memptr_t base, size_t size)
return -1;
}
-mempool_t *mempool_getbyid(int mpid)
-{
- if (mpid == -1)
- return NULL;
-
- return memmap + mpid;
-}
-
void memory_init()
{
int i = 0, j = 0;
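One caveat about the header change above: a plain inline definition in a header can cause link problems. Under C99 rules it gives an "undefined reference" if the compiler decides not to inline. Under GNU89 rules it gives "multiple definition" errors if several files include the header. A common alternative is static inline. The sketch below shows that variant with a reduced, hypothetical mempool_t and a two-entry memmap so that it compiles on its own. The bounds assertion is an extra sanity check that is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's mempool_t. */
typedef struct {
    const char *name;
} mempool_t;

/*
 * In the patch this array is defined in kernel/memory.c and declared
 * extern in include/memory.h; it is defined here to keep the sketch
 * self-contained.
 */
mempool_t memmap[] = {
    { "KTEXT" },
    { "UTEXT" },
};

#define MEMMAP_COUNT (sizeof(memmap) / sizeof(memmap[0]))

/* static inline: each translation unit gets its own private copy. */
static inline mempool_t *mempool_getbyid(int mpid)
{
    /* Same contract as the original function: -1 means "no pool". */
    if (mpid == -1)
        return NULL;

    /* Extra sanity check for the sketch only, not in the patch. */
    assert(mpid >= 0 && (size_t) mpid < MEMMAP_COUNT);

    return memmap + mpid;
}
```

This keeps the lookup inlinable on the context switch path, but memmap still has to be visible outside kernel/memory.c, so the encapsulation drawback mentioned above remains.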
As @georgekang mentioned, dynamic probing on ktimer is expensive. For PC sampling, I think we can use a static probe instead. I have set up an experiment with a static probe on ktimer: https://github.com/arcbbb/f9-kernel/tree/test-sampling The result looks normal.
## KDB ##
-------TOP------
3672 [ L4_Ipc ]
1373 [ kernel_thread ]
1224 [ softirq_execute ]
1069 [ __svc_handler ]
765 [ schedule_select ]
610 [ syscall_handler ]
304 [ thread_map_search ]
154 [ thread_current ]
153 [ __ping_thread ]
153 [ dbg_printf ]
153 [ pendsv_handler ]
153 [ do_ipc ]
152 [ sched_slot_dispatch ]
152 [ sys_ipc ]
152 [ ipc_read_mr ]
1 [ __pong_thread ]
----------------
However, I haven't come up with a good way to compute the stack pointer flexibly yet, so I just hard-coded it. It will also take some work to create a static probe framework like trace events in Linux.
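One possible way to avoid hard-coding the stack pointer is to derive the sampled PC from EXC_RETURN and the exception frame. This is only a sketch, not the code in the test-sampling branch. On ARMv7-M, bit 2 of EXC_RETURN tells whether the frame was pushed on PSP or MSP, and the stacked PC is the seventh word of the frame (r0, r1, r2, r3, r12, lr, pc, xpsr). The function names are hypothetical, and the sketch assumes msp and psp are captured at handler entry, before the handler's own prologue moves the stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* EXC_RETURN bit 2: 1 = frame was stacked on PSP, 0 = on MSP. */
#define EXC_RETURN_SPSEL (UINT32_C(1) << 2)

/* Exception frame layout: r0, r1, r2, r3, r12, lr, pc, xpsr. */
#define FRAME_PC_INDEX 6

/* Picks the stack that holds the interrupted context's frame. */
static const uint32_t *select_frame(uint32_t exc_return,
                                    const uint32_t *msp,
                                    const uint32_t *psp)
{
    return (exc_return & EXC_RETURN_SPSEL) ? psp : msp;
}

/*
 * Hypothetical static-probe helper: returns the PC of the interrupted
 * code. msp and psp must be the stack pointer values at handler entry.
 */
static uint32_t probe_sampled_pc(uint32_t exc_return,
                                 const uint32_t *msp,
                                 const uint32_t *psp)
{
    assert(msp != 0 && psp != 0);
    return select_frame(exc_return, msp, psp)[FRAME_PC_INDEX];
}
```

The first eight words of the frame have the same layout whether or not FPU state is stacked, so the PC index does not change when the extended frame is in use.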
After commit 27b9fb2d41905266a85d4d0776862cc1816eed81, the F9 microkernel has FPU support. However, it has the side effect of producing abnormal statistics, as follows:
It is evident that the symbol L4_Ipc should not fall out of the top of the ranking.