aliyun / plugsched

Live upgrade Linux kernel scheduler subsystem
BSD 3-Clause "New" or "Revised" License
82 stars 23 forks source link

src: fix the bug that tasks enqueue between clear_sched_state() and rebuild_sched_state() #199

Closed zxpdemonio closed 1 year ago

zxpdemonio commented 1 year ago

sched_free_extrapad() is called between clear_sched_state() and rebuild_sched_state() in function __sync_sched_restore(). In function sched_free_extrapad(), free_percpu() may be called, which will result in queue_work_on() called, and a kwork will enqueue, between clear_sched_state() and rebuild_sched_state(), and it will enqueue again in function clear_sched_state(). In this case, cfs->h_nr_running will be wrong.

To fix this bug, we call sched_free_extrapad() later, after rebuild_sched_state().

Reported-by: Cruz Zhao CruzZhao@linux.alibaba.com

CLAassistant commented 1 year ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Cruz Zhao seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

anolis-bot commented 1 year ago

@zxpdemonio , a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/2r1b4c7z/test_result/72128

ampresent commented 1 year ago

Does sched_alloc_extrapad have the same problem? Even if not, maybe it's a good idea to move it to the bottom of __sync_sched_install. Both to keep behavior consistent, and avoid unnecessary confusion.

anolis-bot commented 1 year ago

@zxpdemonio , The CI test is completed, please check result:

Test CaseTest Result
schedule_testx86_64:x: FAIL
schedule_testaarch64:white_check_mark: SUCCESS
public_var_test:white_check_mark: SUCCESS
var_uniformity_test:white_check_mark: SUCCESS
cpu_throttle_test:white_check_mark: SUCCESS
domain_rebuild_test:white_check_mark: SUCCESS
sched_syscall_test:white_check_mark: SUCCESS
mem_pressure_test:white_check_mark: SUCCESS

Sorry, your test job failed. Please get the details in the link.

zxpdemonio commented 1 year ago

Does sched_alloc_extrapad have the same problem? Even if not, maybe it's a good idea to move it to the bottom of __sync_sched_install. Both to keep behavior consistent, and avoid unnecessary confusion.

Yeah, it does. In funciont simple_percpu_mempool_create(), if percpu_ptr failed to allocate, free_percpu() will be called. But sched_alloc_extrapad cannot be moved to the bottom of __sync_sched_install(), as the memory may be used in rebuild_sched_state(). We should move it to the top.

anolis-bot commented 1 year ago

@zxpdemonio , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/2r1b4c7z/test_result/72149

anolis-bot commented 1 year ago

@zxpdemonio , The CI test is completed, please check result:

Test CaseTest Result
schedule_testx86_64:white_check_mark: SUCCESS
public_var_test:white_check_mark: SUCCESS
var_uniformity_test:white_check_mark: SUCCESS
cpu_throttle_test:white_check_mark: SUCCESS
domain_rebuild_test:white_check_mark: SUCCESS
sched_syscall_test:white_check_mark: SUCCESS
mem_pressure_test:white_check_mark: SUCCESS
schedule_testaarch64:white_check_mark: SUCCESS
public_var_test:white_check_mark: SUCCESS
var_uniformity_test:white_check_mark: SUCCESS
cpu_throttle_test:white_check_mark: SUCCESS
domain_rebuild_test:white_check_mark: SUCCESS
sched_syscall_test:white_check_mark: SUCCESS
mem_pressure_test:white_check_mark: SUCCESS

Congratulations, your test job passed!