yushoyamaguchi opened 1 year ago
For the first question, I'm sorry: we found that "mirrors.cloud.aliyuncs.com" is an internal address that only ECS instances from Aliyun can access. You may edit the Dockerfile to change the address to "mirrors.aliyun.com" in /etc/yum.repos.d/epel.repo after installing epel-aliyuncs-release. Or directly use our image (plugsched-registry.cn-hangzhou.cr.aliyuncs.com/plugsched/plugsched-sdk).
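For example, the Dockerfile edit could look like this (the sed line is only an illustration, not taken from the repo):
RUN yum install epel-aliyuncs-release -y && \
    sed -i 's/mirrors.cloud.aliyuncs.com/mirrors.aliyun.com/g' /etc/yum.repos.d/epel.repo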
For the second question, are you trying to install the plugsched rpm on Ubuntu? We have not supported deb yet, but we can try to provide a simulated way to install plugsched on the host (not in a container).
First, create a working dir (e.g. mkdir -p /var/plugsched/$(uname -r)).
Then, move the necessary files into it:
install -m 755 working/symbol_resolve/symbol_resolve /var/plugsched/$(uname -r)/symbol_resolve
install -m 755 kernel/sched/mod/scheduler.ko /var/plugsched/$(uname -r)/scheduler.ko
install -m 444 working/tainted_functions /var/plugsched/$(uname -r)/tainted_functions
install -m 755 working/scheduler-installer /var/plugsched/$(uname -r)/scheduler-installer
install -m 755 working/hotfix_conflict_check /var/plugsched/$(uname -r)/hotfix_conflict_check
Last, run the installer script:
/var/plugsched/$(uname -r)/scheduler-installer install
I've not tested this, so feel free to report any problems you meet.
Thank you for your reply. That means the purpose of the container is setting up the necessary files (and copying them to the host), right?
If I use CentOS as the host, can I actually run the new scheduler this way?
In addition, I have one more question.
We can implement a new scheduler by editing the files under kernel/sched/mod/ after executing the boundary analyzer (plugsched-cli init?), right?
How do we build this new scheduler as a kernel module?
Are the source files under src/ (like main.c, sched_rebuild.c) related?
Thank you for your reply. That means the purpose of the container is setting up the necessary files (and copying them to the host), right?
Yes
If I use CentOS as the host, can I actually run the new scheduler this way?
Yes. After copying the rpm from the container to the host, you can use "rpm -i" to install the new scheduler directly on the host.
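For example (the exact rpm filename depends on your kernel version and build, so treat this as a sketch):
rpm -ivh scheduler-xxx-$(uname -r).rpm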
In addition, I have one more question. We can implement a new scheduler by editing the files under kernel/sched/mod/ after executing the boundary analyzer (plugsched-cli init?), right? How do we build this new scheduler as a kernel module? Are the source files under src/ (like main.c, sched_rebuild.c) related?
main.c and sched_rebuild.c under src/ will be copied to kernel/sched/mod/ automatically. They are related.
You can refer to cmd_build() in cli.py to see how it works in detail. The key file is kernel/sched/mod/Makefile (which is copied from src/Makefile), and the kernel module will be built as kernel/sched/mod/scheduler.ko.
Directly running insmod scheduler.ko may fail because some other work is still needed. See scheduler-installer.
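A rough sketch of that flow, reusing the init paths from this thread (the exact arguments are an assumption on my side):
plugsched-cli build ./scheduler
# the module is then at ./scheduler/kernel/sched/mod/scheduler.ko;
# install it via the generated rpm / scheduler-installer rather than a raw insmod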
Thank you very much.
When executing
plugsched-cli init $(uname -r) ./kernel ./scheduler
in the podman container, this error was generated.
I use a Docker image built by myself using the Dockerfile.
Also, I use /work as the working dir instead of /tmp/work because of tmpfs capacity.
If you have some ideas about this error, please tell me.
I'm sorry: we found that our anolisos:latest image recently updated the gcc minor version, so gcc-python-plugin needs a rebuild. We will fix this problem soon.
To work around it, please downgrade the gcc toolchain (using yum install):
yum install gcc-8.5.0-10.1.0.3.an8 gcc-c++-8.5.0-10.1.0.3.an8 \
    gcc-plugin-devel-8.5.0-10.1.0.3.an8 libstdc++-static-8.5.0-10.1.0.3.an8 \
    gcc-python-plugin-0.17-1.4.an8
What's more, we found there may be something wrong with "pip3 install pyyaml".
We suggest just replacing:
RUN yum install epel-aliyuncs-release -y && \
with:
RUN yum install epel-release -y && \
in the Dockerfile. This will use the source from mirrors.fedoraproject.org directly, and then "yum install python3-pyyaml".
After I modified the two points (gcc version and epel-release) that you taught me, the gcc-python-plugin error is fixed. Thank you. (The modified Dockerfile is this.) However, another error was generated.
ImportError: cannot import name 'CLoader'
Is this the pyyaml error that you mentioned?
What's more, we found there may be something wrong with "pip3 install pyyaml".
What should I do?
Do not use pip3 install pyyaml
Use yum install python3-pyyaml instead
I had overlooked that. Thank you.
In my environment, executing plugsched-cli init $(uname -r) ./kernel ./scheduler
in a container has not finished in 3 hours. (My current environment is not good.)
Is this process really so time-consuming?
For example, is it more time-consuming than compiling the Linux kernel?
Is this process really so time-consuming?
Yes, it is time-consuming, but... Hmm... It should be faster than compiling the whole Linux kernel, I think.
Is your terminal printing info continuously? Or is it stuck at one step?
We usually work on servers with 100+ CPUs, so in our environment "init" will only cost several minutes.
After "init", the "build" step is very fast.
We suggest running the "init" task in the background. When the task is done, maybe you can keep a backup of the "./scheduler" folder if you want to develop different branches.
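For example, a plain way to do that (nohup and cp are just one option):
nohup plugsched-cli init $(uname -r) ./kernel ./scheduler > init.log 2>&1 &
cp -a ./scheduler ./scheduler.bak   # after init finishes, keep a pristine copy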
Before pressing Ctrl-C, CC/LD/AR was printed continuously.
By the way, I use kernel 6.4.
However, the content of my boundary.yaml is the same as that of kernel 5.10, because I didn't find out how to modify it.
Is this related to the unfinished process?
For parallel processing, do we need some options?
By the way, I use kernel 6.4.
Oh... That's really new. We've not tested on this version. The boundary of 5.10 may not fit 6.4, but I think plugsched can still work when directly using the 5.10 boundary. The mismatched boundary config will result in a smaller scope of modifiable code (many functions will be analyzed as "outer functions", so you cannot modify them). The functions that cannot be edited will be removed or have a comment below them ("DON'T MODIFY INLINE EXTERNAL FUNCTION") in kernel/sched/mod/.
For parallel processing, do we need some options?
No need. It will automatically use all CPUs.
What is the process that includes CC, LD, AR? Is it collect.py, analyze.py, extract.py, or something else? It seems to be the most time-consuming process.
It seems to be collect.py. What part of collect.py? I cannot find strings like gcc or ld in collect.py.
In addition, could you find out what this error is? I think no error reason seems to be written. A hardware error?
Is it because there is no file named modules_prepare declared in Makefile.plugsched?
It seems to be collect.py. What part of collect.py? I cannot find strings like gcc or ld in collect.py.
See src/Makefile.plugsched
collect: modules_prepare
$(MAKE) CFLAGS_KERNEL="$(GCC_PLUGIN_FLAGS)" \
CFLAGS_MODULE="$(GCC_PLUGIN_FLAGS)" $(vmlinux-dirs)
In addition, could you find out what this error is? I think no error reason seems to be written.
Due to parallel processing, you need to page up to find the reason.
The process of "collect" is similar to compile the Linux kernel.
Thank you for replying. I see. So the process is like building the Linux kernel, and collect.py works during the compilation. Where is vmlinux-dirs declared? How does make collect differ from a normal compile of the Linux kernel?
The error is in the lower part of this screenshot. Is CONFIG_DEBUG_INFO_BTF one of the configurations of the host?
You need yum install dwarves
Where is vmlinux-dirs declared? How does make collect differ from a normal compile of the Linux kernel?
I've just found that they are related questions. $(vmlinux-dirs) is in the Makefile in kernel 5.10:
vmlinux-dirs := $(patsubst %/,%,$(filter %/, \
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \
$(libs-y) $(libs-m)))
With make $(vmlinux-dirs), we will only compile the necessary files, not the whole Linux kernel (so we will not generate BTF either).
But since Linux 6.1 it has disappeared, so our Makefile "collect" will build the whole kernel...
Thank you. I installed dwarves in the Docker image, and the error was fixed.
But since Linux 6.1 it has disappeared, so our Makefile "collect" will build the whole kernel...
I see. That is why collecting is so time-consuming...
Could you find out this error?
Do I need to create the dynamic_springboard.patch myself? (Is this where I write the changes to the switch_to function?)
Oops, I've tried with Linux 6.4 and I see the huge difference.
These two commits totally break our work:
f96eca432015 ("sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there")
801c14195510 ("sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there")
The files inside and outside our boundary are "mixed" into single files, and the analyze and extract steps will be totally broken.
Let me find the way to solve it...
BTW, you can change $(vmlinux-dirs) to $(build-dir) in src/Makefile.plugsched for the latest Linux to speed up the collect stage.
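For instance, as a one-line edit (illustrative; double-check the exact variable names for your kernel version):
sed -i 's/\$(vmlinux-dirs)/$(build-dir)/g' src/Makefile.plugsched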
Thank you.
I should replace boundary.yaml.
How do you know how to fix boundary.yaml?
Is it difficult?
you can change $(vmlinux-dirs) to $(build-dir)
Thank you. I'll try.
f96eca432015 ("sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there")
801c14195510 ("sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there")
These two commits landed in Linux 5.18. So plugsched does not seem to support >= 5.18 (not sure about 5.11~5.17).
Sorry.
Even if I add kernel/sched/build_policy.c and so on to boundary.yaml, will it not work with kernel 6.4?
Yes. I've tried to adapt to Linux 6.x, but it's too hard...
I see. Thank you. I'll try with kernel 5.x.
Please allow me to ask another question.
I think this code is for jumping to the original scheduler. It seems that this code is patched into kernel/sched/mod/core.c in the /tmp/work dir.
Question 1: When is this code built? analyze? collect? extract? I think src/Makefile is related, but I cannot find out.
Question 2: When is this code (kernel/sched/mod/core.c in the /tmp/work dir) executed, and by whom? Is it included in the kernel module? I haven't understood why the old scheduler in the kernel binary will not be executed after using the kernel module.
P.S. I read this paper: https://dl.acm.org/doi/10.1145/3582016.3582054. Thank you.
I think this code is for jumping to the original scheduler. It seems that this code is patched into kernel/sched/mod/core.c in the /tmp/work dir.
Yes.
Question 1: When is this code built? analyze? collect? extract? I think src/Makefile is related, but I cannot find out.
This code is patched in after "extract" and is built through "plugsched-cli build", because "core.o" is in src/Makefile.
Question 2: When is this code (kernel/sched/mod/core.c in the /tmp/work dir) executed, and by whom? Is it included in the kernel module? I haven't understood why the old scheduler in the kernel binary will not be executed after using the kernel module.
It's a bit like a hotfix. We dynamically patch the function entry (i.e., the first 5 bytes on x86) to "jmp our_module", so the old scheduler in the kernel binary will not be executed.
The reason for the patch in your picture is mainly about sleeping threads. Their RIP will stay in switch_to() while they sleep. If we rmmod plugsched, the memory (including text code) will be freed. Then if these sleeping threads are woken up, they cannot find the text code corresponding to their last RIP.
It's a bit like a hotfix. We dynamically patch the function entry (i.e., the first 5 bytes on x86) to "jmp our_module".
Please tell me the code or scripts for this operation in plugsched.
I'm sorry for one more question. Which do you think is better, kernel 5.8 or 5.11, when using the boundary.yaml of 5.10? (Fedora 33 adopts kernel 5.8, 34 adopts kernel 5.11.)
Please tell me the code or scripts for this operation in plugsched.
JUMP_OPERATION() in src/head_jump.h (called from __sync_sched_install() in src/main.c)
I'm sorry for one more question. Which do you think is better, kernel 5.8 or 5.11, when using the boundary.yaml of 5.10? (Fedora 33 adopts kernel 5.8, 34 adopts kernel 5.11.)
I'm not sure, but maybe you could try 5.8 first
Thank you very much.
When is JUMP_INIT_FUNC executed?
See jump_init_all() in sched_mod_init()
Thank you.
In kernel 5.x, this error occurred in extract.
Do you know this error?
It seems there may be something wrong with your src.rpm?
Actually, this step is to fetch the kernel source code (which is not strongly related to plugsched). Do you have another way to get this code? :-/
Maybe you could try rpm2cpio xxx.rpm | cpio -idmv to extract it directly.
I used yumdownloader --source kernel-$(uname -r), and the result of the ls command after rpm2cpio xxx.rpm | cpio -idmv is this.
I'll try with a source rpm obtained from the web page.
Take a look at *.spec; this file will show the kernel path.
If we just want the source code, can I get only the kernel code from GitHub? Will some errors occur?
Yes, we do not care where the source code comes from. You can get it from GitHub or git.kernel.org.
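For example (the tag and depth are illustrative):
git clone --depth 1 --branch v5.10 https://github.com/torvalds/linux.git ./kernel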
Thank you.
Is a difference between the running kernel version and the extracted kernel version allowed? (For example, Fedora using kernel 5.10.13-200rcXXX and extracting the code of kernel 5.10 from GitHub.)
It is allowed if the code does not break the boundary. (Even if it breaks it, that should only cause a smaller boundary and more outside functions that you cannot modify.)
The "init" step will rely on both source code and debuginfo package (including vmlinux and .config for your target host kernel). The source code decides how the code works after installing plugsched, while debuginfo package decides how to replace old functions (with jmp xxx). Ensure your kernel-debuginfo rpm is the right version.
Thank you for developing this. I'm using this repository in my research activities in Japan.
In my environment on Ubuntu 22, when executing
this error occurred. Should I edit the Dockerfile?
Second question: can I use the container built from this Dockerfile as the host of plugsched? I imagine that it is difficult to use a kernel module from a container.