Understanding System Calls in Depth #6

Open chenpengcong opened 6 years ago

chenpengcong commented 6 years ago

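Most articles online analyze the i386 code of Linux 2.x, but Linux has since reached 4.x, so this article is based on the Linux 4.14.3 stable source (source download link).

I focus on the x86 architecture here; the code for other processor architectures works on much the same principles.

We start from the system call table, which lives in /arch/x86/entry/syscalls/syscall_32.tbl

An excerpt from that file: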

# 32-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
#
# The abi is always "i386" for this file.
#
0    i386    restart_syscall        sys_restart_syscall
1    i386    exit            sys_exit
2    i386    fork            sys_fork            sys_fork
3    i386    read            sys_read
4    i386    write            sys_write
5    i386    open            sys_open            compat_sys_open

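This table defines, for each system call, its number, its name, its entry point and, where one exists, a compat entry point.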
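To make the column layout concrete, here is a minimal sketch in C that parses one line of the table. The struct and function names are hypothetical and only for illustration; the kernel itself processes this file with shell scripts, not with C code like this.

```c
#include <stdio.h>
#include <string.h>

/* One parsed row of a syscall_32.tbl-style table (hypothetical struct). */
struct syscall_entry {
    int  nr;          /* system call number, e.g. 5 */
    char abi[16];     /* ABI column, e.g. "i386" */
    char name[64];    /* system call name, e.g. "open" */
    char entry[64];   /* entry point, e.g. "sys_open" ("" if absent) */
    char compat[64];  /* compat entry point ("" if absent) */
};

/* Returns 1 if the line holds a table row, 0 for comments,
 * blank lines and anything that does not start with a number. */
static int parse_syscall_line(const char *line, struct syscall_entry *e)
{
    memset(e, 0, sizeof(*e));

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;

    /* Columns are separated by arbitrary whitespace; the last two are optional. */
    int n = sscanf(line, "%d %15s %63s %63s %63s",
                   &e->nr, e->abi, e->name, e->entry, e->compat);
    return n >= 3;
}
```

Feeding it the `open` row from the excerpt fills in all five fields, while the `read` row leaves `compat` empty.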

So where is this table used? Searching the /arch/x86 directory for syscall_32.tbl shows that it is used in /arch/x86/entry/syscalls/Makefile

The contents of that file:

# SPDX-License-Identifier: GPL-2.0
out := $(obj)/../../include/generated/asm
uapi := $(obj)/../../include/generated/uapi/asm

# Create output directory if not already present
_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \
      $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')

syscall32 := $(srctree)/$(src)/syscall_32.tbl
syscall64 := $(srctree)/$(src)/syscall_64.tbl

syshdr := $(srctree)/$(src)/syscallhdr.sh
systbl := $(srctree)/$(src)/syscalltbl.sh

quiet_cmd_syshdr = SYSHDR  $@
      cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \
          '$(syshdr_abi_$(basetarget))' \
          '$(syshdr_pfx_$(basetarget))' \
          '$(syshdr_offset_$(basetarget))'
quiet_cmd_systbl = SYSTBL  $@
      cmd_systbl = $(CONFIG_SHELL) '$(systbl)' $< $@

quiet_cmd_hypercalls = HYPERCALLS $@
      cmd_hypercalls = $(CONFIG_SHELL) '$<' $@ $(filter-out $<,$^)

syshdr_abi_unistd_32 := i386
$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
    $(call if_changed,syshdr)

syshdr_abi_unistd_32_ia32 := i386
syshdr_pfx_unistd_32_ia32 := ia32_
$(out)/unistd_32_ia32.h: $(syscall32) $(syshdr)
    $(call if_changed,syshdr)

syshdr_abi_unistd_x32 := common,x32
syshdr_offset_unistd_x32 := __X32_SYSCALL_BIT
$(uapi)/unistd_x32.h: $(syscall64) $(syshdr)
    $(call if_changed,syshdr)

syshdr_abi_unistd_64 := common,64
$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
    $(call if_changed,syshdr)

syshdr_abi_unistd_64_x32 := x32
syshdr_pfx_unistd_64_x32 := x32_
$(out)/unistd_64_x32.h: $(syscall64) $(syshdr)
    $(call if_changed,syshdr)

$(out)/syscalls_32.h: $(syscall32) $(systbl)
    $(call if_changed,systbl)
$(out)/syscalls_64.h: $(syscall64) $(systbl)
    $(call if_changed,systbl)

$(out)/xen-hypercalls.h: $(srctree)/scripts/xen-hypercalls.sh
    $(call if_changed,hypercalls)

$(out)/xen-hypercalls.h: $(srctree)/include/xen/interface/xen*.h

uapisyshdr-y            += unistd_32.h unistd_64.h unistd_x32.h
syshdr-y            += syscalls_32.h
syshdr-$(CONFIG_X86_64)        += unistd_32_ia32.h unistd_64_x32.h
syshdr-$(CONFIG_X86_64)        += syscalls_64.h
syshdr-$(CONFIG_XEN)        += xen-hypercalls.h

targets    += $(uapisyshdr-y) $(syshdr-y)

PHONY += all
all: $(addprefix $(uapi)/,$(uapisyshdr-y))
all: $(addprefix $(out)/,$(syshdr-y))
    @:

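The Makefile shows that make generates header files such as syscalls_32.h from syscall_32.tbl

So which files exactly get generated, and what do they contain?

Let's try it. I built this source in my own virtual machine (Ubuntu 17.10, 64-bit).

Running make in the source root produces the following output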

  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --silentoldconfig Kconfig
***
*** Configuration file ".config" not found!
***
*** Please run some configurator (e.g. "make oldconfig" or
*** "make menuconfig" or "make xconfig").
***
scripts/kconfig/Makefile:38: recipe for target 'silentoldconfig' failed
make[2]: *** [silentoldconfig] Error 1
Makefile:548: recipe for target 'silentoldconfig' failed
make[1]: *** [silentoldconfig] Error 2
  SYSTBL  arch/x86/entry/syscalls/../../include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_x32.h
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  HOSTLD  arch/x86/tools/relocs
make: *** No rule to make target 'include/config/auto.conf', needed by 'include/config/kernel.release'.  Stop.

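The error occurs because the kernel has to be configured before it can be built, but that is not what we care about here. Although the build aborted, the headers we are after had already been generated, so the goal is achieved.

Some files are probably missing because the build did not complete: under /usr/src/linux-headers-4.13.0-16-generic/ in the virtual machine (a directory that ships with the system) I also found syscalls_64.h, unistd_64_x32.h and others. In the Makefile above these headers are only added when CONFIG_X86_64 is set, and no configuration existed yet. This does not affect the analysis below, because syscalls_64.h has the same form as syscalls_32.h, except that its macro is named with 64 instead of I386.

The following files were generated in the end

Part of unistd_64.h: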

#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_close 3
#define __NR_stat 4
#define __NR_fstat 5

Part of unistd_32.h:

#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5

Part of syscalls_32.h:

__SYSCALL_I386(0, sys_restart_syscall, )
__SYSCALL_I386(1, sys_exit, )
#ifdef CONFIG_X86_32
__SYSCALL_I386(2, sys_fork, )
#else
__SYSCALL_I386(2, sys_fork, )
#endif
__SYSCALL_I386(3, sys_read, )
__SYSCALL_I386(4, sys_write, )

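unistd_64.h and unistd_32.h define the system call numbers that identify each system call. Comparing the two files also shows that the same system call number refers to different system calls on 64-bit and 32-bit systems.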
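The mismatch is easy to see with a small lookup sketch. The two arrays below are copied from the unistd_32.h and unistd_64.h excerpts above and cover only numbers 0 to 5; the helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Names indexed by system call number, from the unistd_32.h excerpt. */
static const char *const names_32[] = {
    "restart_syscall", "exit", "fork", "read", "write", "open",
};

/* Names indexed by system call number, from the unistd_64.h excerpt. */
static const char *const names_64[] = {
    "read", "write", "open", "close", "stat", "fstat",
};

/* Name for number nr in the given table, or NULL if nr is out of range. */
static const char *syscall_name(const char *const *table, size_t len,
                                unsigned int nr)
{
    return nr < len ? table[nr] : NULL;
}

/* 1 if both ABIs give number nr the same name, 0 otherwise. */
static int same_syscall(unsigned int nr)
{
    const char *a = syscall_name(names_32, ARRAY_LEN(names_32), nr);
    const char *b = syscall_name(names_64, ARRAY_LEN(names_64), nr);
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

For example, number 3 is `read` in the 32-bit table but `close` in the 64-bit one, so a binary must always use the numbering of the ABI it was built for.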

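The contents of syscalls_32.h look like a series of macro invocations. What do they actually mean?

Searching the x86 directory for __SYSCALL_I386 shows that it is referenced by several files, among them user-offsets.c, asm-offsets_64.c, syscall_32.c and sys_call_table_32.c.

My feeling is that user-offsets.c and asm-offsets_64.c are not that important and do not affect this analysis, so I am noting them down to study later.

syscall_32.c and sys_call_table_32.c have similar contents

and the comment at the top of sys_call_table_32.c caught my attention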

/*
* System call table for UML/i386, copied from arch/x86/kernel/syscall_*.c
* with some changes for UML.
*/

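Roughly, it says that this file was copied from syscall_32.c and modified specifically for UML/i386.

So what is UML? It stands for User-mode Linux: Linux is built as a user-mode program so that the kernel can run on top of another operating system. See Wikipedia for details.

Since that file is specific to UML, analyzing syscall_32.c is enough.

Contents of syscall_32.c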

// SPDX-License-Identifier: GPL-2.0
/* System call table for i386. */

#include <linux/linkage.h>
#include <linux/sys.h>
#include <linux/cache.h>
#include <asm/asm-offsets.h>
#include <asm/syscall.h>

#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
#include <asm/syscalls_32.h>
#undef __SYSCALL_I386

#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,

extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);

__visible const sys_call_ptr_t ia32_sys_call_table[__NR_syscall_compat_max+1] = {
    /*
    * Smells like a compiler bug -- it doesn't work
    * when the & below is removed.
    */
    [0 ... __NR_syscall_compat_max] = &sys_ni_syscall,
#include <asm/syscalls_32.h>
};

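As we can see, __SYSCALL_I386(nr, sym, qual) is a macro that is defined twice, so that the contents of syscalls_32.h, which is included twice, are interpreted in two different ways.

Under the first definition, __SYSCALL_I386(nr, sym, qual) expands to a function declaration. Taking __SYSCALL_I386(1, sys_exit, ) from syscalls_32.h as an example, it expands to extern asmlinkage long sys_exit(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;

Under the second definition, __SYSCALL_I386(nr, sym, qual) expands to [nr] = sym, so the same __SYSCALL_I386(1, sys_exit, ) from syscalls_32.h expands to [1] = sys_exit,

The file then defines an array named ia32_sys_call_table, indexed by system call number, whose elements are pointers to the corresponding system call functions.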
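The same "define the macro, expand the list, redefine it, expand again" trick can be tried in user space. The sketch below is an illustration only: the `DEMO_*` names, numbers and handlers are made up, the list lives in a macro instead of a generated header that is included twice, and unset slots are simply left NULL, whereas the kernel first fills every slot with sys_ni_syscall.

```c
#include <stddef.h>

/* Stand-in for the generated syscalls_32.h: one macro invocation per entry. */
#define DEMO_SYSCALLS               \
    DEMO_SYSCALL(0, demo_restart)   \
    DEMO_SYSCALL(1, demo_exit)      \
    DEMO_SYSCALL(3, demo_read)

#define DEMO_NR_MAX 3

typedef long (*demo_call_ptr_t)(long);

/* Pass 1: every entry becomes a function declaration. */
#define DEMO_SYSCALL(nr, sym) static long sym(long arg);
DEMO_SYSCALLS
#undef DEMO_SYSCALL

/* Pass 2: every entry becomes a designated initializer "[nr] = sym,". */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,
static const demo_call_ptr_t demo_table[DEMO_NR_MAX + 1] = {
    DEMO_SYSCALLS
};
#undef DEMO_SYSCALL

/* Dummy handlers so the table has something to point at. */
static long demo_restart(long arg) { return arg; }
static long demo_exit(long arg)    { return arg + 100; }
static long demo_read(long arg)    { return arg + 300; }
```

After preprocessing, `demo_table` has entries at indices 0, 1 and 3, and index 2 stays NULL because no list entry names it. Keeping the list in one place means the declarations and the table cannot drift apart.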

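On a 32-bit build (CONFIG_X86_32), ia32_sys_call_table is actually a macro that expands to sys_call_table, an array declared in /arch/x86/include/asm/syscall.h. The type sys_call_ptr_t is a function pointer type that is also defined in /arch/x86/include/asm/syscall.h

Next, let's see where ia32_sys_call_table is used. It turns out that do_syscall_32_irqs_on in /arch/x86/entry/common.c calls through it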

static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
{
    struct thread_info *ti = current_thread_info();
    unsigned int nr = (unsigned int)regs->orig_ax;

#ifdef CONFIG_IA32_EMULATION
    current->thread.status |= TS_COMPAT;
#endif

    if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
        /*
        * Subtlety here: if ptrace pokes something larger than
        * 2^32-1 into orig_ax, this truncates it.  This may or
        * may not be necessary, but it matches the old asm
        * behavior.
        */
        nr = syscall_trace_enter(regs);
    }

    if (likely(nr < IA32_NR_syscalls)) {
        /*
        * It's possible that a 32-bit syscall implementation
        * takes a 64-bit parameter but nonetheless assumes that
        * the high bits are zero.  Make sure we zero-extend all
        * of the args.
        */
        regs->ax = ia32_sys_call_table[nr](
            (unsigned int)regs->bx, (unsigned int)regs->cx,
            (unsigned int)regs->dx, (unsigned int)regs->si,
            (unsigned int)regs->di, (unsigned int)regs->bp);
    }

    syscall_return_slowpath(regs);
}
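The core of this function is a bounds-checked, table-driven dispatch. Below is a simplified user-space sketch of that idea. The `demo_*` names and the two handlers are hypothetical, the register frame is cut down to four fields, and the -ENOSYS default that the kernel's entry assembly stores in ax beforehand is emulated inside the function.

```c
#include <errno.h>
#include <stdint.h>

/* Cut-down register frame; field names mirror pt_regs. */
struct demo_regs {
    uint64_t orig_ax;  /* system call number passed by the caller */
    uint64_t ax;       /* return value slot */
    uint64_t bx, cx;   /* first two arguments */
};

typedef long (*demo_handler_t)(unsigned long, unsigned long);

static long demo_add(unsigned long a, unsigned long b)   { return (long)(a + b); }
static long demo_first(unsigned long a, unsigned long b) { (void)b; return (long)a; }

static const demo_handler_t demo_handlers[] = { demo_add, demo_first };
#define DEMO_NR_HANDLERS (sizeof(demo_handlers) / sizeof(demo_handlers[0]))

static void demo_dispatch(struct demo_regs *regs)
{
    /* Truncate the number to 32 bits, as the kernel code does. */
    unsigned int nr = (unsigned int)regs->orig_ax;

    /* Default result for unknown numbers. */
    regs->ax = (uint64_t)-ENOSYS;

    if (nr < DEMO_NR_HANDLERS) {
        /* The casts zero-extend each argument from 32 bits. */
        regs->ax = (uint64_t)demo_handlers[nr]((unsigned int)regs->bx,
                                               (unsigned int)regs->cx);
    }
}
```

An out-of-range number never indexes the table and leaves -ENOSYS in `ax`, and any garbage in the upper 32 bits of an argument is dropped before the handler sees it.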

do_syscall_32_irqs_on is in turn called by do_int80_syscall_32, which is defined in the same file

/* Handles int $0x80 */
__visible void do_int80_syscall_32(struct pt_regs *regs)
{
    enter_from_user_mode();
    local_irq_enable();
    do_syscall_32_irqs_on(regs);
}

do_int80_syscall_32 is in turn called from /arch/x86/entry/entry_32.S

ENTRY(entry_INT80_32)
    ASM_CLAC
    pushl    %eax            /* pt_regs->orig_ax */
    SAVE_ALL pt_regs_ax=$-ENOSYS    /* save rest */

    /*
    * User mode is traced as though IRQs are on, and the interrupt gate
    * turned them off.
    */
    TRACE_IRQS_OFF

    movl    %esp, %eax
    call    do_int80_syscall_32
.Lsyscall_32_done:

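ENTRY(entry_INT80_32) is the entry point that runs once interrupt 0x80 has been triggered.

That roughly covers the system call path inside the Linux kernel.

So how does glibc, the library we use every day, invoke system calls?

I pulled the glibc source from GitHub to find out.

First we need to look at how system calls are implemented in glibc. The implementation source is generated by the /sysdeps/unix/make-syscalls.sh script, which reads the syscalls.list files and defines macros such as the following for each system call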

SYSCALL_NAME        syscall name
SYSCALL_NARGS        number of arguments this call takes
SYSCALL_SYMBOL        primary symbol name
SYSCALL_CANCELLABLE    1 if the call is a cancelation point
SYSCALL_NOERRNO        1 to define a no-errno version (see below)
SYSCALL_ERRVAL        1 to define an error-value version (see below)

It then #includes syscall-template.S to produce the source file for each system call.

Here is the fragment of syscall-template.S that performs the system call

/* This is a "normal" system call stub: if there is an error,
  it returns -1 and sets errno.  */

T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
    ret
T_PSEUDO_END (SYSCALL_SYMBOL)
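The "returns -1 and sets errno" convention mentioned in the comment above can be observed from C. This is a minimal sketch assuming a Linux system with glibc; it goes through glibc's generic `syscall()` function rather than a generated stub, and the helper name is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel to close file descriptor -1, which can never be valid.
 * Returns the errno value reported by the wrapper, or 0 if the call
 * unexpectedly succeeded. */
static int close_invalid_fd_errno(void)
{
    errno = 0;
    long ret = syscall(SYS_close, -1);
    return ret == -1 ? errno : 0;
}
```

The kernel reports the failure as a negative error code in the return register, and the user-space wrapper turns that into a -1 return value plus `errno` set to `EBADF`.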

The macro T_PSEUDO expands to another macro, PSEUDO

PSEUDO is defined in /sysdeps/i386/sysdep.h

#define    PSEUDO(name, syscall_name, args)                      \
  .globl syscall_error;                                  \
lose: SYSCALL_PIC_SETUP                                  \
  jmp JUMPTARGET(syscall_error);                          \
  ENTRY (name)                                      \
  DO_CALL (syscall_name, args);                              \
  jb lose

ENTRY is defined in /sysdeps/i386/sysdep.h

/* Define an entry point visible from C.

  There is currently a bug in gdb which prevents us from specifying
  incomplete stabs information.  Fake some entries here which specify
  the current source file.  */
#define    ENTRY(name)                                  \
  STABS_CURRENT_FILE1("")                              \
  STABS_CURRENT_FILE(name)                              \
  ASM_GLOBAL_DIRECTIVE C_SYMBOL_NAME(name);                      \
  ASM_TYPE_DIRECTIVE (C_SYMBOL_NAME(name),@function)                  \
  .align ALIGNARG(4);                                  \
  STABS_FUN(name)                                  \
  C_LABEL(name)                                      \
  cfi_startproc;                                  \
  CALL_MCOUNT

DO_CALL is defined in /sysdeps/unix/sysv/linux/i386/sysdep.h

# define ENTER_KERNEL int $0x80

#define DO_CALL(syscall_name, args)                                \
    PUSHARGS_##args                                  \
    DOARGS_##args                                  \
    movl $SYS_ify (syscall_name), %eax;                          \
    ENTER_KERNEL                                  \
    POPARGS_##args

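As we can see, DO_CALL loads the system call number into %eax and then executes ENTER_KERNEL, that is int $0x80, which triggers interrupt 0x80.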
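For a system call without arguments, what DO_CALL does boils down to two instructions. The sketch below is an illustration under assumptions: the inline assembly is only compiled when targeting i386, other targets fall back to glibc's `syscall()` function, and `raw_getpid` is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Issue getpid without going through glibc's getpid() wrapper. */
static long raw_getpid(void)
{
#if defined(__i386__)
    long ret;
    /* %eax carries the system call number in and the result out,
     * mirroring "movl $SYS_ify (syscall_name), %eax; int $0x80". */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"((long)__NR_getpid)
                      : "memory");
    return ret;
#else
    /* Not an i386 build: let glibc pick the right entry instruction. */
    return syscall(SYS_getpid);
#endif
}
```

Either branch should return the same value as `getpid()`. Calls that take arguments additionally need them placed in %ebx, %ecx and so on, which is what the PUSHARGS/DOARGS macros take care of.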

SYS_ify is defined in /sysdeps/unix/sysv/linux/i386/sysdep.h

/* For Linux we can use the system call table in the header file
    /usr/include/asm/unistd.h
  of the kernel.  But these symbols do not follow the SYS_* syntax
  so we have to redefine the `SYS_ify' macro here.  */
#undef SYS_ify
#define SYS_ify(syscall_name)    __NR_##syscall_name

As we can see, __NR_##syscall_name is exactly the system call number defined in unistd_32.h, which we met while analyzing the Linux source.
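The `##` operator pastes the system call name onto the `__NR_` prefix at preprocessing time. A minimal sketch, assuming a Linux system whose `<sys/syscall.h>` exposes the kernel's `__NR_*` constants; the helper function name is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>  /* brings in the kernel's __NR_* constants */
#include <unistd.h>

/* Same definition as in the glibc excerpt above. */
#undef SYS_ify
#define SYS_ify(syscall_name) __NR_##syscall_name

/* SYS_ify(getpid) is rewritten to __NR_getpid before compilation,
 * so this passes the kernel's number for getpid to syscall(). */
static long getpid_via_sys_ify(void)
{
    return syscall(SYS_ify(getpid));
}
```

Because the pasting happens in the preprocessor, the system call name never exists as a string at run time; the stub ends up with a plain integer constant.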

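My reading of the code: the ENTRY(name) macro makes the system call's name visible at the C level, and the actual implementation of that system call is DO_CALL (syscall_name, args);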

References:
http://docs.huihoo.com/joyfire.net/6-1.html
http://blog.csdn.net/gatieme/article/details/50779184
https://zhuanlan.zhihu.com/p/28984642
https://bbs.pku.edu.cn/v2/post-read.php?bid=13&threadid=16042150&page=a&postid=16206732

DawnGuoDev commented 4 years ago

Great write-up, thumbs up.