Pin-Jiun / MIT-6.S081-Operating-System-Engineering


2.2-Starting xv6 with GDB, Endian #5

Open Pin-Jiun opened 2 years ago

Pin-Jiun commented 2 years ago

Starting xv6 with GDB

Open two terminals. In the first one, change into the xv6 directory and run:

make CPUS=1 qemu-gdb

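A hint appears saying *** Now run 'gdb' in another window., so the following commands are run in the second terminal.

Edit the .gdbinit file in your home directory so that gdb is allowed to load the project's .gdbinit file when it is started from the xv6-labs-2022 directory: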

echo "add-auto-load-safe-path /home/jim_liu/xv6-labs-2022/.gdbinit " >> ~/.gdbinit

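Change into the xv6 directory and run gdb-multiarch. (The professor in the MIT lecture types riscv64-linux-gnu-gdb instead; presumably any gdb build with RISC-V support works.)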


Next, inside gdb, run layout split to make it easier to trace the code.

Set breakpoints:
1. b _entry
2. b start
3. b main
4. b userinit
5. b scheduler
6. b usertrap
7. b syscall
8. b exec


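Use the ni command to step through the whole OS boot flow in detail.

Commands that are useful alongside it:
1. i r (show all registers)
2. i r t0 (show a single register)
3. ni or n
4. si or s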

Pin-Jiun commented 2 years ago

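Start

The ROM contains prewritten boot code:

auipc: t0 = pc + 0x0 = 0x1000 + 0x0 = 0x1000 = 4096

addi: a1 = t0 + 32 = 4096 + 32 = 4128

csrr: reads the CSR mhartid into a0. mhartid is the hardware thread ID, here 0x0, so a0 = mhartid = 0x0.

ld t0, 24(t0): adds a 24-byte offset to t0 = 0x1000 to form the address 0x1018, then loads the 64-bit value stored there into t0.

Examining this memory with x 0x1018 shows the value 0x80000000.

Because the machine is little-endian, the gdb listing shows 0x8000 at address 0x101a.

unimp: the "unimplemented" instruction, which is how gdb displays an all-zero encoding.

jr t0: jumps to the address held in t0, 0x80000000. jr is shorthand for jalr x0, 0(t0), so no return address is written to ra. Because kernel.ld places _entry at 0x80000000 at link time, execution now enters entry.S.

The relevant part of kernel.ld: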

OUTPUT_ARCH( "riscv" )
ENTRY( _entry )

SECTIONS
{
  /*
   * ensure that entry.S / _entry is at 0x80000000,
   * where qemu's -kernel jumps.
   */
  . = 0x80000000;
Pin-Jiun commented 2 years ago

Little-Endian vs Big-Endian

Suppose the CPU reads a 32-bit integer with the value 0x12345678. The way it is actually laid out in memory is described below.

Big-Endian


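This is the layout that feels most intuitive: memory is read in address order and fetched into the CPU (laid out flat), so the last byte read, the least significant byte (LSB), sits at the higher memory address. In other words, reading memory at increasing addresses (getting bigger, hence "big") gives the bytes from most significant to least significant.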

Little-Endian

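Although this is the less intuitive layout, most CPUs access memory this way.

Memory is read in the opposite order and fetched into the CPU, so the least significant byte (LSB) sits at the lower memory address. Reading memory at decreasing addresses (getting smaller, hence "little") gives the bytes from most significant to least significant. The address you specify always holds the LSB, which has several advantages.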


The advantage of Little-Endian

The address of a given value in memory, taken as a 32-, 16-, or 8-bit width, is the same.

In other words, if you have in memory a two byte value:

0x00f0   16
0x00f1    0

Taking that '16' as a 16-bit value (C 'short' on most 32-bit systems) or as an 8-bit value (generally C 'char') changes only the fetch instruction you use, not the address you fetch from. In both cases the address is 0x00f0.

On a big-endian system, with the above laid out as:

0x00f0    0
0x00f1   16

Taking that '16' as a 16-bit value (C 'short' on most 32-bit systems), the address is 0x00f0. Taking that '16' as an 8-bit value (generally C 'char'), the address is 0x00f1, NOT 0x00f0.
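
The two layouts quoted above can be reproduced with a small sketch. The helper names store_le16 and store_be16 are made up for this illustration; they write the value 16 into a two-byte buffer in each byte order, so you can see at which offset an 8-bit read finds the 16.

```c
#include <stdint.h>

/* Store a 16-bit value in little-endian order:
 * the least significant byte goes to the lowest address. */
void store_le16(uint8_t *buf, uint16_t v) {
    buf[0] = (uint8_t)(v & 0xff);
    buf[1] = (uint8_t)(v >> 8);
}

/* Store a 16-bit value in big-endian order:
 * the most significant byte goes to the lowest address. */
void store_be16(uint8_t *buf, uint16_t v) {
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)(v & 0xff);
}
```

After store_le16(buf, 16) the byte at offset 0 is 16, which is the same address a 16-bit read would use. After store_be16(buf, 16) the 16 is found at offset 1.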


A closer look at what GDB displays

First, look at the actual contents of memory.

Address | Stored byte
-- | --
0x1016 | 0x00
0x1017 | 0x00
0x1018 | 0x00
0x1019 | 0x00
0x101a | 0x00
0x101b | 0x80

Because the machine code at the start is encoded as 32-bit instructions, gdb decodes downward 32 bits (4 bytes) at a time. Once it reaches unimp, it continues 2 bytes at a time, so anything on screen that cannot be decoded as an instruction is shown as a 2-byte value.

At 0x1018 gdb shows unimp, that is, 0x0000. gdb reads 0x1018 and 0x1019 and interprets them as little-endian.

Stored byte | 0x00 | 0x00
-- | -- | --
Address | 0x1019 | 0x1018

At 0x101a gdb shows 0x8000. gdb reads 0x101a and 0x101b and interprets them as little-endian.

Stored byte | 0x80 | 0x00
-- | -- | --
Address | 0x101b | 0x101a

The command x 0x1018 reads 32 bits in total and displays 0x80000000. Reading 32 bits starting at 0x1018 extends to 0x101b, and the bytes are interpreted as little-endian.

Stored byte | 0x80 | 0x00 | 0x00 | 0x00
-- | -- | -- | -- | --
Address | 0x101b | 0x101a | 0x1019 | 0x1018
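
As a rough check of the readings above, here is a small sketch that holds the six bytes from 0x1016 to 0x101b in an array and assembles them in little-endian order. The names rom, ROM_BASE, read_le16 and read_le32 are invented for this example and are not part of xv6 or gdb.

```c
#include <stdint.h>

/* Bytes stored at addresses 0x1016..0x101b, as listed in the memory table. */
static const uint8_t rom[6] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 };
#define ROM_BASE 0x1016u

/* 2-byte little-endian read: the byte at the lower address is the low byte. */
uint16_t read_le16(uint32_t addr) {
    const uint8_t *p = &rom[addr - ROM_BASE];
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* 4-byte little-endian read, like `x 0x1018` with a word-sized unit. */
uint32_t read_le32(uint32_t addr) {
    const uint8_t *p = &rom[addr - ROM_BASE];
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With these bytes, read_le16(0x1018) gives 0x0000, read_le16(0x101a) gives 0x8000, and read_le32(0x1018) gives 0x80000000, matching the three gdb displays discussed above.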

References
https://blog.gtwang.org/programming/difference-between-big-endian-and-little-endian-implementation-in-c/
https://softwareengineering.stackexchange.com/questions/95556/what-is-the-advantage-of-little-endian-format

Pin-Jiun commented 2 years ago

Entering entry.S and setting up the stack

C code pushes a stack frame whenever one function calls another, so a stack has to be set up before any C code can run.

        # qemu -kernel loads the kernel at 0x80000000
        # and causes each hart (i.e. CPU) to jump there.
        # kernel.ld causes the following code to
        # be placed at 0x80000000.
.section .text
.global _entry
_entry:
        # set up a stack for C.
        # stack0 is declared in start.c,
        # with a 4096-byte stack per CPU.
        # sp = stack0 + (hartid * 4096)
        la sp, stack0
        li a0, 1024*4
        csrr a1, mhartid
        addi a1, a1, 1
        mul a0, a0, a1
        add sp, sp, a0
        # jump to start() in start.c
        call start
spin:
        j spin
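
The stack-pointer arithmetic in _entry can be written out in C as a sketch. The function name initial_sp and the idea of passing the address of stack0 as a number are assumptions made for illustration; in the real kernel the linker decides where stack0 lives. Note that the code multiplies by hartid + 1, so sp ends up at the top of that hart's 4096-byte region, which is what a downward-growing stack needs.

```c
#include <stdint.h>

#define STACK_BYTES_PER_HART 4096u

/* Mirrors the instructions in _entry one by one.
 * stack0_addr is a hypothetical address supplied by the caller. */
uint64_t initial_sp(uint64_t stack0_addr, uint64_t hartid) {
    uint64_t a0 = STACK_BYTES_PER_HART;  /* li   a0, 1024*4  */
    uint64_t a1 = hartid;                /* csrr a1, mhartid */
    a1 = a1 + 1;                         /* addi a1, a1, 1   */
    a0 = a0 * a1;                        /* mul  a0, a0, a1  */
    return stack0_addr + a0;             /* add  sp, sp, a0  */
}
```

For example, if stack0 happened to sit at 0x80008000, hart 0 would start with sp = 0x80009000 and hart 1 with sp = 0x8000a000.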


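lui (load upper immediate): lui rd, uimm20 places the unsigned 20-bit immediate in bits 31-12 of register rd and fills the lowest 12 bits with 0.

auipc (add upper immediate to pc): auipc rd, uimm20 places the unsigned 20-bit immediate in the upper 20 bits, fills the remaining 12 bits with 0, sign-extends this 32-bit value to 64 bits, adds it to pc, and writes the result to rd.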
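
A minimal C model of the two instructions on RV64 may help. The function names lui and auipc are simply chosen to match the mnemonics; this is a sketch of the arithmetic, not emulator code.

```c
#include <stdint.h>

/* lui rd, uimm20: the immediate fills bits 31..12, the low 12 bits are 0,
 * and on RV64 the 32-bit result is sign-extended to 64 bits. */
uint64_t lui(uint32_t uimm20) {
    uint32_t word = (uimm20 & 0xfffffu) << 12;
    return (uint64_t)(int64_t)(int32_t)word;
}

/* auipc rd, uimm20: build the same value, then add it to pc. */
uint64_t auipc(uint64_t pc, uint32_t uimm20) {
    return pc + lui(uimm20);
}
```

With pc = 0x1000 and an immediate of 0, auipc returns 0x1000, which is the value t0 receives in the first ROM instruction.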

jal then jumps to kernel/start.c:21.

Pin-Jiun commented 2 years ago

Function Start

void start() performs configuration that is only allowed in machine mode.

#include "types.h"
#include "param.h"
#include "memlayout.h"
#include "riscv.h"
#include "defs.h"

void main();
void timerinit();

// entry.S needs one stack per CPU.
__attribute__ ((aligned (16))) char stack0[4096 * NCPU];

// a scratch area per CPU for machine-mode timer interrupts.
uint64 timer_scratch[NCPU][5];

// assembly code in kernelvec.S for machine-mode timer interrupt.
extern void timervec();

// entry.S jumps here in machine mode on stack0.
void
start()
{
  // set M Previous Privilege mode to Supervisor, for mret.
  unsigned long x = r_mstatus();
  x &= ~MSTATUS_MPP_MASK;
  x |= MSTATUS_MPP_S;
  w_mstatus(x);

  // set M Exception Program Counter to main, for mret.
  // requires gcc -mcmodel=medany
  w_mepc((uint64)main);

  // disable paging for now.
  // Write 0 to the satp register. satp controls whether paging is
  // in effect, and 0 disables it.
  w_satp(0);

  // delegate all interrupts and exceptions to supervisor mode.
  // Set the low 16 bits of medeleg and mideleg to 1, so that when an
  // interrupt or exception occurs it is handled in supervisor mode
  // rather than in machine mode.
  w_medeleg(0xffff);
  w_mideleg(0xffff);

  // Read the sie register, which selects which interrupts are enabled,
  // OR in the enable bits, and write the result back. These bits enable
  // external (I/O device), timer, and software interrupts in supervisor
  // mode. SEIE stands for Supervisor External Interrupt Enable.
  w_sie(r_sie() | SIE_SEIE | SIE_STIE | SIE_SSIE);

  // configure Physical Memory Protection to give supervisor mode
  // access to all of physical memory.
  // The goal is to let supervisor mode access every physical memory address.
  w_pmpaddr0(0x3fffffffffffffull);
  w_pmpcfg0(0xf);

  // ask for clock interrupts.
  // Initialize the timer so that timer interrupts can be delivered later.
  timerinit();

  // keep each CPU's hartid in its tp register, for cpuid().
  // Read the mhartid register and write its value to the tp (thread
  // pointer) register, so that later code can find out which hart it is
  // running on via cpuid().
  int id = r_mhartid();
  w_tp(id);

  // switch to supervisor mode and jump to main().
  asm volatile("mret");
}

At this point the kernel switches from machine mode to supervisor mode as soon as possible.

start uses the following steps to switch to supervisor mode.

1. It sets the previous privilege mode to supervisor in the register mstatus:

  unsigned long x = r_mstatus();
  x &= ~MSTATUS_MPP_MASK;
  x |= MSTATUS_MPP_S;
  w_mstatus(x);

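The contents of the mstatus register are read into the variable x. MSTATUS_MPP_MASK is defined in kernel/riscv.h; it is used to clear the two bits of the MPP field. MSTATUS_MPP_S is then ORed into the cleared field, which records supervisor mode as the previous privilege mode, and w_mstatus writes the value back to the mstatus register. A diagram of the mstatus fields can be found at https://ithelp.ithome.com.tw/articles/10297602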
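
The same clear-then-set pattern can be tried outside the kernel on a plain integer. In this sketch the macro names follow kernel/riscv.h and use the same bit positions (MPP occupies bits 12..11), but they are redefined here with uint64_t casts so the block stands alone; set_mpp_supervisor is a made-up helper name.

```c
#include <stdint.h>

/* MPP field of mstatus: bits 12..11. */
#define MSTATUS_MPP_MASK ((uint64_t)3 << 11)
#define MSTATUS_MPP_M    ((uint64_t)3 << 11)
#define MSTATUS_MPP_S    ((uint64_t)1 << 11)
#define MSTATUS_MPP_U    ((uint64_t)0 << 11)

/* Returns mstatus with MPP replaced by "supervisor"; all other bits are kept. */
uint64_t set_mpp_supervisor(uint64_t mstatus) {
    mstatus &= ~MSTATUS_MPP_MASK;  /* clear both MPP bits */
    mstatus |= MSTATUS_MPP_S;      /* write 01 = supervisor */
    return mstatus;
}
```

Clearing first matters: if MPP held 11 (machine) and the code only ORed in 01, the field would stay 11.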

2. It sets the return address to main by writing main's address into the register mepc. mepc holds the program counter saved when a trap is taken into machine mode; mret resumes execution at that address.

w_mepc((uint64)main);

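3. It disables virtual address translation in supervisor mode by writing 0 into the page-table register satp.
4. It delegates all interrupts and exceptions to supervisor mode.
5. It configures Physical Memory Protection to give supervisor mode access to all of physical memory.
6. It programs the clock chip to generate timer interrupts: timerinit();

7. asm volatile("mret"); The mret instruction is normally used to return from a previous call from supervisor mode to machine mode. It is the RISC-V instruction for leaving a trap handler while in machine mode. On return the hardware does two things: it switches to the privilege mode recorded in mstatus.MPP (supervisor, set in step 1) and it sets the pc to the value in mepc (main, set in step 2).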
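
The two effects of mret that start() relies on can be modeled with a toy structure. Everything in this sketch (struct hart, enum priv, the function mret) is hypothetical and heavily simplified; real hardware also updates the interrupt-enable bits in mstatus, which is left out here.

```c
#include <stdint.h>

enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Hypothetical, simplified view of one hart's state. */
struct hart {
    enum priv mode;    /* current privilege mode */
    uint64_t pc;       /* program counter */
    uint64_t mstatus;  /* only the MPP field (bits 12..11) is used here */
    uint64_t mepc;     /* address to resume at */
};

/* Simplified mret: take the privilege mode from mstatus.MPP, then jump to mepc. */
void mret(struct hart *h) {
    h->mode = (enum priv)((h->mstatus >> 11) & 3);
    h->pc = h->mepc;
}
```

With MPP set to supervisor and mepc set to the address of main, a hart that executes mret in machine mode continues at main in supervisor mode.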

Execution now enters the main function (kernel/main.c:11).

Pin-Jiun commented 2 years ago

Function main

main (kernel/main.c:11) initializes several devices and subsystems

#include "types.h"
#include "param.h"
#include "memlayout.h"
#include "riscv.h"
#include "defs.h"

volatile static int started = 0;

// start() jumps here in supervisor mode on all CPUs.
void
main()
{
  if(cpuid() == 0){
    consoleinit();
    printfinit();
    printf("\n");
    printf("xv6 kernel is booting\n");
    printf("\n");
    kinit();         // physical page allocator
    kvminit();       // create kernel page table
    kvminithart();   // turn on paging
    procinit();      // process table
    trapinit();      // trap vectors
    trapinithart();  // install kernel trap vector
    plicinit();      // set up interrupt controller
    plicinithart();  // ask PLIC for device interrupts
    binit();         // buffer cache
    iinit();         // inode table
    fileinit();      // file table
    virtio_disk_init(); // emulated hard disk
    userinit();      // first user process
    __sync_synchronize();
    started = 1;
  } else {
    while(started == 0)
      ;
    __sync_synchronize();
    printf("hart %d starting\n", cpuid());
    kvminithart();    // turn on paging
    trapinithart();   // install kernel trap vector
    plicinithart();   // ask PLIC for device interrupts
  }

  scheduler();        
}

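kinit: sets up the physical page allocator
kvminit: sets up virtual memory (covered in the next lecture)
kvminithart: turns on paging (also covered in the next lecture)
procinit: sets up the initial process state, that is, the process table
trapinit/trapinithart: sets up the code for user/kernel mode transitions
plicinit/plicinithart: sets up the interrupt controller, the PLIC (Platform Level Interrupt Controller), which is used to interact with the disk and the console
binit: allocates the buffer cache
iinit: initializes the inode cache
fileinit: initializes the file table
virtio_disk_init: initializes the disk
userinit: once all the setup is complete, userinit starts the first process (kernel/proc.c:226).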

Pin-Jiun commented 2 years ago

First Process userinit

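After this initialization, userinit() is called to create the first process. Looking at userinit() in depth involves locks, memory pages, the trapframe, and other mechanisms, which will be covered later.

For now, just treat userinit() abstractly as the function that creates the first process.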

kernel/proc.c:226 The first process executes a small program written in RISC-V assembly

// a user program that calls exec("/init")
// assembled from ../user/initcode.S
// od -t xC ../user/initcode
uchar initcode[] = {
  0x17, 0x05, 0x00, 0x00, 0x13, 0x05, 0x45, 0x02,
  0x97, 0x05, 0x00, 0x00, 0x93, 0x85, 0x35, 0x02,
  0x93, 0x08, 0x70, 0x00, 0x73, 0x00, 0x00, 0x00,
  0x93, 0x08, 0x20, 0x00, 0x73, 0x00, 0x00, 0x00,
  0xef, 0xf0, 0x9f, 0xff, 0x2f, 0x69, 0x6e, 0x69,
  0x74, 0x00, 0x00, 0x24, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00
};

// Set up first user process. 
void
userinit(void)
{
  struct proc *p;

  p = allocproc();
  initproc = p;

  // allocate one user page and copy initcode's instructions
  // and data into it.
  uvmfirst(p->pagetable, initcode, sizeof(initcode));
  p->sz = PGSIZE;

  // prepare for the very first "return" from kernel to user.
  p->trapframe->epc = 0;      // user program counter
  p->trapframe->sp = PGSIZE;  // user stack pointer

  safestrcpy(p->name, "initcode", sizeof(p->name));
  p->cwd = namei("/");

  p->state = RUNNABLE;

  release(&p->lock);
}

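Note that the process runs a block called initcode, which looks like the hexadecimal form of machine code assembled from assembly language. The assembly source can be found in user/initcode.S.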
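
To see that these bytes really are little-endian RISC-V instructions, a short sketch can reassemble 32-bit words from the array and pick out a few fields. The array is copied from the listing in kernel/proc.c; the helpers insn_at, opcode_of, rd_of and imm_i_of are names invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Copied from the initcode[] listing. */
static const uint8_t initcode[] = {
  0x17, 0x05, 0x00, 0x00, 0x13, 0x05, 0x45, 0x02,
  0x97, 0x05, 0x00, 0x00, 0x93, 0x85, 0x35, 0x02,
  0x93, 0x08, 0x70, 0x00, 0x73, 0x00, 0x00, 0x00,
  0x93, 0x08, 0x20, 0x00, 0x73, 0x00, 0x00, 0x00,
  0xef, 0xf0, 0x9f, 0xff, 0x2f, 0x69, 0x6e, 0x69,
  0x74, 0x00, 0x00, 0x24, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00
};

/* Assemble the little-endian 32-bit instruction word at byte offset off. */
uint32_t insn_at(unsigned off) {
    return (uint32_t)initcode[off] | ((uint32_t)initcode[off + 1] << 8) |
           ((uint32_t)initcode[off + 2] << 16) | ((uint32_t)initcode[off + 3] << 24);
}

unsigned opcode_of(uint32_t insn) { return insn & 0x7f; }         /* bits 6..0   */
unsigned rd_of(uint32_t insn)     { return (insn >> 7) & 0x1f; }  /* bits 11..7  */
unsigned imm_i_of(uint32_t insn)  { return insn >> 20; }          /* bits 31..20, non-negative immediates only */
```

The word at offset 0 is 0x00000517, with opcode 0x17 (auipc) and rd = 10 (a0). The word at offset 16 has rd = 17 (a7) and immediate 7, the word at offset 20 is 0x00000073 (ecall), and the bytes starting at offset 36 spell "/init".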

Next, trace user/initcode.S:3.


initcode.S

This is the code run by the first process that userinit creates. At this point execution enters user mode. The code loads the number for the exec system call, SYS_EXEC (kernel/syscall.h:8), into register a7 and then calls ecall to re-enter the kernel.

# Initial process that execs /init.
# This code runs in user space.

#include "syscall.h"

# exec(init, argv)
.globl start
start:
        la a0, init
        la a1, argv
        li a7, SYS_exec
        ecall

# for(;;) exit();
exit:
        li a7, SYS_exit
        ecall
        jal exit

# char init[] = "/init\0";
init:
  .string "/init\0"

# char *argv[] = { init, 0 };
.p2align 2
argv:
  .long init
  .long 0

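First, the address of init is loaded into a0 (la a0, init), the address of argv is loaded into a1 (la a1, argv), and the number of the exec system call is loaded into a7 (li a7, SYS_exec). Finally ecall is executed, handing control to the OS (kernel/syscall.c:133).

So three instructions run here, and the fourth one hands control to the operating system.

When user code makes a system call:
it places the arguments of exec in registers a0 and a1,
it places the system call number in a7,
and the system call number is matched against the corresponding table entry (kernel/syscall.c:133).

With b syscall set in gdb, execution proceeds as follows. userinit() runs first and stores the relevant register values, then returns to main(). After main() finishes its initialization, some further code runs before execution reaches usertrap().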

kernel/trap.c

void
usertrap(void)
{
  int which_dev = 0;

  if((r_sstatus() & SSTATUS_SPP) != 0)
    panic("usertrap: not from user mode");

  // send interrupts and exceptions to kerneltrap(),
  // since we're now in the kernel.
  w_stvec((uint64)kernelvec);

  struct proc *p = myproc();

  // save user program counter.
  p->trapframe->epc = r_sepc();

  if(r_scause() == 8){
    // system call

    if(killed(p))
      exit(-1);

    // sepc points to the ecall instruction,
    // but we want to return to the next instruction.
    p->trapframe->epc += 4;

    // an interrupt will change sepc, scause, and sstatus,
    // so enable only now that we're done with those registers.
    intr_on();

    syscall();
  } else if((which_dev = devintr()) != 0){
    // ok
  } else {
    printf("usertrap(): unexpected scause %p pid=%d\n", r_scause(), p->pid);
    printf("            sepc=%p stval=%p\n", r_sepc(), r_stval());
    setkilled(p);
  }

  if(killed(p))
    exit(-1);

  // give up the CPU if this is a timer interrupt.
  if(which_dev == 2)
    yield();

  usertrapret();
}

As shown, usertrap() calls syscall(). Next we go into kernel/syscall.c.

Pin-Jiun commented 2 years ago

function syscall

(kernel/syscall.c:133)

#include "types.h"
#include "param.h"
#include "memlayout.h"
#include "riscv.h"
#include "spinlock.h"
#include "proc.h"
#include "syscall.h"
#include "defs.h"

...

// Prototypes for the functions that handle system calls.
extern uint64 sys_fork(void);
extern uint64 sys_exit(void);
extern uint64 sys_wait(void);
extern uint64 sys_pipe(void);
extern uint64 sys_read(void);
extern uint64 sys_kill(void);
extern uint64 sys_exec(void);
extern uint64 sys_fstat(void);
extern uint64 sys_chdir(void);
extern uint64 sys_dup(void);
extern uint64 sys_getpid(void);
extern uint64 sys_sbrk(void);
extern uint64 sys_sleep(void);
extern uint64 sys_uptime(void);
extern uint64 sys_open(void);
extern uint64 sys_write(void);
extern uint64 sys_mknod(void);
extern uint64 sys_unlink(void);
extern uint64 sys_link(void);
extern uint64 sys_mkdir(void);
extern uint64 sys_close(void);

// An array mapping syscall numbers from syscall.h
// to the function that handles the system call.
static uint64 (*syscalls[])(void) = {
[SYS_fork]    sys_fork,
[SYS_exit]    sys_exit,
[SYS_wait]    sys_wait,
[SYS_pipe]    sys_pipe,
[SYS_read]    sys_read,
[SYS_kill]    sys_kill,
[SYS_exec]    sys_exec,
[SYS_fstat]   sys_fstat,
[SYS_chdir]   sys_chdir,
[SYS_dup]     sys_dup,
[SYS_getpid]  sys_getpid,
[SYS_sbrk]    sys_sbrk,
[SYS_sleep]   sys_sleep,
[SYS_uptime]  sys_uptime,
[SYS_open]    sys_open,
[SYS_write]   sys_write,
[SYS_mknod]   sys_mknod,
[SYS_unlink]  sys_unlink,
[SYS_link]    sys_link,
[SYS_mkdir]   sys_mkdir,
[SYS_close]   sys_close,
};

void
syscall(void)
{
  int num;
  struct proc *p = myproc();

  num = p->trapframe->a7;
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    // Use num to lookup the system call function for num, call it,
    // and store its return value in p->trapframe->a0
    p->trapframe->a0 = syscalls[num]();
  } else {
    printf("%d %s: unknown sys call %d\n",
            p->pid, p->name, num);
    p->trapframe->a0 = -1;
  }
}

The system call table

static uint64 (*syscalls[])(void) = {

maps SYS_EXEC to sys_exec

num = p->trapframe->a7 reads the integer that identifies the requested system call. After this line has executed, p num in gdb prints 7. Looking at syscall.h, 7 corresponds to the exec system call.

// System call numbers
#define SYS_fork    1
#define SYS_exit    2
#define SYS_wait    3
#define SYS_pipe    4
#define SYS_read    5
#define SYS_kill    6
#define SYS_exec    7
#define SYS_fstat   8
#define SYS_chdir   9
#define SYS_dup    10
#define SYS_getpid 11
#define SYS_sbrk   12
#define SYS_sleep  13
#define SYS_uptime 14
#define SYS_open   15
#define SYS_write  16
#define SYS_mknod  17
#define SYS_unlink 18
#define SYS_link   19
#define SYS_mkdir  20
#define SYS_close  21
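
The lookup from a number to a handler can be tried in isolation with a reduced table. In this sketch the handlers demo_exit and demo_exec, the table demo_syscalls, and the function demo_dispatch are hypothetical stand-ins; only the indexing scheme and the range and null checks are taken from syscall(). It uses the standard C99 designated-initializer syntax with "=".

```c
#include <stddef.h>
#include <stdint.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical handlers that just return recognizable values. */
static uint64_t demo_exit(void) { return 20; }
static uint64_t demo_exec(void) { return 70; }

/* Table indexed by system call number; unlisted slots are null. */
static uint64_t (*demo_syscalls[])(void) = {
    [2] = demo_exit,  /* SYS_exit */
    [7] = demo_exec,  /* SYS_exec */
};

/* Returns the handler's result, or (uint64_t)-1 for an unknown number. */
uint64_t demo_dispatch(int num) {
    if (num > 0 && (size_t)num < NELEM(demo_syscalls) && demo_syscalls[num])
        return demo_syscalls[num]();
    return (uint64_t)-1;
}
```

Calling demo_dispatch(7) runs demo_exec, while a number with no table entry, such as 3 or 99, falls through to the error value, just as the else branch of syscall() stores -1 in a0.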

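exec replaces the memory and registers of the current process with a new program (in this case, /init). So in essence this tells the kernel that a user program has executed the ecall instruction and wants to call exec.

After that, execution alternates between syscall and usertrap in kernel/trap.c.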

if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {...

p->trapframe->a0 = syscalls[num]();

When this finishes, init: starting sh is printed.

it returns to user space in the /init process. Init (user/init.c:15) creates a new console device file if needed and then opens it as file descriptors 0, 1, and 2. Then it starts a shell on the console. The system is up.

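The exec() system call loads /init, replacing the current process's memory contents and registers (as mentioned in the description of exec above). Once the kernel has completed exec(), it returns to user space in init, which is located in user/init.c. init opens the console and associates it with file descriptors 0, 1, and 2, so that we can type input on the console and see output on it. It then starts a shell on the console, and at this point the whole boot process is complete.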


Pin-Jiun commented 2 years ago

volatile

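volatile is a keyword in C. It marks an object or variable as needing special treatment with respect to optimization or concurrency; in this example the concern is concurrency. volatile tells the compiler that the variable may change at any time, so every read or write must operate on the variable's memory address. Without volatile, the compiler may optimize a read so that it comes from a register instead of from the variable's memory address, which can lead to inconsistencies. For example, if external hardware or another piece of code modifies the variable, a program that only reads the register copy will never see the change.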
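
A sketch of the flag pattern, loosely modeled on the started variable in kernel/main.c. The functions publish_started and poll_started are made up for this example, and the bounded poll count is an assumption added so the sketch always terminates; xv6 itself spins without a limit. __sync_synchronize() is the GCC/Clang builtin memory barrier that xv6 also calls, since volatile by itself does not order other memory accesses.

```c
/* Flag shared between harts, declared volatile like `started` in main.c. */
static volatile int started = 0;

/* Called by the first hart once its initialization is finished. */
void publish_started(void) {
    __sync_synchronize();  /* make earlier writes visible before the flag */
    started = 1;
}

/* Polls the flag at most max_polls times; returns 1 if it was seen set.
 * Because `started` is volatile, every iteration performs a real load. */
int poll_started(unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; i++) {
        if (started) {
            __sync_synchronize();
            return 1;
        }
    }
    return 0;
}
```

Without volatile, an optimizing compiler would be allowed to read started once, keep it in a register, and turn the loop into one that never notices the update.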

Pin-Jiun commented 2 years ago

Q.Looking at the backtrace output, which function called syscall?

usertrap () at kernel/trap.c:67

Type n a few times to step past struct proc *p = myproc(); Once past this statement, type p /x *p, which prints the current process's proc struct (see kernel/proc.h) in hex.

What is the value of p->trapframe->a7 and what does that value represent?

(Hint: look at user/initcode.S, the first user program xv6 starts.)