cmu-db / bustub

The BusTub Relational Database Management System (Educational)
https://15445.courses.cs.cmu.edu/
MIT License
3.83k stars 1.74k forks source link

Suggestion: Give students Kaitai Struct grammar for viewing binary page data as structured object trees #263

Open GavinRay97 opened 1 year ago

GavinRay97 commented 1 year ago

This is really useful for debugging, especially when you're trying to implement serialization/deserialization logic.

Kaitai Struct is a tool that allows you to write declarative grammars for binary formats. One of the tools it provides is a hex-based visualizer available in browser:

I wrote a grammar for the page format based on the description in the doc comments. You can try it by visiting the ide.kaitai.io link, pasting the below definition, and uploading a page file to the hex editor.

Hope other folks find this as useful as I have =)

https://user-images.githubusercontent.com/26604994/184516016-9b909a9c-47a4-4437-b313-123eed2d016e.mp4

meta:
  id: bustub_page
  file-extension: db
  endian: le

seq:
  - id: header
    type: page_header

  - id: slots
    type: tuple_slot
    repeat: expr
    repeat-expr: 'header.tuple_count'

  - id: padding
    type: u2
    repeat: until
    repeat-until: _ != 0

  - id: tuples
    type: tuple
    repeat: eos

types:
  page_header:
    seq:
      - id: page_id
        type: u4

      - id: log_storage_number
        type: u4

      - id: previous_page_id
        type: u4

      - id: next_page_id
        type: u4

      - id: free_space_pointer
        type: u4

      - id: tuple_count
        type: u4

  tuple_slot:
    seq:
      - id: offset
        type: u4

      - id: size
        type: u4

  tuple:
    seq:
      - id: len
        type: u4

      - id: data
        size: len
skyzh commented 1 year ago

Looks pretty cool to me! I’ll look into possibilities of integrating this tool.

GavinRay97 commented 1 year ago

I discovered that using repeat-until: _ != 0 is fragile. Katai supports a more complex form of expression where you can say where the position should start from, so this works much better:

meta:
    id: heap_page
    file-extension: db
    endian: le

seq:
    - id: header
      type: page_header

    - id: slots
      type: slot
      repeat: expr
      repeat-expr: "header.num_records"

instances:
    records:
        pos: "header.free_space_end"   # <---
        type: record
        repeat: expr
        repeat-expr: "header.num_records"  # <---