IUCompilerCourse / Essentials-of-Compilation

A book about compiling Racket and Python to x86-64 assembly
1.27k stars 137 forks source link

`patch_instructions`? #156

Closed Antigen-1 closed 1 year ago

Antigen-1 commented 1 year ago

Hello!

I'm just a reader of this book and a complete novice at designing compilers.

According to page 32,

The patch_instructions pass compiles from x86var to x86int by making sure that each instruction adheres to the restriction that at most one argument of an instruction may be a memory reference.

That means if we initialize an variable using another variable, then the compiler will produce movq <ref> <ref>, which is not allowed in x86.

In order to resolve this, actually we can implement a partial evaluator for the Cvar intermediate language rather than the Lvar language. That certainly eliminates this condition from the sequence, rendering the patch_instructions pass unnecessary. And I found it much easier than overcoming the challenge described in section 2.11.

Here is my optimizer(Yet I adopt another representation of the Cvar language to make it easier to write the definitional interpreter and enforce contracts).

  #; (-> Cvar Cvar)
  ;; Cvar: (list 'program (list <symbol> (list <statement> ...)) ...)
  ;; assignment: (list 'define <symbol> <expression>)
  ;; tail: (list 'return <expression>)
  (define (opt:partial-evaluation form)
    ;;simplify `expr` in terms of `table`
    (define (simplify expr table)
      (define (reference p)
        (cond ((symbol? p) (hash-ref table p #f)) (else p)))
      (define-syntax (handle stx)
        (syntax-case stx ()
          ((_ h p ...)
           (with-syntax (((np ...) (map generate-temporary (syntax->datum #'(p ...)))))
             #'(let ((np (reference p)) ...)
                 (if (and np ...)
                     (h np ...)
                     (cons 'h
                           (map
                            (lambda (r o) (or r o))
                            (list np ...) (list p ...)))))))))
      (match expr
        ((list '+ p1 p2)
         (handle + p1 p2))
        ((list '- p)
         (handle - p))
        ((list '- p1 p2)
         (handle - p1 p2))
        ((list 'read)
         (list 'read))
        (p (cond ((reference p)) (else p)))))
    ;;optimize `statement` in terms of `table`
    (define (optimize statement table)
      (cond ((tail? statement)
             (list 'return (simplify (cadr statement) table)))
            (else
             (define r (simplify (caddr statement) table))
             (if (or (symbol? r) (fixnum? r))
                 (hash-set table (cadr statement) r)
                 (list 'define (cadr statement) r)))))
    (cons 'program
          (map
           (lambda (block)
             (list (car block)
                   (let-values (((t r)
                                 (for/fold ((table (hasheq)) (result null))
                                           ((statement (in-list (cadr block))))
                                   (define r (optimize statement table))
                                   (cond ((hash? r) (values r result))
                                         (else (values table (cons r result)))))))
                     (reverse r))))
           (cdr form))))

Nevertheless, I feel unqualified to remark on curriculum design and this is merely my suggestion.

Yours sincerely Antigen-1

jsiek commented 1 year ago

Dear Antigen-1,

I like your creative thinking!

What you're proposing may not be a simple as it appears at first. There are a couple things to consider:

  1. Does your optimization properly handle programs that include lots of side effects, such as calls to (read)? In particular, you would need to make sure that the ordering of the side effects is unchanged by the optimization.
  2. In later Chapters we add conditionals and loops. In that more general setting, does this optimization always succeed at removing variables, or does it sometimes fail? (If it sometimes fails, then patch_instructions is still need.)

Happy exploring, Jeremy

Antigen-1 commented 1 year ago

Dear Jeremy,

Thanks for your reply!

First and foremost, it is unnecessary for typical partial evaluators to remove (define <var> <var>) from the sequence when the value of the latter variable is unknown. Though I do implement this feature and the optimizer can handle correctly the order of complex expressions, it still appears to be unnatural, and that is exactly why I closed this issue yesterday.

As for later chapters, the optimizer, in fact, might still work if conditionals and loops are converted into Cvar statements. The structure of the Cvar language reminds me of modules that contain code blocks, say, procedures, submodules, macros and the like. Maybe I can add clauses to the match form to extend the optimizer.

It's appreciated again for your kind suggestions. I'll continue to improve my compiler, taking these factors into consideration!

Yours frankly! Antigen-1

Antigen-1 commented 1 year ago

What you're proposing may not be a simple as it appears at first.

I think you are right, sir. The optimizer cannot remove all (define <var> <var>). Worse still, it fails to undertake as much computation as possible at the compile time.