Migrating from Chibicc - Githubissues

rochus-keller commented 1 month ago

The Type, Obj and Node data structures seem to have a pretty different semantics in widcc than in chibicc.

What is the correct way to :

iterate over all parameters of a function,
iterate over all local variables of a function,
iterate over all actual args (Node and expression per actual argument) of a ND_FUNCALL node?

fuhsnn commented 1 month ago

For 1. 2.

static void visit_params(Obj *var) {
  for (; var; var = var->param_next) {
    if (var->name)
      printf("param: %s\n", var->name);
    else
      printf("unnamed param of type %d\n", var->ty->kind);
  }
}
static void visit_vars(Scope *sc) {
  for (Obj *var = sc->locals; var; var = var->next) {
    if (var->name)
      printf("lvar: %s\n", var->name);
    else
      printf("unnamed lvar of type %d\n", var->ty->kind);
  }

  for (Scope *sub_sc = sc->children; sub_sc;) {
    visit_vars(sub_sc);
    sub_sc = sub_sc->sibling_next;
  }
}

static void dump_function_vars(Obj *prog) {
  for (Obj *fn = prog; fn; fn = fn->next) {
    if (!fn->is_function || !fn->is_definition)
      continue;
    visit_params(fn->ty->param_list);
    visit_vars(fn->ty->scopes);
  }
}

For 3. I think I can simplify the layout a bit

fuhsnn commented 1 month ago

https://github.com/fuhsnn/widcc/blob/85ec382abf32524a81fae58caa5d43973e1f15f1/parse.c#L3039-L3041 The assignment node store evaluation result into the temporary lvar, then later place_stack_args() place_reg_args() copy the stored results into calling convention specified places.

You can drop the assign node and make dummy Obj*s that don't allocate lvar space if the backend have better solutions.

rochus-keller commented 1 month ago

Great, thank you very much for the information and the new commit. I try to refactor my code generators accordingly.

rochus-keller commented 1 month ago

Meanwhile I was able to integrate the mentioned commit and adapt my code (https://github.com/rochus-keller/EiGen/tree/widcc); I currently test with the nolib test cases and still get a few segfaults and code generator asserts which I have to investigate; so I assume that I haven't yet properly considered all of your design changes.

May I ask whether and where you allocate an extra parameter in case of struct/union function returns? I also haven't understood why you need an extra list for param_promoted and how it relates to the param_next list.

rochus-keller commented 1 month ago

The segfaults seem to be associated with gotos and switch cases. As it seem, I would have to invest quite a lot of time in debugging, which is too much for me at the moment. I will therefore continue with Libfirm for the time being and come back to the widcc branch later.

fuhsnn commented 1 month ago

extra parameter in case of struct/union function returns

The pointer is saved on stack space without making an extra parameter Obj*: https://github.com/fuhsnn/widcc/blob/a0bb0108ab4f6ed97556400b7bf298ec49bdc820/codegen.c#L1628-L1634 Because return struct pointer being the first parameter is x86 sys-V specific, I believe this makes the frontend more platform-agnostic.

why you need an extra list for param_promoted

It's for old style functions, because their arguments are always default-promoted. The calling convention use default-promoted types; and callee cast received parameter from promoted type back to locally declared types. For example: https://godbolt.org/z/zTEndbnfb A cast instruction is emitted because float are always passed as double in old-style.

Libfirm/cparser

I'd love to see it maintained up-to-date, at least not failing on python versions (bit funny for a c99 project). There was another one long forgotten - Open64, I always feel like getting it working for historical preservation.

rochus-keller commented 1 month ago

Thanks for the information.

Meanwhile I was able to build my Awfy reference project with cparser and run benchmarks with different optimization levels. Here are the detailed results.

The executable generated by cparser 1.22.1 -O2 is 8% slower than the one generated with gcc 4.8 -O2, but the build time with cparser is 4.3 times slower than with gcc. In the unoptimized case the executable generated with cparser is 36% faster (!) than the one generated with gcc, but the build time with cparser is still 3.7 times slower than with gcc. So cparser produces very fast code, but takes a lot of time for it.

I already run a subset of Awfy with my chibicc version with Eigen backend (which does no optimization at all) and found that the generated executable is 3 to 4 times slower than the unoptimized gcc generated executable. I will try to build the Boem GC with my chibicc version and then run the full Awfy to enable fair comparison.

at least not failing on python versions

I was able to build it by using the last commit of libfirm before they switched to Python 3, just using Python 2 instead. I agree that it is pretty silly to add a Python dependency to the C compiler; there is actually also a Perl dependency to generate parts of the backend code.

rochus-keller commented 1 month ago

Just had a look at Open64 (https://en.wikipedia.org/wiki/Open64); apparently they used GCC as a frontend, so it's not interesting for me (GCC is a monster).

rochus-keller commented 1 month ago

Meanwhile I managed to integrate the Boem gc and run the full awfy. The detail results are in the referenced report. The chibicc/ecc generated code runs 3.5 times slower than the executable generated from the same code using GCC -O2.

Maybe you want to run performance measurements yourself using widcc. Just in case, here is the source code. When you run the resulting executable, a report is printed to the console.

fuhsnn commented 1 month ago

3.5 times slower than the executable generated from the same code using GCC -O2.

It's... not unexpected, there are tons of dirty hacks in chibicc just to do correct things the simplest way, (I follow that direction in widcc, and only patched some out in slimcc), it's kind of a "so I'm doing this in one way, that one way must account for the worst-case, forgo the others" attitude, for example (num += 7) is strictly worse than (num = num + 7) because the former introduced an extra temporary just to account for the case that the operands may have side effect (transforming to (a = a + b) would evaluate "a" twice), introducing push-pop nodes in AST would help, but in that philosophy that's considered unnecessary complexity.

And there's the volatile thing, chibicc omits the volatile qualifier, which means all memory operation in user code must be implicitly volatile to not break stuff in subtle ways (cproc/qbe does this too), therefore... in all honestly, even if you hook LLVM to chibicc the best it can do will still be worse than clang -O1.

fuhsnn / widcc

Migrating from Chibicc #1