bobrippling / ucc-c-compiler

A C-implemented C compiler
MIT License

A C Compiler written in C

Dependencies

Features

The compiler implements C89, C99 and C11 (controllable via -std=c89/c99/c11). System libraries are fully supported, including ABI compatibility with constructs such as va_list. There are some major additions, listed below:

Extensions

Lambdas:

^(parameters, ...) { body }
^T (parameters, ...) { body }
^T { body }
^ { body }

The syntax for lambdas/blocks is similar to that in Objective-C. Closing over external variables isn't implemented yet, nor is the __block keyword.

As the forms above show, the return type T, the parameter list, or both may be omitted.

When the return type is omitted, it is inferred from the first return statement in the body, or is void if the body contains no return statement. The result of the expression is a function block pointer (T (^)(Args...)), which is explicitly convertible to a function pointer.

Namespace checking:

#pragma ucc namespace expr_

This ensures that any declarations after this pragma begin with expr_, allowing you to enforce a namespace exported by each translation unit (.c file).

See namespace.c for an example.
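
As a rough illustration of the pragma described above, here is a minimal sketch of a translation unit that stays inside the expr_ namespace. It is not the project's namespace.c; the file name and the identifiers expr_count, expr_next and expr_reset are hypothetical. The exact diagnostic ucc emits for a violation is not shown in this README, so the comment only describes the expected behaviour.

```c
/* expr.c: hypothetical translation unit, all names are illustrative. */
#pragma ucc namespace expr_

/* Every declaration after the pragma starts with expr_, so the check passes. */
int expr_count = 0;

int expr_next(void)
{
    return ++expr_count;
}

void expr_reset(void)
{
    expr_count = 0;
}

/* A declaration such as `int helper(void);` placed here does not begin
   with expr_, so ucc is expected to flag it. */
```

Compilers that do not recognise the pragma ignore it (possibly with a warning), so a file like this should still build elsewhere.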

GNU C Supported Extensions

Summarised from the GNU C Extensions page.

| Supported | Extension Name | Description |
|---|---|---|
| ✅ Yes | Statement Exprs | Putting statements and declarations inside expressions. |
| ✅ Yes | Local Labels | Labels local to a block. |
| ✅ Yes | Labels as Values | Getting pointers to labels, and computed gotos. |
| ❌ No | Nested Functions | Nested function in GNU C. |
| ✅ Yes | Nonlocal Gotos | Nonlocal gotos. |
| ❌ No | Constructing Calls | Dispatching a call to another function. |
| ✅ Yes | Typeof | typeof: referring to the type of an expression. |
| ✅ Yes | Conditionals | Omitting the middle operand of a ?: expression. |
| ❌ No | __int128 | 128-bit integers: __int128. |
| 🖥️ 64-bit targets only | Long Long | Double-word integers: long long int. |
| ❌ No | Complex | Data types for complex numbers. |
| ❌ No | Floating Types | Additional Floating Types. |
| ❌ No | Half-Precision | Half-Precision Floating Point. |
| ❌ No | Decimal Float | Decimal Floating Types. |
| ✅ Yes | Hex Floats | Hexadecimal floating-point constants. |
| ❌ No | Fixed-Point | Fixed-Point Types. |
| ❌ No | Named Address Spaces | Named address spaces. |
| ✅ Yes | Zero Length | Zero-length arrays. |
| ✅ Yes | Empty Structures | Structures with no members. |
| ✅ Yes | Variable Length | Arrays whose length is computed at run time. |
| ❌ No | Variadic Macros | Macros with a variable number of arguments. #define f(a, b...) ... |
| ✅ Yes | Escaped Newlines | Slightly looser rules for escaped newlines. |
| ❌ No | Subscripting | Any array can be subscripted, even if not an lvalue. (This is intentionally not supported) |
| ✅ Yes | Pointer Arith | Arithmetic on void-pointers and function pointers. |
| ✅ Yes | Variadic Pointer Args | Pointer arguments to variadic functions. |
| ✅ Yes | Pointers to Arrays | Pointers to arrays with qualifiers work as expected. |
| ✅ Yes | Initializers | Non-constant initializers. |
| ✅ Yes | Compound Literals | Compound literals give structures, unions or arrays as values. |
| ✅ Yes | Designated Inits | Labeling elements of initializers. |
| ✅ Yes | Case Ranges | case 1 ... 9 and such. |
| ❌ No | Cast to Union | Casting to union type from any member of the union. |
| ✅ Yes | Mixed Declarations | Mixing declarations and code. |
| 🔎 Partial | Function Attributes | Declaring that functions have no side effects, or that they can never return. |
| 🔎 Partial | Variable Attributes | Specifying attributes of variables. |
| 🔎 Partial | Type Attributes | Specifying attributes of types. |
| ✅ Yes | Label Attributes | Specifying attributes on labels. |
| ✅ Yes | Enumerator Attributes | Specifying attributes on enumerators. |
| ✅ Yes | Statement Attributes | Specifying attributes on statements. __attribute__((fallthrough)); |
| ✅ Yes | Attribute Syntax | Formal syntax for attributes. |
| ✅ Yes | Function Prototypes | Prototype declarations and old-style definitions. |
| ✅ Yes | C++ Comments | C++ comments are recognized. |
| ✅ Yes | Dollar Signs | Dollar sign is allowed in identifiers. |
| ✅ Yes | Character Escapes | \e stands for the character ESC. |
| ✅ Yes | Alignment | Determining the alignment of a function, type or variable. |
| ✅ Yes | Inline | Defining inline functions (as fast as macros). |
| ✅ Yes | Volatiles | What constitutes an access to a volatile object. |
| 🛠️ asm WIP | Using Assembly Language with C | Instructions and extensions for interfacing C with assembler. |
| ✅ Yes | Alternate Keywords | __const__, __asm__, etc., for header files. |
| ✅ Yes | Incomplete Enums | enum foo;, with details to follow. |
| ✅ Yes | Function Names | Printable strings which are the name of the current function. |
| ✅ Yes | Return Address | Getting the return or frame address of a function. |
| ❌ No | Vector Extensions | Using vector instructions through built-in functions. |
| ✅ Yes | Offsetof | Special syntax for implementing offsetof. |
| ❌ No | __sync Builtins | Legacy built-in functions for atomic memory access. |
| ❌ No | __atomic Builtins | Atomic built-in functions with memory model. |
| 🔎 Partial | Integer Overflow Builtins | Built-in functions to perform arithmetics and arithmetic overflow checking. |
| ❌ No | x86 specific memory model extensions for transactional memory | x86 memory models. |
| ❌ No | Object Size Checking | Built-in functions for limited buffer overflow checking. |
| 🔎 Partial | Other Builtins | Other built-in functions. |
| ❌ No | Target Builtins | Built-in functions specific to particular targets. |
| ❌ No | Target Format Checks | Format checks specific to particular targets. |
| ❌ No | Pragmas | Pragmas accepted by GCC. |
| ✅ Yes | Unnamed Fields | Unnamed struct/union fields within structs/unions. |
| 🛠️ TLS WIP | Thread-Local | Per-thread variables. |
| ✅ Yes | Binary constants | Binary constants using the 0b prefix. |
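
To make a few of the rows marked "Yes" concrete, the following sketch exercises statement expressions, typeof (via the __typeof__ alternate keyword), case ranges, the omitted middle operand, binary constants and labels as values. The macro and function names are invented for illustration, and the code has been written against the GNU C documentation rather than taken from ucc's test suite, so treat it as a starting point, not a conformance test.

```c
/* GNU C extensions listed as supported above; all names are illustrative. */

/* Statement expression + typeof: each argument is evaluated exactly once. */
#define MAX(a, b) \
    ({ __typeof__(a) a_ = (a); __typeof__(b) b_ = (b); a_ > b_ ? a_ : b_; })

/* Case ranges: classify an ASCII character. */
int char_class(int c)
{
    switch (c) {
    case '0' ... '9': return 1;   /* digit */
    case 'a' ... 'z': return 2;   /* lowercase letter */
    default:          return 0;   /* anything else */
    }
}

/* Omitted middle operand: yields v when v is non-zero, else fallback. */
int or_default(int v, int fallback)
{
    return v ?: fallback;
}

/* Binary constant: keep only the low four bits. */
int low_nibble(int v)
{
    return v & 0b1111;
}

/* Labels as values (computed goto): a tiny interpreter.
   Opcodes: 0 = halt, 1 = increment, 2 = double. Opcodes above 2 are not
   checked, so callers must pass only valid, 0-terminated programs. */
int run(const char *prog)
{
    static void *const dispatch[] = { &&op_halt, &&op_inc, &&op_dbl };
    int acc = 0;

    goto *dispatch[(unsigned char)*prog++];
op_inc:
    acc += 1;
    goto *dispatch[(unsigned char)*prog++];
op_dbl:
    acc *= 2;
    goto *dispatch[(unsigned char)*prog++];
op_halt:
    return acc;
}
```

For example, run("\1\1\2") increments twice and then doubles; the string's terminating NUL byte acts as the halt opcode.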

Output/Targets

ucc generates x86_64 assembly. It also had partial support for MIPS, but that backend is currently unmaintained. There are plans to add ARM too. The code generator can target Linux-, Cygwin- and Darwin-based toolchains (handling differences in PLT calls, leading underscores, stack alignment, etc.).

Constant folding and a small amount of optimisation are done, but nothing heavy (the feature/ir branch plans to change this).

The ABI matches GCC and Clang's, or more specifically, the System V x86-64 psABI (modulo bugs, of which there is currently one - see nested_ret.c).
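
ABI compatibility matters most where code generated by the compiler shares a data structure with a precompiled system library. va_list is a good test case: the compiler builds it in va_start, and libc consumes it in functions such as vsnprintf, so both sides must agree on its layout. The sketch below uses only standard C; the helper names sum_ints and formatted_length are hypothetical and do not come from the ucc sources.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: sums `count` int arguments using va_arg directly. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Hypothetical helper: hands our va_list to libc's vsnprintf and returns
   the length the formatted string would have, excluding the NUL. */
int formatted_length(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);   /* C99: a NULL buffer of size 0 only measures */
    va_end(ap);
    return n;
}
```

If the compiler's va_list layout disagreed with the one libc was built against, the second helper would be the one expected to misbehave, since the list crosses the library boundary there.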

ucc can also dump its AST, similarly to clang, with -emit=dump.

Building

make

If you plan on building the shim libc, or customising CFLAGS:

./configure [CC=...] [CFLAGS=...] [LDFLAGS=...]

Installing

ucc doesn't have a make install target yet. When run locally, ucc will use its own include files for stdarg.h, etc, but otherwise will use system includes and libraries.

Compiling C files

ucc accepts the standard POSIX 'cc' arguments plus many additions; see ./ucc --help for details.

Todo

Limitations/Known Bugs

Examples

./ucc -o hello hello.c

./ucc -o- -S test.c

./ucc -o- -S -emit=dump test.c

./ucc -c test.c

./ucc -c test.s

./ucc test.c a.o -o out b.a

./ucc a.o b.c -E