arnetheduck / nlvm

LLVM-based compiler for the Nim language

Introduction

nlvm (the nim-level virtual machine?) is an LLVM-based compiler for the Nim programming language.

From Nim's point of view, it's a backend just like C or JavaScript - from LLVM's point of view, it's a language frontend that emits IR.

Questions, patches, improvement suggestions and reviews welcome. When you find bugs, feel free to fix them as well :)

Fork and enjoy!

Jacek Sieka (arnetheduck on gmail point com)

Features

nlvm works as a drop-in replacement for nim with the following notable differences:

Most things from nim work just fine (see the porting guide below!):

Test coverage is not too bad either:

How you could contribute:

nlvm does not:

Compile instructions

To do what I do, you will need:

Start with a clone:

cd $SRC
git clone https://github.com/arnetheduck/nlvm.git
cd nlvm && git submodule update --init

We will need a few development libraries installed, mainly due to how nlvm processes library dependencies (see dynlib section below):

# Fedora
sudo dnf install pcre-devel openssl-devel sqlite-devel ninja-build cmake

# Debian, Ubuntu etc.
sudo apt-get install libpcre3-dev libssl-dev libsqlite3-dev ninja-build cmake

Compile nlvm (if needed, this will also build nim and llvm):

make

Compile with itself and compare:

make compare

Run test suite:

make test
make stats

You can link statically to LLVM to create a stand-alone binary - this will use a more optimized version of LLVM as well, but takes longer to build:

make STATIC_LLVM=1

If you want a faster nlvm, you can also try the release build - it will be called nlvmr:

make STATIC_LLVM=1 nlvmr

When you update nlvm from git, don't forget the submodule:

git pull && git submodule update

To build a docker image, use:

make docker

To run the built nlvm docker image, use:

docker run -v $(pwd):/code/ nlvm c -r /code/test.nim

Compiling your code

On the command line, nlvm is mostly compatible with nim.

When compiling, nlvm will generate a single .o file with all code from your project and link it using $CC - this helps it pick the right flags for linking with the C library.

cd $SRC/nlvm/Nim/examples
../../nlvm/nlvm c fizzbuzz

If you want to see the generated LLVM IR, use the -c option:

cd $SRC/nlvm/Nim/examples
../../nlvm/nlvm c -c fizzbuzz
less fizzbuzz.ll

You can then run the LLVM optimizer on it:

opt -Os fizzbuzz.ll | llvm-dis

... or compile it to assembly (.s):

llc fizzbuzz.ll
less fizzbuzz.s

Apart from the code of your .nim files, the compiler will also mix in the compatibility library found in nlvm-lib/.

Pipeline

Generally, the nim compiler pipeline looks something like this:

nim --> c files --> IR --> object files --> linker --> executable

In nlvm, we remove one step and bunch all the code together:

nim --> single IR file --> built-in LTO linker --> executable

Going straight to the IR means it's possible to express nim constructs more clearly, allowing llvm to understand the code better and thus do a better job at optimization. It also helps keep compile times down, because the c-to-IR step can be avoided.

The practical effect of generating a single object file is similar to clang -fwhole-program -flto - it is a bit more expensive in terms of memory, but results in slightly smaller and faster binaries. Notably, the IR-to-machine-code step, including any optimizations, is repeated in full for each recompile.

Porting guide

dynlib

nim uses a runtime dynamic library loading scheme to gain access to shared libraries. When compiling, no linking is done - instead, when running your application, nim will try to open anything the user has installed.

nlvm does not support the {.dynlib.} pragma - instead, use {.passL.} to link against the library with the normal system linker.

# works with `nim`
proc f() {. importc, dynlib: "mylib" .}

# works with both `nim` and `nlvm`
{.passL: "-lmylib".}
proc f() {. importc .}
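
For readers more at home in C, the difference between the two schemes can be sketched as follows. This is an illustration only, not the code either compiler emits: `load_symbol` is a hypothetical helper that mimics runtime loading with the POSIX `dlopen` / `dlsym` calls, while the comment at the end shows the link-time form that corresponds to `{.passL: "-lmylib".}`.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look up `symbol` in the shared library `lib` at run
 * time. Returns NULL when the library or the symbol cannot be found, which
 * is only discovered once the program is running. */
void *load_symbol(const char *lib, const char *symbol) {
  void *handle = dlopen(lib, RTLD_NOW);
  if (handle == NULL) {
    return NULL; /* library is not installed on this machine */
  }
  /* The handle is intentionally left open for the lifetime of the process. */
  return dlsym(handle, symbol);
}

/* Link-time alternative, corresponding to `{.passL: "-lmylib".}`:
 *
 *   extern void f(void);   // resolved by the system linker via -lmylib
 *
 * On the build machine, a missing library is then reported by the linker. */
```

On older glibc versions, `dlopen` lives in a separate library and the program has to be linked with `-ldl`.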

{.header.}

When nim compiles code, it will generate c code which may include other c code, from headers or directly via emit statements. This means nim has direct access to symbols declared in the c file, which can be both a feature and a problem.

In nlvm, {.header.} directives are ignored - nlvm looks strictly at the signature of the declaration, meaning the declaration must exactly match the c header file or subtle ABI issues and crashes ensue!


# When `nim` encounters this, it will emit `jmp_buf` in the `c` code without
# knowing the true size of the type, letting the `c` compiler determine it
# instead.
type C_JmpBuf {.importc: "jmp_buf", header: "<setjmp.h>".} = object

# nlvm instead ignores the `header` directive completely and will use the
# declaration as written. Failure to correctly declare the type will result
# in crashes and subtle bugs - memory will be overwritten or fields will be
# read from the wrong offsets.
#
# The following works with both `nim` and `nlvm`, but requires you to be
# careful to match the binary size and layout exactly (note how `bycopy`
# sometimes helps to further nail down the ABI):

when defined(linux) and defined(amd64):
  type
    C_JmpBuf {.importc: "jmp_buf", bycopy.} = object
      abi: array[200 div sizeof(clong), clong]

# In `nim`, `C` constant defines are often imported using the following trick,
# which makes `nim` emit the right `C` code so that the value from the header
# can be read (no writing of course, even though it's a `var`!)
#
# assuming a c header with: `#define RTLD_NOW 2`
# works for nim:
var RTLD_NOW* {.importc: "RTLD_NOW", header: "<dlfcn.h>".}: cint

# both nlvm and nim (note how these values often can be platform-specific):
when defined(linux) and defined(amd64):
  const RTLD_NOW* = cint(2)
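
Since the hand-written declarations have to match the platform's headers, it can help to ask the C compiler for the real values before writing them down in Nim. The following is a minimal sketch of such a check; `describe_abi` is a hypothetical helper name, and the numbers it reports are only valid for the platform it is compiled on.

```c
#include <dlfcn.h>   /* RTLD_NOW */
#include <setjmp.h>  /* jmp_buf */
#include <stdio.h>

/* Write the values a hand-written Nim declaration has to match into `buf`.
 * Returns the number of characters snprintf wanted to write. */
int describe_abi(char *buf, size_t len) {
  return snprintf(buf, len,
                  "sizeof(jmp_buf) = %zu\n"
                  "alignof(jmp_buf) = %zu\n"
                  "RTLD_NOW = %d\n",
                  sizeof(jmp_buf), _Alignof(jmp_buf), (int)RTLD_NOW);
}
```

Calling `describe_abi` from a small `main` and printing the buffer gives the size to use for the `abi` array and the value to use for the `const`, each guarded by the matching `when defined(...)` condition.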

{.emit.}

To deal with emit, the recommendation is to put the emitted code in a C file and {.compile.} it.

# Nim side
proc myEmittedFunction() {.importc.}
{.compile: "myemits.c".}

/* myemits.c */
void myEmittedFunction() {
  /* ... */
}

{.asm.}

Similar to {.emit.}, {.asm.} functions must be moved to a separate file and included in the compilation with {.compile.} - this works both with .S and .c files.
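
As a sketch of what such a separate file could look like, the C file below wraps one inline assembly instruction in an ordinary function. The file name `myasm.c` and the function `add_u64` are made up for this example, and the assembly path assumes an x86-64 target with a GCC-compatible compiler; other targets fall back to plain C.

```c
/* myasm.c (hypothetical file name), added to the build with
 *   {.compile: "myasm.c".}
 * and declared on the Nim side as
 *   proc add_u64(a, b: uint64): uint64 {.importc.}
 */
#include <stdint.h>

uint64_t add_u64(uint64_t a, uint64_t b) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  /* AT&T syntax: add operand 1 (b) into operand 0 (a). */
  __asm__("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
  return a;
#else
  /* Portable fallback for other targets and compilers. */
  return a + b;
#endif
}
```

Keeping the assembly behind a normal C function signature means the Nim declaration only has to match that signature, in line with the {.header.} notes above.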

wasm32 support

Use --cpu:wasm32 --os:standalone --gc:none to compile Nim to (barebones) WASM.

You will need to provide a runtime (e.g. WASI) and use manual memory allocation, as the garbage collector hasn't yet been ported to WASM and the Nim standard library lacks WASM / WASI support.

To compile wasm files, you will thus need a panicoverride.nim - a minimal example looks like this and discards any errors:

# panicoverride.nim
proc rawoutput(s: string) = discard
proc panic(s: string) {.noreturn.} = discard

After placing the above code in your project folder, you can compile .nim code to wasm32:

# myfile.nim
proc adder*(v: int): int {.exportc.} =
  v + 4

nlvm c --cpu:wasm32 --os:standalone --gc:none --passl:--no-entry myfile.nim
wasm2wat -l myfile.wasm

Most WASM-compiled code ends up needing WASM extensions - in particular, the bulk memory extension is needed to process data.

Extensions are enabled by passing --passc:-mattr=+feature,+feature2, for example:

nlvm c --cpu:wasm32 --os:standalone --gc:none --passl:--no-entry --passc:-mattr=+bulk-memory

Passing --passc:-mattr=help will print available features (only works while compiling, for now!)

To use functions from the environment (with importc), compile with --passl:-Wl,--allow-undefined.

REPL / running your code

nlvm supports directly running Nim code using just-in-time compilation:

# Compile and run `myfile.nim` without creating a binary first
nlvm r myfile.nim

This mode can also be used to run code directly from the standard input:

$ nlvm r
.......................................................
>>> log2(100.0)
stdin(1, 1) Error: undeclared identifier: 'log2'
candidates (edit distance, scope distance); see '--spellSuggest':
 (2, 2): 'low' [proc declared in /home/arnetheduck/src/nlvm/Nim/lib/system.nim(1595, 6)]
...
>>> import math
.....
>>> log2(100.0)
6.643856189774724: float64

Random notes