Corth is a stack-based programming language that I designed, based on Forth and on Porth from the Porth programming language series on the Tsoding Daily channel. The language is quite similar to Porth, although there are several differences. The compiler was first written in Python and then rewritten in Corth itself, so the language is now self-hosted.
The repo consists of the compiler source code in Corth, an already compiled and assembled compiler, standard library and examples.
To compile a Corth program to an ELF64 executable, cd to the directory that contains the corth executable and run:
./corth compile <file-name> -i ./std/
This will create an executable called output which can be directly run.
The compiler can be bootstrapped using the bootstrap subcommand:
./corth bootstrap ./compiler/ --std ./std/
This will compile the compiler source code and place it at ./corth.
// From ./examples/hello_world.corth
include "linux_x86/stdio.corth"
proc main
int int -> int
in let argc argv in
"Hello, World!\n" puts
end 0 end
./corth compile ./examples/hello_world.corth -i ./std/ && ./output
The include keyword is used to include the stdio library, which contains some basic I/O procedures and macros for reading from and writing to files and the standard streams.
The let keyword is used to name values. In this example, it names the parameter values.
The proc keyword is used to define a procedure. main is where the program starts.
puts is defined in the stdio library; it prints the string passed to it to the standard output.
// Push and pop direction of the stack is kept on the right side in this document.
// This will hopefully help understand the concept of stack and 'let' keyword.
34 // Stack = { 34 }
0b101001 // Stack = { 34, 41 }
0o205126 // Stack = { 34, 41, 68182 }
0x5729da // Stack = { 34, 41, 68182, 5712346 }
'a' // Stack = { 34, 41, 68182, 5712346, 97 }
'\n' // Stack = { 34, 41, 68182, 5712346, 97, 10 }
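The values in the stack comments above can be double-checked outside Corth. The following is a minimal Python sketch, not Corth code: it relies on the fact that Python accepts the same 0b, 0o and 0x prefixes, and it assumes a character literal pushes its ASCII code, as the comments above suggest. The name `stack` is only an illustration.

```python
# Python model of the literal examples; the newest item is on the right.
stack = []
stack.append(34)          # decimal
stack.append(0b101001)    # binary
stack.append(0o205126)    # octal
stack.append(0x5729da)    # hexadecimal
stack.append(ord('a'))    # character, assumed to push its ASCII code
stack.append(ord('\n'))   # escaped character
print(stack)  # [34, 41, 68182, 5712346, 97, 10]
```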
Binary, octal and hexadecimal numbers are written with the prefixes 0b, 0o and 0x respectively.
Escape sequences such as \n are supported in characters.
"This is a string" // Stack = { 0x648a15, 16 }
"This also is a string.
In Corth, multi-line strings are supported." // Stack = { 0x648a15, 16, 0x648a26, 71 }
34 35 + // Stack = { 69 }
69 27 - // Stack = { 69, 42 }
68 inc // Stack = { 69, 42, 69 }
43 dec // Stack = { 69, 42, 69, 42 }
3 23 * // Stack = { 69, 42, 69, 42, 69 }
85 2 / // Stack = { 69, 42, 69, 42, 69, 42 }
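To make the pop-two-push-one behaviour of the arithmetic words concrete, here is a small Python model of the example above. It is a sketch, not part of Corth: `run`, `BINARY` and `UNARY` are hypothetical names, and `/` is modelled with Python's floor division, which is only assumed to match Corth for non-negative operands.

```python
import operator

# Hypothetical model: binary words pop two items and push one result.
BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.floordiv}  # floor division: assumption
UNARY = {"inc": lambda a: a + 1, "dec": lambda a: a - 1}

def run(source):
    """Evaluate whitespace-separated integers and arithmetic words."""
    stack = []
    for word in source.split():
        if word in BINARY:
            b = stack.pop()  # newest item
            a = stack.pop()  # second newest item
            stack.append(BINARY[word](a, b))
        elif word in UNARY:
            stack.append(UNARY[word](stack.pop()))
        else:
            stack.append(int(word))
    return stack

result = run("34 35 + 69 27 - 68 inc 43 dec 3 23 * 85 2 /")
print(result)  # [69, 42, 69, 42, 69, 42]
```

Note the operand order: for `69 27 -` the older item (69) is the left operand and the newer item (27) is the right one.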
+ sums the last two items and pushes the result back, and - subtracts signed or unsigned integers.
* multiplies and / divides signed integers. Right now, the compiler does not keep track of integer signedness, so every signed and unsigned operation can be used on any integer type. The unsigned versions of * and / are u* and u/.
inc and dec are macros defined as 1 + and 1 - in core/arithmetic.corth. They increment or decrement a number by one.
include "str.corth"
When the compiler sees the include keyword, it starts to compile the file whose path is given after the keyword.
1 2 3 // Stack = { 1, 2, 3 }
swp // Stack = { 1, 3, 2 }
drop // Stack = { 1, 3 }
dup // Stack = { 1, 3, 3 }
rot // Stack = { 3, 3, 1 }
arot // Stack = { 1, 3, 3 }
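The same sequence can be replayed on a Python list to see how each word rearranges the stack. This is only an illustrative model (Python, not Corth); the function names mirror the Corth words, and the end of the list stands for the top of the stack.

```python
# Python model of the stack words; the end of the list is the top of the stack.
def swp(s):
    s[-1], s[-2] = s[-2], s[-1]   # { a, b } -> { b, a }

def drop(s):
    s.pop()                       # { a } -> { }

def dup(s):
    s.append(s[-1])               # { a } -> { a, a }

def rot(s):
    s.append(s.pop(-3))           # { a, b, c } -> { b, c, a }

def arot(s):
    s.insert(-2, s.pop())         # { a, b, c } -> { c, a, b }

stack = [1, 2, 3]
history = []
for word in (swp, drop, dup, rot, arot):
    word(stack)
    history.append(list(stack))
print(history)
# [[1, 3, 2], [1, 3], [1, 3, 3], [3, 3, 1], [1, 3, 3]]
```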
drop removes the last item from the stack.
dup duplicates the last item in the stack.
swp swaps the places of the last two items.
rot rotates the last three items by moving the oldest of the three to the top of the stack.
arot does the exact opposite of rot.
let can often be used instead of these macros.
"Hello, world!\n" puts // Prints 'Hello, world!' to the standard output.
34 35 + eputu " is a nice number.\n" eputs // Prints '69 is a nice number.' to the standard error.
proc arithmetic-average // The name of the procedure is 'arithmetic-average'.
// Procedure takes two integers as arguments, and returns a single integer.
// The leftmost type is the oldest item in the stack.
int int -> int
in
// This is where the code is located.
+ 2 /
end
// This procedure will be run.
proc main
// Right now, only 'int int -> int' argument layout is allowed for the main procedure.
int int -> int
in let argc argv in
"The arithmetic average of 53 and 31 is " puts 53 31 arithmetic-average putu ".\n" puts
end 0 end // Program exits with exit code 0.
proc defines a procedure, which can be called anywhere in the code.
return can be used to return early from a procedure, but the stack must match the procedure's output layout.
Every program must define a main procedure; if it is not defined, the compiler will return an error.
macro sayHi
// Name of the macro is 'sayHi'. When the compiler sees 'sayHi' anywhere in the code, it replaces it with the body below.
// Takes a string (the name) and prints a welcome message.
"Hi, " puts puts "!\n" puts
endmacro
proc main
int int -> int
in let argc argv in
"Josh"
sayHi // This will be converted to this:
// "Hi, " puts puts "!\n" puts
end 0 end
The macro keyword is used to define macros and endmacro is used to end the definition.
Using let inside a macro is usually a bad idea, although some library macros are designed that way (like dup, swp or rot). If the code requires let, either change the macro to a procedure, or give the let variables names that start and end with underscores (_).
If a name is defined globally (for example with memory), it can be reached from anywhere, meaning any procedure inside or outside the same file.
If a name is defined locally in a procedure, it can only be reached within the scope of the statement in which it is defined. For example:
// 'x' is undefined here.
69 let x in
// 'x' is defined here.
end
// 'x' is undefined here.
// This is a line comment, it can also come after code
34 35 + putu // Just like that
/*
This is a block comment, aka a multi-line comment.
Block comments can span several lines.
*/
include "linux_x86/stdio.corth"
proc main
int int -> int
in let argc argv in
2 2 + 5 = if
"Well, math is broken. Nice.\n" puts // Hopefully won't be printed.
end
3 4 > if
"Your computer has virus\n" puts // Hopefully won't be printed.
else
"Your computer is alright\n" puts // Hopefully will be printed.
end
// Count from 0 to 9.
"First 10 numbers from 0 are,\n" puts
0 while dup 10 < do // Duplicate the number, and check if it is less than 10.
dup putu "\n" puts // Print the number.
inc end drop // Increase the number.
end 0 end
if is used for conditions; it runs the code after it only if the last item on the stack is true.
if can be used with else for more functionality; the else branch runs only if the bool is false.
An if-else statement can be used like a ternary operator in other languages.
while is used for loops, and break can be used to quit the loop early.
When if is used without an else, the stack must not change between if and end, since otherwise it would be ambiguous what the stack is after that part.
When if is used with an else, the stack change between if and else must be the same as the stack change between else and end.
The stack must not change between while and end.
There are no else if/elif statements at the moment.
// The rightmost variable collects the newest item in the stack. So x = 3 and y = 4.
// From now on, x will be replaced with 3 and y will be replaced with 4.
// If 'let' variables are compile-time constants, the variables will be directly replaced for optimization reasons.
// Otherwise, the stack values will be stored in local memory and loaded inside the structure.
3 4 let x y in
x y * x + y -
end
// Stack = { 11 }
The let keyword stores stack items in local memory. The stored values cannot be modified but can be read later.
When a let variable is called, it directly returns its value, unlike a memory variable, which returns its address.
// The rightmost variable collects the newest item in the stack. So x = 3 and y = 4.
// From now on, x will be replaced with 3 and y will be replaced with 4.
3 4 peek x y in
x y * x + y -
end
// Stack = { 3, 4, 11 }
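The difference between let and peek is only whether the named items are removed from the stack. The Python sketch below models that difference; it is not Corth, and the helpers `let`, `peek` and `body` are hypothetical names. With x = 3 and y = 4, the body `x y * x + y -` evaluates to 3 * 4 + 3 - 4 = 11.

```python
def let(stack, names):
    """Pop len(names) items; the rightmost name receives the newest item."""
    count = len(names)
    values = stack[-count:]
    del stack[-count:]
    return dict(zip(names, values))

def peek(stack, names):
    """Like let, but the items stay on the stack."""
    return dict(zip(names, stack[-len(names):]))

def body(env):
    # Models the Corth body: x y * x + y -
    return env["x"] * env["y"] + env["x"] - env["y"]

let_stack = [3, 4]
env = let(let_stack, ["x", "y"])      # 3 4 let x y in ... end
let_stack.append(body(env))
print(let_stack)   # [11]

peek_stack = [3, 4]
env = peek(peek_stack, ["x", "y"])    # 3 4 peek x y in ... end
peek_stack.append(body(env))
print(peek_stack)  # [3, 4, 11]
```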
peek works very similarly to let, except it does not pop the items from the stack. This allows it to load them directly from the stack instead of using local memory.
When a peek variable is called, it directly returns its value, just like let.
include "core/memory.corth"
// This is a global variable.
// The size must be a compile-time constant, as memory is allocated at compile time.
memory count sizeof(int) end
proc increase -> in
// Reads the value of 'count', adds one and writes back.
count @64 inc count !64
end
proc main
int int -> int
in let argc argv in
// Set the value of 'count' to 0.
0 count !64
// Increase the value of 'count'.
increase
// Print the value stored at 'count'. Should print '1'.
count @64 putu putnl
memory x sizeof(int) in
memory y sizeof(int) in // Size of the variable 'x' is equal to the size of an integer (8 bytes). Same with variable 'y'.
0 x !64
x @64 puti " is before saving 'x'\n" puts
420 x !64
x @64 puti " is after saving 'x'\n" puts
0 y !64
y @64 puti " is before saving 'y'\n" puts
69 y !64
y @64 puti " is after saving 'y'\n" puts
end end
end 0 end
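The load and store words in the example above can be pictured as reads and writes on a flat byte array. The following Python sketch models that idea; it is not how the compiler is implemented. The names `memory`, `COUNT`, `store` and `load` are hypothetical, and little-endian byte order with zero-extension on narrow loads is an assumption based on the x86 target.

```python
# Hypothetical flat memory; 'count' is assumed to live at address 0.
memory = bytearray(16)
COUNT = 0

def store(value, address, size):
    """Model of !8 / !16 / !32 / !64: write 'size' bytes (little-endian assumed)."""
    memory[address:address + size] = value.to_bytes(size, "little")

def load(address, size):
    """Model of @8 / @16 / @32 / @64: read 'size' bytes (little-endian assumed)."""
    return int.from_bytes(memory[address:address + size], "little")

store(0, COUNT, 8)                     # 0 count !64
store(load(COUNT, 8) + 1, COUNT, 8)    # count @64 inc count !64
after_increase = load(COUNT, 8)
print(after_increase)                  # 1

store(420, COUNT, 8)                   # 420 count !64
print(load(COUNT, 8))                  # 420
print(load(COUNT, 1))                  # 164, the low byte of 420 (0x01A4) under the assumptions above
```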
memory is used for allocating global or local memory at compile time.
When a name defined with the memory keyword is called, the address of the variable is returned.
Global memory allocates memory that can be used from everywhere after definition.
Local memory allocates memory that can be used only between in and end.
@64 loads 8 bytes of data from the address (@8, @16 and @32 are also allowed).
!64 stores 8 bytes of data to the address.
include "dynamic/malloc.corth"
// Assume this is inside a procedure.
100 malloc let buffer in // Allocate 100 bytes of memory and save the object pointer as a constant named `buffer`.
// Loop through every byte in `buffer`.
buffer while dup buffer 100 + < do peek address in
0x67 address !8 // Set the byte to 0x67.
end inc end drop
buffer mfree drop // Deallocate the space.
end
malloc or djoin. Please check the library to learn more.