Akuli opened this issue 1 year ago
I have now written a tokenizer in Jou that is capable of tokenizing the hello world program. Here's how to run it (with the latest jou repository cloned; `self_hosted` isn't included in the Windows zip files that `jou --update` uses):

```
jou -o asd.exe -O1 self_hosted/tokenizer.jou
./asd.exe examples/hello.jou
```
Output:
===== Tokens for file "examples/hello.jou" =====
Line 1:
keyword "from"
string "stdlib/io.jou"
keyword "import"
name "puts"
newline token (next line has 0 spaces of indentation)
Line 3:
keyword "def"
name "main"
operator '('
operator ')'
operator '->'
keyword "int"
operator ':'
newline token (next line has 4 spaces of indentation)
Line 4:
indent (+4 spaces)
Line 5:
name "puts"
operator '('
string "Hello World"
operator ')'
newline token (next line has 4 spaces of indentation)
Line 6:
keyword "return"
integer 0
newline token (next line has 0 spaces of indentation)
Line 7:
dedent (-4 spaces)
end of file
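The indent/dedent bookkeeping in the listing above is the subtle part of a Python-style tokenizer. Here is a toy sketch of that idea in Python — not the actual Jou implementation. The token categories are taken from the output above; everything else (the regex, keyword set, and function name) is made up for illustration:

```python
import re

# Token shapes for the hello-world subset only; the real tokenizer
# (src/tokenize.c, self_hosted/tokenizer.jou) handles far more cases.
KEYWORDS = {"from", "import", "def", "return", "int"}
TOKEN_RE = re.compile(r'"[^"]*"|->|[A-Za-z_][A-Za-z0-9_]*|\d+|[():,]')

def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs, with indent/dedent bookkeeping."""
    tokens: list[tuple[str, str]] = []
    indent = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines produce no tokens
        spaces = len(line) - len(line.lstrip(" "))
        if spaces > indent:
            tokens.append(("indent", f"+{spaces - indent} spaces"))
        elif spaces < indent:
            tokens.append(("dedent", f"-{indent - spaces} spaces"))
        indent = spaces
        for text in TOKEN_RE.findall(stripped):
            if text in KEYWORDS:
                tokens.append(("keyword", text))
            elif text.startswith('"'):
                tokens.append(("string", text[1:-1]))
            elif text[0].isdigit():
                tokens.append(("integer", text))
            elif text[0].isalpha() or text[0] == "_":
                tokens.append(("name", text))
            else:
                tokens.append(("operator", text))
        tokens.append(("newline", ""))
    if indent > 0:
        tokens.append(("dedent", f"-{indent} spaces"))
    tokens.append(("end of file", ""))
    return tokens

hello = '''from "stdlib/io.jou" import puts

def main() -> int:
    puts("Hello World")
    return 0
'''
```

One simplification: the real output attaches the upcoming indentation to the newline token ("next line has 4 spaces of indentation"), while this sketch only emits the indent/dedent token at the start of the next non-blank line.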
You can also get exactly the same output with the original tokenizer written in C:

```
jou --tokenize-only examples/hello.jou
```
However, it does not tokenize all test files correctly. The next step would be to make the new tokenizer tokenize all files in exactly the same way as the old tokenizer.
@littlewhitecloud Are you interested in working on the tokenizer? It should be pretty easy to finish it from here, because you can look at the C code in `src/tokenize.c` or the newly created `doc/syntax-spec.md`. As I mentioned, the goal is to make it behave exactly like the existing tokenizer written in C.
I made a script `tokenizers.sh` that attempts to tokenize all Jou files with both tokenizers and checks whether they produce the same output or something different. To run it, you need to:

1. `git clone https://github.com/Akuli/jou`
2. `cd jou`
3. Make sure the `jou` directory from `git clone` contains the Jou executable (`jou.exe` on Windows, `jou` on Linux). On Windows you can copy `jou.exe` from the latest zip file. On Linux you can run `make`.
4. `../tokenizers.sh`
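Conceptually, the script runs both tokenizers on each file and diffs their outputs. The diffing core can be sketched in Python (a toy illustration, not the real shell script; the `fromfile`/`tofile` names merely label the diff):

```python
import difflib

def compare_outputs(c_output: str, jou_output: str) -> str:
    """Return a unified diff between the two tokenizers' outputs.

    An empty string means the tokenizers agree on this file.
    """
    if c_output == jou_output:
        return ""
    diff = difflib.unified_diff(
        c_output.splitlines(keepends=True),
        jou_output.splitlines(keepends=True),
        fromfile="src/tokenize.c",
        tofile="self_hosted/tokenizer.jou",
    )
    return "".join(diff)
```

The actual script additionally colors the two sides (red for the C tokenizer, green for the Jou one) so disagreements are easy to spot.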
How to work on the tokenizer?
My workflow is typically:

1. Make sure `./compare_compilers.sh` runs with no errors.
2. Remove a file name from `self_hosted/tokenizes_wrong.txt` and run `./compare_compilers.sh` again. It will fail, and the error message shows you the differences between how `self_hosted/tokenizer.jou` (green) and `src/tokenize.c` (red) tokenize the file.
3. Edit `self_hosted/tokenizer.jou` and run `./compare_compilers.sh` again until it succeeds.

Also, `tokenizers.sh` was renamed to `compare_compilers.sh` at some point. Sorry about the confusion :)
Ok. So the self-hosted compiler is just the Jou compiler made with Jou?

Maybe we could write a C-to-Jou translator, translate the compiler written in C, and then compile the resulting Jou code. That would give us a compiler written in Jou that behaves the same as the C compiler.
Yes, the self-hosted compiler is just the compiler made with Jou.
I have thought about auto-translating the C code, but I have discovered a lot of unclear error messages and compiler bugs when translating manually. I think it's a good test of how nice the compiler is to use.
Any new progress on this?
Not much. I haven't poured much time into this recently, as I am more focused on Advent of Code.
Progress: the `jou` executable that `make` produces is the self-hosted compiler, i.e. a Jou compiler that compiles Jou code, and of course can compile itself.

For now, the main way to develop the self-hosted compiler is running the `./compare_compilers.sh` script (previously named `tokenizers.sh`). It attempts to compile files with both compilers and compares the results. There are lists of various "known not working" files in `self_hosted/`, and `./compare_compilers.sh --fix` updates them automatically.
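One plausible way a `--fix` mode could regenerate such a list is to rewrite it with exactly the files on which the two compilers currently disagree. This is a guess for illustration, not the real implementation; the function name and file format are made up:

```python
import pathlib

def fix_known_bad_list(list_path: pathlib.Path, results: dict[str, bool]) -> None:
    """Rewrite the 'known not working' list from fresh comparison results.

    results maps a file name to True if both compilers agree on it.
    The list ends up containing exactly the currently-failing files,
    one per line, sorted for stable diffs.
    """
    failing = sorted(name for name, ok in results.items() if not ok)
    list_path.write_text("\n".join(failing) + "\n" if failing else "")
```

Keeping the list sorted means re-running `--fix` after a fix produces a minimal, reviewable diff in version control.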