Akuli / jou

Yet another programming language

self-hosted compiler?!!?!?? #116

Open · Akuli opened this issue 1 year ago

Akuli commented 1 year ago

That is, a compiler for Jou that is itself written in Jou, and can of course compile itself.
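
In other words, once the self-hosted compiler is complete, you should be able to bootstrap it with the existing C-based compiler and then let it rebuild itself. A rough sketch of that idea, assuming (these are assumptions, not the current state of the repo) that the self-hosted compiler's entry point is self_hosted/main.jou and that it takes the same -o option as the C-based jou:

# build the self-hosted compiler with the existing C-based compiler
jou -o jou_stage2 self_hosted/main.jou
# the self-hosted compiler then rebuilds itself
./jou_stage2 -o jou_stage3 self_hosted/main.jou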

progress:

For now, the main way to develop the self-hosted compiler is running the ./compare_compilers.sh script (previously named tokenizers.sh). It attempts to compile files with both compilers and compares the results. There are lists of various "known not working" files in self_hosted/, and ./compare_compilers.sh --fix updates them automatically.
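
For example, a typical run from the repository root looks like this:

# compile files with both compilers and compare the results
./compare_compilers.sh
# additionally update the "known not working" lists in self_hosted/
./compare_compilers.sh --fix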

Akuli commented 1 year ago

I have now written a tokenizer in Jou that is capable of tokenizing the hello world program. Here's how to run it (you need the latest jou repository cloned; self_hosted/ isn't included in the Windows zip files that jou --update uses):

jou -o asd.exe -O1 self_hosted/tokenizer.jou
./asd.exe examples/hello.jou 

Output:

===== Tokens for file "examples/hello.jou" =====

Line 1:
  keyword "from"
  string "stdlib/io.jou"
  keyword "import"
  name "puts"
  newline token (next line has 0 spaces of indentation)

Line 3:
  keyword "def"
  name "main"
  operator '('
  operator ')'
  operator '->'
  keyword "int"
  operator ':'
  newline token (next line has 4 spaces of indentation)

Line 4:
  indent (+4 spaces)

Line 5:
  name "puts"
  operator '('
  string "Hello World"
  operator ')'
  newline token (next line has 4 spaces of indentation)

Line 6:
  keyword "return"
  integer 0
  newline token (next line has 0 spaces of indentation)

Line 7:
  dedent (-4 spaces)
  end of file

You can also get exactly the same output with the original tokenizer written in C:

jou --tokenize-only examples/hello.jou

However, the new tokenizer does not yet tokenize all test files correctly. The next step is to make it tokenize every file in exactly the same way as the old tokenizer.

@littlewhitecloud Are you interested in working on the tokenizer? It should be pretty easy to finish it from here, because you can look at the C code in src/tokenize.c or the newly created doc/syntax-spec.md. As I mentioned, the goal is to make it behave exactly like the existing tokenizer written in C.

I made a script tokenizers.sh that attempts to tokenize all Jou files with both tokenizers and checks whether they produce the same output. To run it (a consolidated command sketch follows the list), you need to:

  1. If you are on Windows, install Git if you haven't installed it yet and open Git Bash.
  2. Clone the repository: git clone https://github.com/Akuli/jou
  3. Go to the cloned repository: cd jou
  4. Ensure that the jou directory from git clone contains the Jou executable (jou.exe on Windows, jou on Linux). On Windows you can copy jou.exe from the latest zip file. On Linux you can run make.
  5. Run ./tokenizers.sh
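
On Linux, the whole sequence boils down to roughly this (on Windows, copy jou.exe from the latest zip file into the cloned directory instead of running make):

git clone https://github.com/Akuli/jou
cd jou
make
./tokenizers.sh
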
littlewhitecloud commented 1 year ago

How do I work on the tokenizer?

Akuli commented 1 year ago

My workflow is typically:

Akuli commented 1 year ago

Also, tokenizers.sh was renamed to compare_compilers.sh at some point. Sorry about the confusion :)

littlewhitecloud commented 1 year ago

Ok

littlewhitecloud commented 1 year ago

So the self-hosted compiler is just the Jou compiler written in Jou?

littlewhitecloud commented 1 year ago

Maybe we could write a C-to-Jou translator, use it to translate the compiler that is written in C, and then compile the resulting Jou code. That would give us a compiler written in Jou that behaves the same as the C compiler.

Akuli commented 1 year ago

Yes, the self-hosted compiler is just the compiler made with Jou.

I have thought about auto-translating the C code, but translating manually has already uncovered a lot of unclear error messages and compiler bugs, so I think it's a good test of how nice the compiler is to use.

Moosems commented 6 months ago

Any new progress on this?

Akuli commented 6 months ago

Not much. I haven't put much time into this recently, as I am more focused on Advent of Code.