AlphabetsAlphabets / Moon

The Moon programming language. Moved to Codeberg.
https://codeberg.org/AlphabetsAlphabets/Moon

Problems with tokens #4

Closed AlphabetsAlphabets closed 8 months ago

AlphabetsAlphabets commented 9 months ago

When creating tokens in the scan function, the lexemes are as expected while inside the for loop that calls identify_token. However, once the loop exits, every lexeme has been replaced with the * character.

AlphabetsAlphabets commented 9 months ago

The main cause of the overwriting is that the previous implementation reads dead memory, which just so happens to contain the * character. One way to fix this is to allocate the characters dynamically so they stay alive.

455496265c2a08f2447573f242e7145696123060 simplifies the code to make it easier to debug. I attempted some changes, but the issue is still there.

https://github.com/AlphabetsAlphabets/Moon/blob/455496265c2a08f2447573f242e7145696123060/src/scanner.c#L48

This is how each character is stored.

https://github.com/AlphabetsAlphabets/Moon/blob/455496265c2a08f2447573f242e7145696123060/src/scanner.c#L113

This is how each character is stored in the lexeme and literal fields of Token. But on each iteration ch is replaced with a new character, so every lexeme ends up as whatever the last character in the source is.

AlphabetsAlphabets commented 8 months ago

f290220 fixes this issue and introduces three main changes.

https://github.com/AlphabetsAlphabets/Moon/blob/f29022099102adf6600867efb80b7c277f4fea62/src/scanner.c#L19-L31

By moving token = scanner->tokens[i]; from the bottom of the loop to the top, the same information is no longer printed twice. This is because token is initialized with token = *scanner->tokens, which is already the first element of the array.

The second fix is this change.

scanner->tokens = malloc(sizeof(Token) * (length_of_src + 1))

Makes more sense than

scanner->tokens = malloc(sizeof(Token) * length_of_src + 1)

Because the first allocates enough space for length_of_src + 1 tokens, while the second allocates enough space for length_of_src tokens plus one extra byte. That is why the memory error I kept getting said it ran out of space.

The third change is this:

From NUM_TOKENS + 1 to NUM_TOKENS++. I have no idea why this matters. Even with the changes from the first and second fixes, I still get a memory error if I use NUM_TOKENS + 1 instead of NUM_TOKENS++.