alexschneider / teascript


Scanner trims the last newline #58

Open alexschneider opened 9 years ago

alexschneider commented 9 years ago

The last newline in the scanner's output doesn't really convey any information to the parser or the program, and we have to handle it specially in multiple places in the parser to make sure a program is parsed the same way with or without a newline at the end. Is there any issue with trimming the last token from the scanned tokens if it's a newline?

@rachelriv, tagging you specifically because you worked on the scanner.

rachelriv commented 9 years ago

According to our grammar, a block is a sequence of statements, each followed by a newline, optionally ending with a return statement that is also followed by a newline.

Program ::= Block
Block   ::= (Stmt newline)* (ReturnStmt newline)?

If we remove the final newline token, then the final statement doesn't fit our grammar.
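To make the grammar point concrete, here is a minimal Python sketch (not the project's parser) that checks whether a token sequence matches Block. It assumes hypothetical token kinds: "stmt" stands in for any complete statement, "return" for a return statement, plus "newline" and "eof". Dropping the final newline makes the last statement fail to match:

```python
# Hypothetical token kinds: "stmt" stands in for any complete statement,
# "return" for a return statement, "newline" for a line break, "eof" for end of input.

def parses_as_block(tokens):
    """Return True if tokens match Block ::= (Stmt newline)* (ReturnStmt newline)? then eof."""
    i = 0
    # (Stmt newline)*
    while i + 1 < len(tokens) and tokens[i] == "stmt" and tokens[i + 1] == "newline":
        i += 2
    # (ReturnStmt newline)?
    if i + 1 < len(tokens) and tokens[i] == "return" and tokens[i + 1] == "newline":
        i += 2
    # Everything must be consumed except the final eof token.
    return tokens[i:] == ["eof"]

print(parses_as_block(["stmt", "newline", "stmt", "newline", "eof"]))  # True
print(parses_as_block(["stmt", "newline", "stmt", "eof"]))             # False: last newline trimmed
```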

alexschneider commented 9 years ago

So what about appending a newline at the end of the file if one isn't already there? That way the parser can assume it always exists.
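A minimal Python sketch of this idea, assuming the whole source file is read into a string before scanning; the function name ensure_trailing_newline is hypothetical:

```python
def ensure_trailing_newline(source: str) -> str:
    """Append a newline to the source text if it does not already end with one."""
    # An empty file is left alone; otherwise guarantee a final "\n".
    # "\r\n" endings already end with "\n", so they are left unchanged too.
    if source and not source.endswith("\n"):
        return source + "\n"
    return source

print(repr(ensure_trailing_newline("if exp:\n  xyz\nend")))    # 'if exp:\n  xyz\nend\n'
print(repr(ensure_trailing_newline("if exp:\n  xyz\nend\n")))  # 'if exp:\n  xyz\nend\n'
```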

rachelriv commented 9 years ago

Why would we do that?

alexschneider commented 9 years ago

The alternative is simply failing to parse files that look like this:

if exp:
  xyz
end<EOF token>

rachelriv commented 9 years ago

Well, we are the ones adding the EOF token. I'm really not sure what you're getting at.

alexschneider commented 9 years ago

Some files don't end with a newline: the last line of code is followed directly by the end of the file, which the operating system signals when there is nothing left to read (no marker is actually stored in the file). Though ending a file with a newline is best practice, not everyone does it.

rachelriv commented 9 years ago

I understand what you are saying now! Thanks for the explanation.

If you can think of an elegant way to fix this, go ahead and implement it and submit a PR. However, I think this issue should be low on our priority list. I'd really like to get some more tests and a fully working parser first!

rtoal commented 9 years ago

Because your scripts are just lines of code, there isn't much need for the classic EOF token. For languages whose programs are bracketed with, say, "program" and "end" (like Pascal), or that allow only one class per file, the EOF token is important to ensure there is no additional source after the single syntactic structure allowed in the compilation unit. In your case, I believe that emitting a newline token when you reach the end of the input stream will suffice. It would be a shame to put newline | eof everywhere in your grammar.

This is a great issue. Good find, Alex. I agree with Rachel that it can be postponed a bit. If emitting a newline at the end of the file works for you, though, you can do it sooner.
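A sketch of the end-of-stream approach in Python, assuming a simplified, hypothetical token set (TeaScript's actual scanner and token kinds may differ). The scanner appends a synthetic newline token only when the input did not end with a line break, then adds the usual EOF token:

```python
import re

# Hypothetical, simplified token classes; a real scanner would be more detailed.
TOKEN_RE = re.compile(
    r"(?P<newline>\r?\n)"    # a line break (Unix or Windows style)
    r"|(?P<space>[^\S\n]+)"  # any other run of whitespace, which is skipped
    r"|(?P<word>\w+)"        # identifiers, keywords, and numbers
    r"|(?P<punct>\S)"        # any other single visible character
)

def scan(source):
    """Return (kind, text) tokens; a non-empty stream always ends with newline, eof."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
    # If the file did not end with a line break, emit a synthetic newline token
    # so the grammar never needs "newline | eof" alternatives.
    if tokens and tokens[-1][0] != "newline":
        tokens.append(("newline", "\n"))
    tokens.append(("eof", ""))
    return tokens
```

With this, scan("if exp:\n  xyz\nend") and scan("if exp:\n  xyz\nend\n") return identical token lists, so the parser can rely on every statement being followed by a newline.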