gilch / hissp

It's Python with a Lissp.
https://gitter.im/hissp-lang/community
Apache License 2.0

Tokenize comment blocks #205

Closed gilch closed 1 year ago

gilch commented 1 year ago

I started another docs increment and discovered a problem that required a deeper change. This one alters Lissp's tokenizing regex. If I did this right, it should only affect comments, which are now tokenized in blocks. This enables the <<# comment-string macro to create multi-line strings without using extras, which it no longer supports.
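As a rough illustration of what tokenizing comments in blocks means, here is a minimal sketch. It is not the actual regex in src/hissp/reader.py: the `COMMENT_BLOCK` pattern and the `comment_blocks` helper are hypothetical, and the sketch ignores semicolons inside string literals. It only shows the idea that contiguous comment lines come out as one token instead of one token per line.

```python
import re

# Hypothetical pattern, NOT Hissp's real tokenizer regex.
# One comment block = one or more contiguous lines, each consisting of
# optional indentation, a semicolon, the rest of the line, and a newline.
COMMENT_BLOCK = re.compile(r"(?:[ \t]*;[^\n]*\n)+")


def comment_blocks(source):
    """Return each run of contiguous comment lines as a single string."""
    return [match.group() for match in COMMENT_BLOCK.finditer(source)]


source = ";; first line\n;; second line\n(print 1)\n; separate comment\n"
blocks = comment_blocks(source)
```

With this input, the first two comment lines form one block and the comment after the form is a second block. Something like a comment-string macro could then presumably receive the whole block at once, which is what makes a multi-line string possible without extras.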

codecov[bot] commented 1 year ago

Codecov Report

Merging #205 (fb9c4c0) into master (ebb517c) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #205   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          700       706    +6     
  Branches       111       111           
=========================================
+ Hits           700       706    +6     
Impacted Files        Coverage Δ
src/hissp/reader.py   100.00% <100.00%> (ø)


gilch commented 1 year ago

The problem necessitating this change was multiline strings contained in comment strings interacting poorly with Parinfer, which can't handle unbalanced double quotes in comments, but can handle balanced pairs over multiple comment lines as long as they're contiguous. I really want to support Parinfer because I use it, and it makes Lisp editing feel indentation-based like Python. The current tokenizer was therefore in conflict with a major project goal and had to change.

gilch commented 1 year ago

Hmm. Doc build has some warnings and lost syntax highlighting in some of the examples. Perhaps altering the tokenization broke the highlighting parser?

gilch commented 1 year ago

I think I see the problem. You can no longer assume that a line ending in a comment is done, because the comment token might have another line. A comment isn't closed until it has a newline, just like a string isn't closed until it has a quotation mark. The highlighter is probably reflecting how the REPL actually behaves now, but some examples with trailing comments have not been updated. Moving the comments or adding a continuation prompt (as the REPL now would) should fix them.
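To make the analogy concrete, here is a small sketch under stated assumptions. The `COMMENT_LINE` and `STRING` patterns and the `is_closed` helper are hypothetical and much simpler than the real reader. The point is only that a comment token without its newline is as incomplete as a string without its closing quote, so a REPL would have to ask for more input in both cases.

```python
import re

# Hypothetical patterns for illustration only, not Hissp's tokenizer.
COMMENT_LINE = re.compile(r";[^\n]*\n")  # the newline is part of the token
STRING = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)  # needs a closing quote


def is_closed(token):
    """True if the text is a complete comment line or a complete string."""
    return bool(COMMENT_LINE.fullmatch(token) or STRING.fullmatch(token))


open_comment = is_closed("; trailing comment")      # no newline yet
closed_comment = is_closed("; trailing comment\n")  # newline closes it
open_string = is_closed('"no closing quote')
closed_string = is_closed('"closed"')
```

Under this model, a line that ends in a comment is only finished once the newline has been read, which is why a doctest example with a trailing comment might need a continuation prompt.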

gilch commented 1 year ago

Maybe that wasn't it. The change did actually break the LisspLexer, specifically its CommentSubLexer, and fixing that seems to have fixed everything. That also suggests that the doctests don't enforce the trailing newline after comments. It wouldn't be the only way they differ from the real REPL.