gilch / hissp

It's Python with a Lissp.
https://gitter.im/hissp-lang/community
Apache License 2.0
364 stars, 9 forks

Consider inject literals in Lissp (raw symbols) #209

Closed: gilch closed this 8 months ago

gilch commented 1 year ago

I'd use the |...| symbol syntax from Common Lisp. They'd basically act like symbols that don't munge, and would parse to a str object in the Hissp. You can already do this with .#"...", but |...| would have two fewer characters of overhead, and you'd only need to escape | instead of ", and | is the less common of the two in Python code.

A number of macros need injected strings now and could use these instead. Does X#.#"0 == X & 1" or X#|0 == X & 1| look better? I prefer the latter. The other lambda readers would be similar.

I've also used injects in the case macro. Any fully-qualified function name can be used as a reader macro. If those need a string argument, you'd have to inject it. E.g., does fractions..Fraction#.#"1/2" or fractions..Fraction#|1/2| look better? Again, I prefer the latter.
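
A rough Python sketch of what the two examples above would amount to once the injected text reaches Python. The helper name `make_x_lambda` is hypothetical, and `eval` is only a stand-in for the compiler emitting the text as code; this is not how the X# reader is actually implemented.

```python
from fractions import Fraction


def make_x_lambda(fragment: str):
    """Hypothetical stand-in for X#: wrap injected Python text in a one-argument lambda."""
    # The fragment is Python source text, not string data,
    # which is why it has to be injected rather than written as a string literal.
    return eval(f"lambda X: {fragment}")


# In Python, & binds tighter than ==, so this tests whether the low bit is clear.
is_even = make_x_lambda("0 == X & 1")

# What applying fractions..Fraction to the raw text "1/2" would compute at read time.
half = Fraction("1/2")
```

Either spelling, .#"..." or |...|, would hand the same str to the macro; only the surface syntax differs.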

This would be similar to Hebigo's bracketed expressions, and (of course) it's similarly easy in readerless mode. Injections are opaque to code-walking macros and shouldn't be overused.

Quoting a string literal results in something like "('foo')", even if it's nested in a tuple. This is surprising and annoying to users used to other Lisps who just wanted data. Python simply doesn't have a separate symbol type, and pervasively uses strings everywhere it needs to represent identifiers, so it's not like making a new type would help much even if I were willing to give up the standalone property, which I'm not. Strings have to be used for both. The |...| atoms would be an easy alternative that would still make sense to a Common Lisp user. They would also work as strings when quoted directly, like '|...|. Python users are used to having two different quotation types and will often pick the one that requires the fewest escapes.
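
The surprise is easier to see in Python terms. This sketch assumes, as described above, that a Lissp string literal reaches Hissp as a str containing a parenthesized Python literal. `string_literal_code` is a hypothetical helper for illustration, not part of Hissp's reader.

```python
import ast


def string_literal_code(text: str) -> str:
    """Hypothetical helper: Python source text for a string literal, wrapped in parentheses."""
    return f"({text!r})"


# What quoting the string literal "foo" would give you as data.
quoted_string = string_literal_code("foo")

# What a quoted |foo| would give you under the proposal: just the text.
quoted_fragment = "foo"

# Nesting in a quoted tuple doesn't help; each element is still a code string.
quoted_tuple = tuple(string_literal_code(s) for s in ("a", "b"))

# Evaluating the code string recovers the plain string.
recovered = ast.literal_eval(quoted_string)
```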

The inject macro .# wouldn't simply go away, since it can be used on any data type. This would leave us with several kinds of strings to keep track of: |...|, '|...|, .#"...", #"...", and "...". We can mostly ignore the .#"...", since |...| would almost always be used instead. But the #"..."/"..." distinction makes less sense now. I might make the '|...| form the raw string syntax (which seems pretty natural, except perhaps for \|), make "..." the one that processes Python escapes, and get rid of the #"..." atoms altogether.

I've designed Lissp to be compatible with editors for Common Lisp. If anything, compatibility would improve there as we'd be more careful with | characters. Emacs Lisp lacks this syntax, so there may be a loss there, but I don't know why anyone would edit Emacs Lisp in anything but Emacs, which already has lisp-mode, so maybe this isn't an issue.

There is one dealbreaker: Parinfer can't fully handle these yet. See https://github.com/parinfer/parinfer.js/issues/209 and https://github.com/oakmac/parinfer/issues/16. If that changes, I might just do it.

gilch commented 1 year ago

Something like X#|0 == X & 1| apparently isn't compatible with lisp-mode, which recognizes the #| as the start of a comment. This is surprising because SBCL parses the whole thing as a single symbol, which is what I would have expected. Adding a space between the # and | makes it parse like two symbols, though. Come to think of it, I ran into this problem in my tests when using an alias qualifier for the || macro. These should have worked without the spaces. One of the main uses I see for inject literals is as arguments to reader macros, which means we'd need to add the space between them to work around a bug in lisp-mode. I wonder if I can get this fixed.

gilch commented 8 months ago

Parinfer is not ready for this, but it should mostly behave as long as the |...| atoms don't contain terminating characters, or those characters are at least escaped, which is allowed. As with Common Lisp's symbols using multiple escape characters (i.e., |), a \ can escape a following \ or |, and is removed without doing anything when it precedes any other character. These turn out to be trivial to implement. That means (at least until Parinfer catches up) the default style will be to eschew whitespace and escape any problem characters. I haven't enumerated which ones those are yet, but probably anything that would terminate a normal symbol in Clojure is a potential concern.
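
A minimal Python sketch of the escape rule just described, to show why it is easy to implement. It assumes that a backslash always drops out and the character after it is kept literally. The names `FRAGMENT` and `read_fragment` are hypothetical, and this is not Hissp's actual lexer.

```python
import re

# Assumed token shape: an opening |, then any run of ordinary characters
# or backslash-plus-one-character pairs, then a closing |.
FRAGMENT = re.compile(r"\|((?:[^|\\]|\\.)*)\|", re.DOTALL)


def read_fragment(token: str) -> str:
    """Return the text of a |...| token with its backslash escapes removed."""
    match = FRAGMENT.fullmatch(token)
    if match is None:
        raise ValueError(f"not a |...| token: {token!r}")
    # Drop each backslash and keep the character that follows it.
    return re.sub(r"\\(.)", r"\1", match.group(1), flags=re.DOTALL)
```

Under these assumptions, `read_fragment(r"|a\|b|")` gives `a|b`, and a backslash before an ordinary character simply disappears, which is what makes Python escapes like `\d` awkward inside these atoms.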

The multiple string types in the different stages are one of the hardest things about Hissp to document clearly, which means they're likely a point of confusion for users as well. Anything I can do to minimize this number is probably worth doing, and I should come up with different names I can use consistently for each case.

Having both #"" and "" strings is probably redundant given ||. '|| can be a raw string. |""| can be an escaped one, although \s would have to be double-escaped. Yeah, that's not going to work.

These would be a lot more useful if the \s were not magic and you only escaped | like ||, but Parinfer.

Can I at least switch the interpretation of #"" and ""? That would be more like Clojure. You'd almost always use "" and only occasionally use #"" for things like regex. I had it that way originally, but using a tag on a hash string looks bad. It would be even worse now with the multiary tags #245.
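
For reference, plain Python already has the same raw versus escape-processed split, and regex is the usual reason to reach for the raw form. This is only an illustration in Python; it says nothing about which Lissp syntax maps to which behavior.

```python
import re

# Raw: backslashes reach the regex engine untouched.
raw = r"\d+\.\d+"

# Escape-processed: every backslash meant for the regex must be doubled.
escaped = "\\d+\\.\\d+"

# Both spellings produce the same pattern text.
match = re.fullmatch(raw, "3.14")

# The escape-processed form turns \n into one character; the raw form keeps two.
newline_lengths = (len("\n"), len(r"\n"))
```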

gilch commented 8 months ago

Is this still worth doing with Common Lisp–style escapes? Maybe. The result of '("a" "b" "c") might be surprising in Lissp. I'm currently recommending (en#tuple "a" "b" "c") or `(,"a" ,"b" ,"c") instead, although '(.#"a" .#"b" .#"c") does work. '(|a| |b| |c|) looks better, and still makes sense if you're used to older Lisps.

<<# might be good enough for raw strings. Short regex would always require a newline though. Or lots of backslashes.