Add the full awesomeness of Unicode to variable/rule names

eugenesvk commented 1 year ago

This allows using visually descriptive symbols to convey meaning that's very obvious, like

±   = '[+-]'                        # sign regex
S±  = 'keyword.operator.arithmetic' # scope for ↑

and then have shorter, yet more readable rules

integer{#[S_M_INT]decimal}  : '(#[±])?\b(#[INT])\b' #
 {1                         :  #[S±]                ,
  2                         : #[S_CONST]integer     };# Integer literal

Or you could use side-aware quotes to signal start/end (visibility of ↓ might depend on your font)

S“  = '#[SP]string.begin'   #
S”  = '#[SP]string.end'     #

Readme is updated, includes your recommendation against using it

SBNF syntax is also updated

Closes: https://github.com/BenjaminSchaaf/sbnf/issues/33

BenjaminSchaaf commented 1 year ago

I've enabled CI for pull requests. I think you need to rebase before it'll work.

eugenesvk commented 1 year ago

For what I would describe as very little value

Yeah, that was always going to be the blocker, can't explain the value of using symbols for easy differentiation if it's not apparent

there's already been a fairly hefty maintenance burden with this: finding 2 separate bugs with more likely hiding underneath.

sure, and if it's a bug like some Braille char, that could just be left in without burdening you

As I expected this is not a simple change

Interesting, I've found the opposite, didn't think I could just copy&paste a few symbols and a couple of functions and make it work, already switched a syntax to using symbols, and it's been great

At this point the only way forward for getting more unicode characters would involve a small change that's trivially correct

That would be a huge waste of time. It's much easier to exclude a char here and there that causes bugs in practice than to review every single char set against an unclear set of rules

Like that Braille empty char, the Unicode standard specifically states it's not a space

• while this character is imaged as a fixed-width blank in many fonts, it does not act as a space

(and it doesn't separate words in Sublime). Now, I don't know the difference and I'm surely not eager to find out for every char. There is simply no risk in leaving those in that would justify that kind of thorough review
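
The Braille claim above can be checked against the Unicode Character Database. This is a minimal sketch using Python's standard `unicodedata` module; the `describe` helper is a hypothetical name for illustration, and it says nothing about how SBNF or Sublime classify the character internally.

```python
import unicodedata

BRAILLE_BLANK = "\u2800"  # BRAILLE PATTERN BLANK, the "empty" Braille cell


def describe(ch: str) -> dict:
    """Report how the Unicode Character Database classifies one character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),  # two-letter general category
        "is_space": ch.isspace(),               # Python's own whitespace test
    }


info = describe(BRAILLE_BLANK)
print(info)
```

The character is classified as a symbol (`So`) rather than a space separator (`Zs`), so a filter that only rejects whitespace categories would let it through.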

FichteFoll commented 1 year ago

already switched a syntax to using symbols, and it's been great

At this point, I'm just wondering what such a syntax would look like tbh. Mind sharing it?

eugenesvk commented 1 year ago

It's not ready yet (and with a few bugs I've encountered may not get ready...), but here are a couple of examples:

Slightly more involved one with symbols indicating the role of rules and scopes, and also in the case of scopes significantly shorten them as they just break tabular layout otherwise

Slightly less involved with symbols indicating that the rules are identical, but only have an extra "comment" scope (and these are also the easiest to autocomplete since the starting symbols are not special)

and the follow-up rule with just an extra scope (part of string-c⁄-), which is immediately apparent as c⁄- (within a rule or within a scope) is the only thing that changes (though this is hampered a bit by the requirement to add those ugly #[] escapes)

There is no way the measly ASCII can approach anything close to that expressive power

mitranim commented 1 year ago

Looking at these examples, I feel compelled to point out that special symbols buy you little brevity (at non-zero costs). one¬two is not that much shorter than one-neg-two, but the latter is much easier for anyone to understand and type. Most identifiers in the provided examples wouldn't even double in length if converted to ASCII, using shortened words. So while it's true that special symbols can "compress" your text because there are more special symbols than letters, the degree of compression, while varying, would often be not worth it.

Alphabets have pretty good expressive power. As a strawman example, with 26 letters, just 3 letters give you 26 * 26 * 26 = 17576 possible combinations. But typing a few letters is much faster than copy-pasting a special symbol or picking from auto-complete if lucky enough to have that.

This depends on one's keyboard layout, typing habits, and so on. For different people, different approaches are more efficient or convenient. I've heard of developers who find it difficult to use a keyboard, prefer to mouse-click around, and dream of a world where some "AI" would auto-generate all the code for them. In such cases, special symbols aren't significantly different from ASCII. However, a significant number of developers (hopefully a majority?) are used to blind-typing, and when working on a syntax, would prefer to type (and read) something like -neg- over ¬.

In programming languages, many features exist to satisfy only a subset of the users. It's reasonable to say that if a significant portion of the users just really want to use special symbols, privately, then great, let them! However, in addition to internal complexity and maintenance cost, this creates the possibility that some will use this non-privately, in a collaborative environment, forcing others to deal with special symbols. And I'd rather not.

eugenesvk commented 1 year ago

Looking at these examples, I feel compelled to point out that special symbols buy you little brevity

That's mostly because you ignore the examples that buy a lot of brevity, e.g. the one I gave you in the other comment

r‹＃
raw-hashed-quote-left

(at non-zero costs).

with non-zero benefit

one¬two is not that much shorter than one-neg-two

These things compound. And don't cherry-pick short examples to address the brevity claim.

, but the latter is much easier for anyone to understand and type

except that the relevant group is not "anyone", but someone more specific, who could both understand and type the symbol more easily

Most identifiers in the provided examples wouldn't even double in length if converted to ASCII, using shortened words

Then there are those that would triple in length. And all of them would lose in legibility. Color your ASCII red with 🛑

degree of compression, while varying, would often be not worth it.

And then it would often be worth it

. As a strawman example, with 26 letters, just 3 letters give you 26 * 26 * 26 = 17576 possible combinations.

Good that you understand it's an irrelevant strawman (why are you using longer names if 3 is enough?)

But typing a few letters is much faster than

you forgot the most obvious other option - it's much slower than typing a symbol if you've set your keyboard apps right. For example it's much faster for me to type ≈ than approximately-equal

copy-pasting a special symbol

yes, that is slow, gladly that's not the only option

or picking from auto-complete if lucky enough to have that.

You're lucky enough to have Sublime in the context of developing a Sublime syntax

This depends on one's keyboard layout, typing habits, and so on. For different people, different approaches are more efficient or convenient.

Yet you keep arguing against the approach that is more efficient

However, a significant number of developers (hopefully a majority?) are used to blind-typing, and when working on a syntax, would prefer to type

and they can continue to do so just like before

However, in addition to internal complexity

what's the complexity of an extra match table?

this creates the possibility that some will use this non-privately, in a collaborative environment,

That's simply false, you ignore the simple fact I already mentioned to you that this is already possible. What say your majority of developers used to blind-typing ASCII to this valid SBNF syntax:

main : ( ~( oh-my-i-m-forcing-someone-to-use-chinese)  )* ;
世界你好='hello world'
oh-my-i-m-forcing-someone-to-use-chinese : 'foo'{世界你好} ;

So adding more symbols conceptually changes nothing

forcing others to deal with special symbols. And I'd rather not.

You'd rather yes - you want to force people to collaborate using only the approach you admit yourself is less efficient (for some). That's not a tenable general approach and it's also not the one that's currently implemented

BenjaminSchaaf commented 1 year ago

The SBNF syntax doesn't correctly recognize unicode characters:

äclause : a ;
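
One way a syntax definition can miss a name like this is an identifier pattern restricted to ASCII letters. The sketch below contrasts two hypothetical patterns in Python; neither is taken from the actual SBNF syntax definition, and the thread does not say which pattern is at fault.

```python
import re

# Hypothetical ASCII-only identifier pattern: letter first, then letters/digits/hyphens.
ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Hypothetical Unicode-aware pattern: any non-digit word character first,
# then word characters or hyphens. In Python 3, \w covers Unicode letters for str input.
UNICODE_NAME = re.compile(r"[^\W\d][\w-]*")

name = "\u00e4clause"  # "äclause" with a precomposed U+00E4

ascii_match = ASCII_NAME.fullmatch(name)      # fails on the first character
unicode_match = UNICODE_NAME.fullmatch(name)  # accepts the whole name

print(ascii_match, unicode_match)
```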

BenjaminSchaaf commented 1 year ago

This is also missing U+2028, U+2029 and U+061C, which are control characters.
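
For reference, Python's standard `unicodedata` module shows how the Unicode Character Database classifies these three code points. The `EXCLUDED_CATEGORIES` set and the `allowed_in_name` helper are hypothetical illustrations of a category-based filter, not the PR's actual logic.

```python
import unicodedata

# Hypothetical denylist of general categories that should never appear in a name:
# control, format, line separator, paragraph separator, space separator.
EXCLUDED_CATEGORIES = {"Cc", "Cf", "Zl", "Zp", "Zs"}


def allowed_in_name(ch: str) -> bool:
    """Return True if the character's general category is not on the denylist."""
    return unicodedata.category(ch) not in EXCLUDED_CATEGORIES


# The three code points mentioned above, keyed by their U+XXXX notation.
flagged = {f"U+{ord(c):04X}": unicodedata.category(c) for c in "\u2028\u2029\u061c"}
print(flagged)
```

None of the three falls in the `Cc` (control) category: U+2028 is a line separator (`Zl`), U+2029 a paragraph separator (`Zp`), and U+061C a format character (`Cf`). A filter that checks only `Cc` would therefore miss all of them, while symbols such as `±` stay allowed under this denylist.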

BenjaminSchaaf / sbnf

Add the full awesomeness of Unicode to variable/rule names #35