elixir-editors / elixir-sublime-syntax

The most powerful Elixir for the most Sublime experience.
MIT License
46 stars 6 forks source link

HTML (EEx/HEEx/Surface): Refactor syntaxes #33

Closed deathaxe closed 2 years ago

deathaxe commented 3 years ago

The recent syntax changes look very good and have a nice and clean structure.

Here are some ideas and proposals for HTML (EEx).

What needs consideration is embedding code in strings/attribute values needs slightly different handling than embedded code in other locations of HTML. The reason is the string scope which needs to be cleared so embedded code gets correctly highlighted and auto completion is enabled. On the other hand we don't want to clear any scopes if code is embedded outside of strings as this would very likely clear the main scope of the syntax. That's not desirable.

To achieve that two types of "embedded" contexts are required. There's no real guideline at the moment, but we may probably call normally embedded contexts "embedded" and those in strings "interpolation". We might also use interpolation vs. string-interpolation or something along those lines.

I haven't had a look at the other syntaxes of this packages, but I personally find it a bit strange to call <% a tag. With regards to ASP/JSP/PHP I'd suggest to scope all of <% and %> punctuation.section.embedded.

Some more comments about this can be found in commit messages.

Disclaimer: You asked for comments ;-)

princemaple commented 3 years ago

Your knowledge and help is greatly appreciated.

azizk commented 3 years ago

Thanks a lot for your time and putting everything in a PR! That's awesome!

princemaple commented 3 years ago

Are we happy to merge this one?

azizk commented 3 years ago

Certainly, but let me check a bit nevertheless. :) I saw that the vertical order of the test lines is not consistent with the way I did it in every other test. We should change that to be consistent I think.

princemaple commented 3 years ago

Yup. Was both getting your confirmation and making sure @deathaxe has not more to add at this stage. 😄

deathaxe commented 3 years ago

We should change that to be consistent I think.

I can change that back, if that's your style. Wasn't aware of it, tbh.

Would like to include some changes to Surface as well as

  1. it can re-use some existing contexts from EEx/HEEx
  2. found some scope inconsistencies with regards to embedded Elixir source code, compared to EEx/HEEx

I am currently a bit lost at the question of how to correctly name those embedded stuff in various situations.

Main question is meta.embedded vs. meta.interpolation. This question is not limited to this PR but more general.

ASP/PHP use meta.embedded source.xyz.embedded.html to express ASP/PHP sources within <?...?> or <%...%> sections.

During recent ST4 dev cycle we added several string interpolation features to improve highlighting and auto-completions behavior (ST disables auto-completion and various other stuff in strings). So we end up in something like meta.string meta.interpolation for source code interpolated into strings. I am uncertain whether that's appropriate for those <%...%> sections. Especially as it causes a mix of punctuation scopes.

Another question arises with a look at HEEx and Surface, which support those {...} interpolations. Do they have the same meaning as <%...%> sections but with different syntax, or is distinction needed scope-wise?

Finding a consistent solution for all syntaxes might take another roundtrip, IMHO.

deathaxe commented 3 years ago

Is something like <ta{@x}g /> valid in HEEx / Surface? I saw, <{@x}> not to be in Suface, so I guess such interpolations must not appear in tag names at all?

deathaxe commented 3 years ago

Should be done, except noted aspects, I'd like to read some input from elixir specialists.

  1. If html tags can't contain elixir interpolation those tag-other-name contexts can be removed from HEEx/Surface
  2. meta.embedded vs. meta.interpolation might be worth some thoughts. Anyway, punctuation should be consistent with choosen meta scope. So punctuation.section.embedded vs. punctuation.section.interpolation.
azizk commented 3 years ago

Is something like <ta{@x}g /> valid in HEEx / Surface? I saw, <{@x}> not to be in Suface, so I guess such interpolations must not appear in tag names at all?

I just checked with Surface. It considers all those characters to be part of the tag. So no interpolation. My guess is HEEx does the same. Btw, I tried something like ~F"<a {=@x}g />". It parses to ["<a", " g></a>"]. The trailing g becomes an independent attribute.

azizk commented 3 years ago

Cool! Will review everything. Thanks for making it clear with great commit messages and a clean history! There's also definitely some stuff to think about.

I didn't know that auto-completion doesn't work when it's embedded inside a string unless the scope is cleared. I never noticed anyway, because I configured ST to always show me suggestions in any scope...

deathaxe commented 3 years ago

Sure. I've mainly gone through formal stuff without deep knowedge about the syntax itself. It may still contain some bugs or I may have broken something.

Another major aspect with clearing string scope is correct highlighting of otherwise unmatched tokens in embedded source. They would be highlighted as string without clearing.


I just saw some tests like <{}}>{@x}</{}}>. Are they really accepted as tags? According to whatwg a tag name must always start with an ascii letter, so if no interpolation is supported, they would be invalid, IMHO.

azizk commented 3 years ago

I just saw some tests like <{}}>{@x}</{}}>. Are they really accepted as tags? According to whatwg a tag name must always start with an ascii letter, so if no interpolation is supported, they would be invalid, IMHO.

Yep, they're "accepted". I think the parser just checks if everything is syntactically correct, not so much if it makes semantic sense.

azizk commented 3 years ago

Thanks for sorting the tests and removing the tag interpolations! Question regarding atomic groups: should we really remove them now because I think they're pretty useful and maybe ST regexps will support them in the future? I haven't written a regexp engine yet but atomic groups seems like a trivial feature to me. Besides, all the other syntax files use them too and I remember in some cases they're necessary...

deathaxe commented 3 years ago

Appears I missed to tweak some tests, again.

It's very likely ST's sregex engine won't support atomic groups anytime soon. A modern syntax should avoid anything which causes Syntax Tests - Regex Compatibility command to generate any warnings to make sure to avoid Oniguruma to be triggered.

Those advanced features may have been needed in tmLanguage syntaxes and therefore still exist after those have been converted to sublime-syntax format, but with all the context magic at hand in sublime-syntax, patterns are often simple enough to not need all those sophisticated regex features. One reason to rewrite those converted syntaxes from scratch.

deathaxe commented 3 years ago

The recent commit adds some changes to illustrate atomic groups can easily be replaced by simpler patterns in various situations.

azizk commented 3 years ago

Hey, I feel a bit bad about it but I squashed most commits and re-based onto master. I consolidated (almost) all the commit messages. Hope that's alright!

I made only superficial content changes:

  1. Added you as an author.
  2. Added the "-pop" suffix to some contexts. I prefer to name them like that to signal that they change the stack.
  3. I changed some plural names to singular where it made more sense to me.

I pushed to deathaxe/proposals. You can push force if you think it's good. :+1:

deathaxe commented 3 years ago

You don't need to feel bad at all. It's your project. I am here to offer some proposals, only.

The other way round I have mixed feelings as I introduced so many changes to the already very good syntax definitions. 😄

azizk commented 3 years ago

There are lots of great changes and we appreciate it! You managed to delete a lot of unneeded lines as well. I always love that about refactoring. :smile:

I saw there's a possible optimization in surface-tag-name-common-pop and heex-tag-name-common-pop:

- match: '[[:lower:]_]\w*[?!]?|(?!\.)'
  scope: variable.function.surface
  pop: 1
- match: \.
  scope: punctuation.accessor.dot.surface

What do you think? Is it better to leave it be?

deathaxe commented 3 years ago

In which way is it an optimization? ST compiles those patterns into one optimized regexp anyway and immediately pops if none of the consuming patterns match. Can you prove this negative lookahead has any positive effect on performance or something like that? I always found negative lookaheads to slow things down, so try to avoid them. That's also what a benchmark against 5k lines of the following snippet reveals.

<MyApp.User.tag name={@name}>

Your solution is a bit slower.

The current solution requires 47ms on my box, while yours takes 50ms.

azizk commented 3 years ago

Cool, thanks for going to the trouble of benchmarking it! It was an assumption I made that we'd save some CPU cycles if we removed the immediate-pop rule. I thought ST would have one fewer regexp/rule to check. I stand corrected. :+1:

azizk commented 3 years ago

Found something: we should probably change punctuation.definition.tag.begin.surface to punctuation.section.embedded.begin.surface (and end respectively). Would be consistent with (H)EEx. Right?

azizk commented 3 years ago

I squashed your recent changes into the corresponding commits. I found something else and added two commits on top of our branch. I think we should make the Elixir code sections in Surface, e.g. {elixir_code}, "embedded" instead of "interpolated". Seems consistent with the way it's in HEEx. WDYT?

deathaxe commented 3 years ago

Maybe those meta.embedded vs. meta.interpolation should be consistent between syntaxes. So changing it in Surface would result in HEEx to be updated as well? Maybe all such elixir code sections should be treated as embedded while leaving interpolation to normal string interpolation, if supported by the language.

deathaxe commented 3 years ago

https://github.com/princemaple/elixir-sublime-syntax/blob/140892821b816c7eede51b7d65df873382b077d1/syntaxes/HTML%20(Surface).sublime-syntax#L232

The set statement should be a push as otherwise the closing } would try to pop the main context off stack.

deathaxe commented 3 years ago

Maybe should think about the final scopes as well. Not sure if something like punctuation.section.embedded.heex is desirable in a surface syntax.

azizk commented 3 years ago

I consider Surface to be sufficiently different from HEEx, so I don't think we should have heex or eex scopes in Surface. On the other hand, HEEx and EEx are very similar and it's good if they share as much as possible. Afaics that's the case now.

Maybe all such elixir code sections should be treated as embedded while leaving interpolation to normal string interpolation, if supported by the language.

Do you mean <tag attr={...}> vs. <tag attr="xxx {...} xxx">? Maybe the former should be an embedded section instead, right? I'm not sure. It seems fine to leave it be an interpolation.

Question regarding HTML comments in Surface, e.g.: <!-- This {comment} will be sent to the browser --> Shouldn't the comment scope be cleared in the interpolation so that auto-completion works?

deathaxe commented 3 years ago

I don't think we should have heex or eex scopes in Surface

Agree, this also makes naming meta.embedded.<language> easier as it looks odd to have heex in a surface file.

Do you mean ...

Yes. The main question is, what's the difference between embedded and interpolation in general. Basically a template file such as php/jsp/asp/ex/heex/surface/.... is just a big string, which is processed by an interpreter. The embedded code just expands the template string to an output format. That's exactly the definition of string interpolation as well. The point is from the view of the template processor there's no difference between the embedded code appearing within a quoted string (attribute value) or somewhere else. So the question is why scope it different?

That's a quite general question, which may effect several syntaxes.

I've tried to summerize the issue in an RFC at https://github.com/sublimehq/Packages/issues/3051

Question regarding HTML comments in Surface

You are right, in general it would make sense to clear the comment scope. Unfortunatelly there's no meta.comment at the moment (the counter part of meta.string). On the other hand some syntaxes (e.g.: Brackets) use the comment scope to dimm highlighting - not consistent with string scopes, I know.

That's just a part which hasn't been discussed so far.

I've introduced meta.comment in Perl to be able to clear comment scope in embedded code blocks, but that's the only syntax which makes use of it so far. Maybe worth another RFC?

princemaple commented 3 years ago

Is this ready to go in? @deathaxe @azizk

azizk commented 3 years ago

@princemaple Not yet but almost! Was thinking about the meta-scope stuff. Will write again in a bit...

princemaple commented 3 years ago

Hi @deathaxe and @azizk ,

I moved this repo into an org and sent you two invitations.

I enjoy coding in Elixir and don't see my self leaving Elixir anytime soon. However, you two obviously have more knowledge on Sublime syntax than I do, and use more Elixir features than I do. (I'm not currently onto Heex or Surface yet) It only makes sense to give you two more access to the repo.

That said, feel free to reject if you don't need another org/repo to worry about. 😄

Thanks for the great works.

Po

azizk commented 3 years ago

Thanks @princemaple! I accept, of course. In the near future I might suggest to José to take it under his elixir-lang umbrella. What do you think?

princemaple commented 3 years ago

In the near future I might suggest to José to take it under his elixir-lang umbrella

I mean, no objections from me, but it's probably not going to happen TBH. The language doesn't have to care about specific editors.

azizk commented 3 years ago

Yeah, let's see. It's worth a try. :)

So one last off-topic question: Will you continue handling the package.io releases, at least for some time?

princemaple commented 3 years ago

https://github.com/elixir-editors this is where it will more likely end up at.

I'm happy to indefinitely maintain the project. Releasing a new version is as simple as tagging a commit with correct format. The point is, you (and deathaxe if accepted) will be able to merge PR at your own pace. And make release when you see fit. I'm not sure if collaborators on personal repos have these access.

azizk commented 3 years ago

Oh, yes, I actually meant elixir-editors instead of elixir-lang. Guess I was confused by tree-sitter-elixir which was newly added to elixir-lang.

Ok, thanks a lot @princemaple! I accepted the invitation. It's cool if you accept as well @deathaxe, but you don't have to be active really. Maybe you can function as a backup maintainer if you like. Of course, no problem if you decline. :+1:

deathaxe commented 3 years ago

I probably won't have much to add. I am busy enough with core syntaxes, MarkdownEditing and some other repos. You are welcome to ask questions and I can add some PRs, but I don't see me maintaining this repo anytime soon. I don't use or know Elixir at all. My motivation for this PR is to help with applicating core HTML syntax using inheritance.

princemaple commented 3 years ago

Huh, that's very impressive. I thought you'd be using it somehow if you had been thinking about its syntax.

You are welcome to ask questions and I can add some PRs, but I don't see me maintaining this repo anytime soon.

Noted and thanks! Appreciate your work on core.

azizk commented 3 years ago

Totally understand, @deathaxe!

Alright, back to the PR: I'm not sure about the distinctions between interpolation and embedded. I read your RFC ticket #3051. It's a great summary of the issue. It seems to be a little dilemma without an obvious solution. Maybe it's best to keep it simple and make everything embedded when inside a template file and use interpolation strictly for Python f-strings or Elixir string interpolations and such.

Regarding the meta.embedded.* scopes: it's good to have different ones for (H)EEx and Surface, just the way you did it a few days ago. I would suggest not to complicate things and use source.elixir.embedded.html in all template syntax files. It makes styling those sections easier and it reads better because you can say it's elixir inside html. I don't think we need to be that specific and say it's elixir inside eex or surface.

Regarding meta.comment: I will add it to the syntax file. I think it makes a lot of sense.

I will make some changes on my branch so we can compare and then finalize the PR. :+1:

deathaxe commented 3 years ago

... use source.elixir.embedded.html in all template syntax files ...

I can follow your arguments and would be fine to use that general scope.

The main reason for finalizing the scope with eex or surface is the main scope being html.[eex|surface]. So basically eex and surface are specialized forms of a html template. As those embedded code blocks are introduced by this special variant those might deserve its final scope.

Another aspect not yet discussed is supporting embedded elixir code blocks in js/css. The current solution would allow to just include those contexts in an extended CSS/JS syntax. You can have a look at Java Server Pages in https://github.com/sublimehq/Packages/pull/2654 to learn how it would work.

Finalizing the embedded code's scope with HTML would mean to do so with css/js in those syntaxes for consistency reasons, which ends up in duplicating all those contexts. That's the approach which is taken at https://github.com/sublimehq/Packages/pull/2789.

TBH, I am uncertain about which one is the better solution. Both are correct.

If the source.elixir.embedded.html scope is used, the related meta.embedded and punctuation scopes should probably be finalized with html as well.

azizk commented 3 years ago

Oh, I see! There's a bit more to consider then. Let me think about it. :slightly_smiling_face:

azizk commented 3 years ago

Finally got around to working on this again and made good progress. I noticed that we made some fundamental syntax mistakes. Embedding code isn't supported for attribute names and inside string values like this:

<div {@some_attr}="value" style="{@value}" />

Besides that, I decided to simplify the scope naming issue and to just use source.embedded.elixir.html in all related syntax files.

Q: In the Surface file in the tag-generic-attribute-value context I couldn't figure out why it needs the else-pop even though meta_prepend is true and in the HEEx file it works without it.

I think we're almost done. Maybe a final review round? :)

deathaxe commented 3 years ago

I couldn't figure out why it needs the else-pop

My most recent local copy of what I've pushed upstream doesn't contain tag-generic-attribute-value nor does it contain else-pop in it.

It's something you must have added. Two mistakes:

  1. else-pop needs to be removed.
  2. tag-generic-attribute-value must be replaced by tag-generic-attribute-value-content

If embedding elixir code in attributes is not supported the content is probably not needed to be extended at all?

deathaxe commented 3 years ago

Embedding code isn't supported for attribute names and inside string values

Does it apply to both HEEx and Surface?

azizk commented 3 years ago

My most recent local copy of what I've pushed upstream doesn't contain tag-generic-attribute-value nor does it contain else-pop in it.

Ah, sorry, I didn't mention it. I force pushed to our deathaxe/proposals branch.

Does it apply to both HEEx and Surface?

Yes. So tag-generic-attribute-value-content isn't needed in either syntax file. I checked the source code of Surface and tested HEEx in the console to make sure. I wonder why I thought this was supported. Maybe earlier versions had a bug?

I managed to fix the else-pop issue. Just needed an elixir-embedded-pop context.

Another issue I encountered is that attributes like id, class etc. are handled specially. I just made a simple rule to check whether an attribute name is followed by a ={ to know whether it's an embed value or not. The disadvantage is that the attributes don't get the special scope names.

princemaple commented 3 years ago

FYI it's best not to force push a branch that multiple people are working on. Also prefer --force-with-lease so you avoid overwriting someone else's work.

If the purpose was to organize the commits, it's best done at the very end when you want to merge it in. And it's safer to create a new branch on top of it before you squash.

azizk commented 3 years ago

I've always been using --force-with-lease. :+1:

Yeah, the way I handled this here wasn't optimal. I should have pushed onto deathaxe's branch to keep the changes visible for some time.

deathaxe commented 3 years ago

But you make it impossible to track changes by this kind of abusing git. That's not the purpose on version control!

Please don't force push to my branch making it look broken by me!

deathaxe commented 3 years ago

I just made a simple rule to check whether an attribute name is followed by a ={ to know whether it's an embed value or not.

Why bothering if embedding is not supported in attribute values?

deathaxe commented 3 years ago

I checked the source code of Surface and tested HEEx in the console to make sure. I wonder why I thought this was supported.

To be sure to understand correctly I just checked https://surface-ui.org/template_syntax.

The very first example shows embedded elixir code in html tag attributes. So why do you think it is not supported?

grafik

The following is status quo of this branch:

Surface:

grafik

grafik

HEEx:

grafik

grafik

EEx:

grafik

azizk commented 3 years ago

But you make it impossible to track changes by this kind of abusing git. That's not the purpose on version control!

I know, I know. It wasn't the best thing to do, but I don't think I abused it. I only ever worked on my branch. You can still compare changes by diffing your local branch with my remote branch, or by selecting the top commits in Sublime Merge.

Please don't force push to my branch making it look broken by me!

Never pushed let alone force pushed anything to your branch. I only did it on my branch because I thought we'd be finished soon and only a ff-merge would remain to be done. Next time I'll be more careful, create normal commits and discuss squashing beforehand when necessary. :)

Why bothering if embedding is not supported in attribute values?

Embedding is supported as values but not inside the strings. In my example above, the embedding is inside the string. That wasn't supported or isn't anymore.

<!-- Invalid: -->
<div {@some_attr}="value" style="{@value}" />
<!-- Valid: -->
<div {[class: "x"]} style={@style} />