atom / language-yaml

YAML package for Atom
Other
24 stars 39 forks source link

Hash sign breaks highlighting #114

Open Nixinova opened 3 years ago

Nixinova commented 3 years ago

Removing most of the checklist as I'm not using atom, I'm using the grammar file directly.

Description

The syntax grammar does not correctly tokenise properties with hashes inside them:

---
C#:
  programming: true
Language:
  name: C#

This syntax is valid and should be shown as such.

Nixinova commented 3 years ago

The issue is the [^...#...] and [^#]*? bits here (and in similar constructs):

https://github.com/atom/language-yaml/blob/821c7f157738ee74954c1c48b17acd492eb963d5/grammars/yaml.cson#L132

50Wliu commented 3 years ago

It's been quite a while since I touched these regexes but looking at the line you linked, we're not being strict enough about hitting a comment. We break as soon as a # is found, even though the spec says "Comments must be separated from other tokens by white space characters". Maybe something like this would work: ([^!{@#%&*>,\'"](?:[^#]|(?<!\\s#)*?) Maybe. I forget if the lookbehind would behave as expected there.

Nixinova commented 3 years ago

I've messed around with a regex for this and like the many other times I've tried to figure out how to negate multiple characters, but can't figure out how to do it.

It would probably be more easier and accurate to have a comment token overwrite all the other tokens.

50Wliu commented 3 years ago

So I realized that negating multiple characters is very hard, but explicitly capturing the "space with everything except a # after it" is easier. ([^!{@#%&*>,\'"](?>\s[^#]|\S)*\s*) Is that better? If not then yeah we might have to settle for what you proposed.

Nixinova commented 3 years ago

That seems like it could work

50Wliu commented 3 years ago

@Nixinova would you mind creating a PR? If not I can go and setup my environment again and do it :).