ScintillaOrg / lexilla

A library of language lexers for use with Scintilla
https://www.scintilla.org/Lexilla.html

Add Nix lexer #282

Open techee opened 1 month ago

techee commented 1 month ago

This PR adds a Nix lexer, which I based on the recently merged Dart lexer. Some of the language details can be found in

https://nix.dev/manual/nix/2.24/language/syntax

and related documents.

Some notes about the implementation:

  1. For key detection (key = value;) I used the same lookahead technique as the JSON lexer, which scans ahead for = after the identifier.
  2. I used something similar for path literals (https://nix.dev/manual/nix/2.24/language/syntax#path-literal), which are only vaguely defined; detection relies on there being a / somewhere inside the literal. This wasn't a problem for the sources I used for testing, but searching for / in every identifier might cause performance issues. If it is a problem, path literal highlighting could be performed only for <paths_in_braces>.
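To illustrate the two lookahead checks described above, here is a minimal sketch. It is not the code from the PR: the helper names IsFollowedByAssignment and LooksLikePath are hypothetical, and the path check deliberately ignores cases a real lexer would have to rule out, such as the / division and // update operators.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the PR): `pos` is the index just past an
// identifier. Skips spaces and tabs and reports whether a single '=' follows,
// as in `key = value;`. A following "==" is treated as a comparison instead.
inline bool IsFollowedByAssignment(std::string_view text, std::size_t pos) {
    assert(pos <= text.size());
    while (pos < text.size() && (text[pos] == ' ' || text[pos] == '\t')) {
        ++pos;
    }
    if (pos >= text.size() || text[pos] != '=') {
        return false;
    }
    // Reject the equality operator "==".
    return pos + 1 >= text.size() || text[pos + 1] != '=';
}

// Hypothetical helper (not from the PR): treats a token as a path candidate
// when it contains '/' anywhere. This is the per-token search whose cost is
// discussed above; it does not distinguish paths from operators.
inline bool LooksLikePath(std::string_view token) {
    return token.find('/') != std::string_view::npos;
}
```

With these definitions, IsFollowedByAssignment("key = value;", 3) is true while IsFollowedByAssignment("a == b", 1) is false, and LooksLikePath("./pkgs/default.nix") is true while LooksLikePath("stdenv") is false.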

There's one issue I'm aware of - folding of sources like

{
  # stuff
} ''
multi line string
''

where the end of the block } is on the same line as the start of the multiline string ''. This style is used e.g. in

https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/docker/default.nix

and I added an example of this at the end of the unit test. The fold level is unchanged on the } '' line because it is decreased and increased at the same time, so folding starts at the initial { and ends at the final '', which is strange. It would be OK if it were just a combination of blocks of the same kind, like

{
} {
}

but it's strange when it's block+string. Do any other lexers have to solve this issue and what is the recommended approach here? One way would be to completely disable folding of multiline strings or just accept this behavior if there's no better option.
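The effect can be reproduced with a simplified model of per-line fold levels. This is only a sketch under stated assumptions, not the lexer's code: the names LineFold, ComputeFolds and FoldEnd are hypothetical, comments and escapes are ignored, and a line's start level stands in for the level the editor folds on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct LineFold {
    int levelStart;  // level in effect at the start of the line
    int levelNext;   // level in effect for the following line
    bool header;     // true when the line opens a fold (levelNext > levelStart)
};

// Simplified model (assumption): '{' and an opening '' raise the level,
// '}' and a closing '' lower it. Braces inside a string are ignored.
inline std::vector<LineFold> ComputeFolds(const std::vector<std::string> &lines) {
    std::vector<LineFold> result;
    int level = 0;
    bool inString = false;
    for (const std::string &line : lines) {
        const int start = level;
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (line[i] == '\'' && i + 1 < line.size() && line[i + 1] == '\'') {
                level += inString ? -1 : 1;
                inString = !inString;
                ++i;  // consume the second quote character
            } else if (!inString && line[i] == '{') {
                ++level;
            } else if (!inString && line[i] == '}') {
                --level;
            }
        }
        result.push_back({start, level, level > start});
    }
    return result;
}

// Last line hidden when the fold at `headerLine` is collapsed: every
// following line whose start level is deeper than the header's start level.
inline std::size_t FoldEnd(const std::vector<LineFold> &folds, std::size_t headerLine) {
    assert(headerLine < folds.size() && folds[headerLine].header);
    std::size_t last = headerLine;
    while (last + 1 < folds.size() &&
           folds[last + 1].levelStart > folds[headerLine].levelStart) {
        ++last;
    }
    return last;
}
```

Fed the five lines of the first example, the } '' line gets the same start and next level, so it is not a fold header, and the fold opened by the initial { runs through to the final '' line.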

Fixes #116.

rdipardo commented 1 month ago

> Do any other lexers have to solve this issue and what is the recommended approach here?

The Lua lexer appears to manage it, probably because multi line strings preserve line state.

[Screenshot: scite-552-lua-folded]

techee commented 1 month ago

> The Lua lexer appears to manage it, probably because multi line strings preserve line state.

No, it suffers from exactly the same problem I was talking about: the fold point in your screenshot starts at { and ends at ]]. What I would expect is two fold points:

  1. One for { }
  2. The other for [[ ]]

and not one big fold point for { ]].

In any case, good to know others do it wrong too ;-)

techee commented 1 month ago

Oh and by the way, how did you get that line state information to the left margin?

nyamatongwe commented 1 month ago

> but it's strange when it's block+string. Do any other lexers have to solve this issue and what is the recommended approach here?

Lexers just accept this. It is similar to folding an if/else structure in 2 places, a common idiom that uses the fold.at.else property. This was initially implemented in the cpp lexer, so that is the one to examine.

if () {
    ;
} else { // Fold here?
    ;
}

This is why many lexers store 2 fold level numbers per line: a level to base folding on and the start level for the next line in high bits that are ignored for fold interaction.

int lev = levelUse | levelNext << 16;
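As a rough sketch of how the two numbers might be combined for one line, loosely following the pattern in the cpp lexer: the helper name PackFoldLevel and the k-prefixed constants are hypothetical, and the constant values are assumed to mirror Scintilla's SC_FOLDLEVELBASE and SC_FOLDLEVELHEADERFLAG.

```cpp
#include <cassert>

// Local constants, assumed to mirror Scintilla's SC_FOLDLEVELBASE and
// SC_FOLDLEVELHEADERFLAG.
constexpr int kFoldLevelBase = 0x400;
constexpr int kFoldLevelHeaderFlag = 0x2000;

// Hypothetical helper. levelCurrent is the level at the start of the line,
// levelMinCurrent the lowest level reached on the line before an opening
// brace, and levelNext the level for the following line.
inline int PackFoldLevel(int levelCurrent, int levelMinCurrent, int levelNext,
                         bool foldAtElse) {
    assert(levelMinCurrent <= levelCurrent);
    // With fold.at.else the line folds from the lowest level it reached,
    // so "} else {" becomes a fold header of its own.
    const int levelUse = foldAtElse ? levelMinCurrent : levelCurrent;
    // Low bits: level used for fold interaction. High bits: start level
    // for the next line, ignored for fold interaction.
    int lev = levelUse | (levelNext << 16);
    if (levelUse < levelNext) {
        lev |= kFoldLevelHeaderFlag;
    }
    return lev;
}
```

For the } else { line above, levelCurrent and levelNext are both one above the base and levelMinCurrent is the base itself. In this sketch the line is a fold header only when foldAtElse is true; otherwise the two levels are equal and no header flag is set.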

> Oh and by the way, how did you get that line state information to the left margin?

SCI_SETFOLDFLAGS(SC_FOLDFLAG_LEVELNUMBERS)
// or, in SciTE properties
fold.flags=64

techee commented 1 month ago

> Lexers just accept this.

OK, so if it's not a big problem, I'd just leave it as it is.