eteran / nedit-ng

a Qt5 port of the NEdit using modern C++14
GNU General Public License v2.0

[enhancement] update list of default language modes #190

Open tksoh opened 3 years ago

tksoh commented 3 years ago

The list of language modes supported by nedit is very dated. We should update the list to include new and popular programming languages and utilities/scripts.

Also, perhaps a new mechanism is needed to facilitate user contributed lang modes, instead of having them coded into the core, in order to speed up adoption of new modes.

Rimpire commented 3 years ago

You can define custom languages and syntax highlighting. I created very complex language definitions with nested syntax highlighting in the original nedit, and was able to import them successfully into nedit-ng.

tksoh commented 3 years ago

The import support certainly makes things easier, but what is lacking is some kind of ecosystem. The old nedit imports the new lang mode and patterns, and just about everything else, into .nedit/nedit.rc, which makes it tricky to share with others. I saw nedit-ng already breaks up the rc resources into different files, in this case 'languages.yaml' and 'patterns.yaml' for the language support, so I think we are starting out on the right foot.

Offhand, I feel importing via CLI is clumsy at best. Some kind of simple GUI to take in the new languages and patterns will make things a lot more straightforward. Or perhaps the language mode files can be housed in a 'languages' subdir in the config directory, then we only need to drop all user-created new lang files into that subdirectory for nedit-ng to pick them up automatically. And leave the bundled yaml files for the built-in stuff.

Same applies to the user-defined macros.

@eteran Any thoughts?

tksoh commented 3 years ago

BTW, the subdirectory approach also means that user-created stuff will need to be saved into the subdirectory too, naturally.

eteran commented 3 years ago

Interesting idea. We could for sure do the very unixy thing and have something like:

"if there is a languages directory, read all YAML files in it, otherwise, read the languages.yaml". I like the idea enough that I'm kinda disappointed that I didn't think of it first :-P.

This would make sharing things SO much easier and more intuitive, and frankly would almost not require a UI at all. Just plop files in there and restart the app.

anjohnson commented 3 years ago

"if there is a languages directory, read all YAML files in it, otherwise, read the languages.yaml".

Can I suggest "First read the languages.yaml file, then read all the .yaml files in the languages directory" instead, which allows for backwards compatibility and for new files to be added to that directory without disabling any that the user had already defined in the single file. Redefinitions should replace the older version completely though, don't try to merge them.
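A minimal sketch of that rule, assuming each file has already been parsed into a name-to-definition mapping. The function name and the definition fields (`extensions`, `tab_width`) are made up for illustration and are not nedit-ng's actual schema:

```python
def merge_language_sets(base, overrides):
    """Return the effective name -> definition mapping.

    base      -- definitions read from languages.yaml
    overrides -- one mapping per languages/*.yaml file, in load order
    """
    effective = dict(base)
    for file_defs in overrides:
        for name, definition in file_defs.items():
            # Replace wholesale: no field-by-field merge with the older version.
            effective[name] = definition
    return effective


# Hypothetical data standing in for parsed YAML.
base = {
    "C": {"extensions": [".c", ".h"], "tab_width": 8},
    "Python": {"extensions": [".py"]},
}
dropped_in = [{"C": {"extensions": [".c"]}}, {"Roku": {"extensions": [".brs"]}}]

effective = merge_language_sets(base, dropped_in)
print(effective["C"])  # the redefinition only; tab_width from the old C is gone
```

Because the redefinition replaces the whole entry, nothing from the older C definition leaks into the new one, while languages that are not redefined (Python here) stay available.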

There is a question about where to save edits to the language definitions; if nedit-ng automatically moves all language definitions into individual language/ files when the user does a Preferences → Save Defaults that would cause problems if someone shares their configuration files between different machines that have different versions installed. Maybe the Language Modes dialog could have a filename text-box added, and if that's empty the definitions would be saved in languages.yaml.

eteran commented 3 years ago

Yea, backward compatibility is where things get tricky. We'll have to think carefully about how it all should work.

eteran commented 3 years ago

@anjohnson @tksoh

Would it make sense for it to read the languages.yaml file first, and then, as it is reading the languages in languages/*.yaml, if a language there has the same name as one already loaded, then it would simply be used instead?

There's ... quirkiness that comes into play though.

Like if we do that, and then the user clicks save preferences, where does a given language get saved? Do we now need to track where every language came from? Do we just save everything into the original languages.yaml? If we do that, what about the languages/*.yaml, we can't just go around deleting user-created files!

It can get a bit messy quickly.

Loading is actually pretty easy to deal with; it's the saving via the UI that makes me unsure how best to handle this.

I think that's what you were trying to get at @anjohnson .

eteran commented 3 years ago

One thought I just had, and I have very mixed feelings on...

A lot of the more "advanced editors" like atom, vscode, sublime, etc. just basically punt on this problem. Most of these editors, when you want to edit config that is stored in a file, just have you use the editor to edit the file, and it's usually just some JSON that the user is expected to understand. They often don't even offer a UI for it.

It's an ugly solution because it's very "expert friendly", but also completely solves the "where to save preferences" issue because the preferences are just the file that you're editing, when you hit save, that's where it goes.

tksoh commented 3 years ago

My first reaction to the so-called "expert friendly" solution is that nedit already has one. So, that's not where we want to go, since it already does not meet what we want to accomplish, namely a no-brainer "drop-in" integration.

I am for "First read the languages.yaml file, then read all the .yaml files in the languages directory", in which languages/*.yaml override the versions in languages.yaml.

I don't think we need any UI on this. We just save each user created/customized language into an individual yaml file inside the languages/ directory like Andrew suggested. Effectively, it's just as if they are being dropped in by the users when importing new/upgraded language definition.

Technically, we don't need a languages.yaml, since the default set of languages is already stored inside nedit-ng. So if a copy is found, it should be maintained as is (i.e. nedit-ng will not add new changes to it), and its content merged in memory by nedit-ng. If users then further modify their changes, those will have to be saved into the languages subdir.

I however don't quite understand what Andrew means by "new files to be added to that directory without disabling any that the user had already defined in the single file." I mean, how else are we going to update a language already defined in languages.yaml, be it the default version or the user-edited version? Recreating a language can be a potentially daunting task, especially for languages that are derived from the commonly available ones. Perhaps Andrew can elaborate further.

Backward compatibility is messy business ;-)

eteran commented 3 years ago

There is another "backward compatible" option, and it's a weird one :-P.

So basically nedit-ng is ALREADY backward compatible with classic nedit's language mode specification. It won't generate it on its own, but if you use nedit-import on a classic nedit configuration, it will stuff the entire language mode specification in the .ini file just like classic nedit did with its config files...

nedit-ng will load that just fine, but if you ask nedit-ng to save preferences, it actually just saves: nedit.languageModes=* where the * is a "magic value" meaning "look in the languages.yaml file instead", and then writes out the newer languages.yaml file.

So, if I explained that clearly enough to follow, we could do something similar. We could have nedit-ng be able to load from any of 3 places depending on the value of nedit.languageModes in the config file. And it could "auto-upgrade" (like it does with classic config) to a broken out format going forward. Something as simple as nedit.languageModes=+ could mean "look in languages/*.yaml"

It's a bit ugly, but it's an option, and is similar to a technique that has worked well enough that nobody noticed I did it once already ;-)
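A rough sketch of that three-way dispatch. Only the `*` value exists today according to the description above; `+` is the proposal, and the function name and return tuples are purely illustrative:

```python
def language_source(value):
    """Decide where language modes live, given nedit.languageModes."""
    if value == "*":
        # Existing magic value: look in the languages.yaml file instead.
        return ("file", "languages.yaml")
    if value == "+":
        # Proposed magic value: look in the languages directory.
        return ("directory", "languages/*.yaml")
    # Anything else is treated as a classic inline specification.
    return ("inline", value)


for value in ("*", "+", "<classic spec text>"):
    print(repr(value), "->", language_source(value))
```

The auto-upgrade would then amount to loading from whichever source the value names and writing the newest magic value back on save.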

eteran commented 3 years ago

For the curious, if you look in the $HOME/.config/nedit-ng/config.ini you can see that I used the nedit.XXX=* technique to silently upgrade basically every multiline field to its own YAML file.

anjohnson commented 3 years ago

@eteran asked:

Would it make sense for it to read the languages.yaml file first, and then, as it is reading the languages in languages/*.yaml, if a language there has the same name as one already loaded, then it would simply be used instead?

My original suggestion was

"First read the languages.yaml file, then read all the .yaml files in the languages directory", which allows for backwards compatibility and for new files to be added to that directory without disabling any that the user had already defined in the single file.

which I think is exactly what you asked. The "new files" clause which @tksoh asked about was to say that users can copy any downloaded roku.yaml or whatever language file straight into that directory and it will be loaded automatically (will they need to restart Nedit-ng for that to happen?). Loading just a single languages.yaml file makes it harder to add new languages since the user has to know how to merge language rules into that file, and a naive user could lose existing rules if they make a mistake.

This sentence:

Redefinitions should replace the older version completely though, don't try to merge them.

was to stress that any language that is successfully read from a file in the languages directory and has the same name as one which was already loaded from the languages.yaml file should completely replace all aspects of the first definition. You do have to decide how to handle the case where you've already loaded the languages.yaml file successfully and there's a parsing error in a file that is redefining an already-defined language; hopefully in that case you can keep the original language definition and post a pop-up about the syntax error. Oh, and what order will you be reading the files in? That matters because the same language could appear in more than one file...

I had a suggestion for your second question about saving languages: Add a filename box to the "Language Modes" dialog box. If that's empty the language belongs in the languages.yaml file. This gives users control over when their personal language definitions get moved into the separate files, they just have to add a filename there, but it won't auto-convert en masse (which the user might not want to happen anyway).

How you handle the settings in the .ini file is internal detail IMHO.

BTW does anyone have a set of language rules for .yaml and/or .json files? I suspect a generalized version of that question should become a Github Discussions topic, and the easier it is to share rules the better.

eteran commented 3 years ago

Oh, and what order will you be reading the files in? That matters because the same language could appear in more than one file...

A typical UNIX convention is to load files in lexicographical order. So if you really have a need to enforce ordering you can do something like: 01-c++.yaml and 02-c.yaml.
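As a small illustration of that convention, assuming a plain code-point sort of the file names (the helper name is hypothetical):

```python
import tempfile
from pathlib import Path


def yaml_files_in_load_order(directory):
    """Names of the *.yaml files in `directory`, sorted as plain strings."""
    return sorted(p.name for p in Path(directory).glob("*.yaml"))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files out of order; notes.txt should be ignored.
    for name in ("02-c.yaml", "01-c++.yaml", "notes.txt"):
        (Path(tmp) / name).write_text("")
    order = yaml_files_in_load_order(tmp)

print(order)
```

One thing to keep in mind with a plain string sort: "10-x.yaml" sorts before "2-x.yaml", so numeric prefixes only behave as expected when they are zero-padded.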

anjohnson commented 3 years ago

Just asking to make sure you remember to sort the list before loading it! Of course if you're also going to monitor the directory for new or updated files and then (re)load only those which are new or changed the winning set of rules for a specific language could be different than when reading them all in from cold. That suggests you might want to generate an annoying pop-up if you find the same language defined in more than one languages/*.yaml file.
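Finding such duplicates is cheap once the names in each file are known. A sketch, with a hypothetical helper that takes a mapping from file name to the language names defined in it:

```python
from collections import defaultdict


def find_duplicates(files):
    """Return {language: [files...]} for languages defined in 2+ files."""
    sources = defaultdict(list)
    for filename in sorted(files):  # same order the loader would use
        for name in files[filename]:
            sources[name].append(filename)
    return {name: fs for name, fs in sources.items() if len(fs) > 1}


dups = find_duplicates({"02-c.yaml": ["C"], "01-c++.yaml": ["C++", "C"]})
print(dups)
```

The last file in each list is the one that would win under a "later file replaces earlier" rule, which is the detail a warning would want to mention.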

Is this getting complicated enough yet? 😁

eteran commented 3 years ago

Fortunately, I don't plan to do any "monitoring" of the files. They will be read on load, and the in-memory representation can be updated (and saved back to the files) through the UI.

If the user happens to edit one of these files manually, it will either:

  1. have no effect until they restart the application
  2. possibly get overridden if they later use the UI to edit the settings before restarting nedit.

tksoh commented 3 years ago

I too thought about adding the filename suffix to the language mode to id the source yaml, and let users decide how to manage them -- we have to trust them to be intelligent enough to clean up their own house ;-) I think this might actually free nedit from the burden of considering all the potential corner cases on how these files or languages might get mingled beyond human comprehension, and we don't need to be concerned about what order they are read in.

On the sorting of lang modes, we should maintain the lexi order like Evan suggested. With the list of languages out there these days, we might need to group them by the first character of the language names to shorten the menu list.

tksoh commented 3 years ago

BTW, the "parsing error in a file that is redefining an already-defined language" is quite unlikely in practice, unless the files are manually edited somehow (why?). Even so, the broken files should just be ignored as a whole, and users should take responsibility for cleaning them up before loading again.

eteran commented 3 years ago

@tksoh I think that's a pretty sensible way to handle it. We can do something as subtle as just printing a warning to the console that a malformed file is being ignored.
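A sketch of "ignore the whole file and warn", assuming a pluggable parser. Python's standard library has no YAML parser, so `json.loads` stands in here; the function name and logger name are made up:

```python
import json
import logging

log = logging.getLogger("langloader")


def load_all(texts, parse=json.loads):
    """texts: filename -> raw contents. Returns (definitions, skipped files)."""
    loaded, skipped = {}, []
    for filename in sorted(texts):
        try:
            defs = parse(texts[filename])
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            log.warning("ignoring malformed file %s: %s", filename, exc)
            skipped.append(filename)
            continue
        # Only a file that parsed completely contributes definitions.
        loaded.update(defs)
    return loaded, skipped


loaded, skipped = load_all({"good.yaml": '{"Roku": {}}', "bad.yaml": '{"C": '})
print(loaded, skipped)
```

Since a broken file never reaches the `update` call, any definition loaded earlier under the same name is left untouched.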

tksoh commented 3 years ago

@eteran users would certainly appreciate some warning.

On a related topic, Windows users are likely not using any console (the drag-and-drop feature probably makes it even less likely). Since nedit-ng now works natively on Windows, this issue probably needs to be addressed sooner rather than later. I actually have an old patch on SF (again ;-) that directs t_print() to a message window. Perhaps the same concept can be ported into nedit-ng.

eteran commented 3 years ago

Yup, I was also thinking that NG may need a "console" for any output that typically ends up in the terminal.

It wouldn't be hard to implement at all, since it's all already done through Qt's logging API, meaning that I have a nice centralized place to capture it.

anjohnson commented 3 years ago

I too thought about adding the filename suffix to the language mode to id the source yaml

I'm not sure how well that would work, I have some prefix filenames (which Nedit can't currently identify automatically, I'd need a regex or a glob pattern match for that) – they are actually Makefiles, but don't have any kind of a suffix.

The parsing error while loading might not happen often, but when it does a user must have hand-edited the file and would probably really appreciate some detailed error information about exactly where the error is detected and what's wrong. This is something I would probably do when creating a new language mode, by copying and hacking at an existing one (much easier than starting from scratch), or updating one to add new keywords, or combining two existing modes (think PHP and inline HTML, Doxygen annotations in C/C++, or Perl and inline POD documentation).

Having a console for displaying errors could be useful even for Unix-like systems.

tksoh commented 3 years ago

Actually, by filename suffix, I was talking about adding the name of the yaml file in which the language mode was defined, so it will show up as "C++ [mycplusplus.yaml]" on the lang modes menu, as well as in the lang mode and patterns dialogs.

eteran commented 3 years ago

So there are some messy bits that I'm unsure how to properly address, and they basically all revolve around users creating/editing/deleting languages via the UI.

Things like: should we be deleting files from their config directory when they delete languages, and what happens if files in their directory seemingly conflict with files the UI would like to create?

I just want to make sure we get all the little details right. Are there any other examples of editors that:

  1. have languages (or macros, whatever)
  2. let you create/edit/delete them in the UI
  3. store them as individual files that can be also managed on disc?

I'd love to see what solutions other editors have come up with and how they deal with the more weird cases.

eteran commented 3 years ago

A thought. What if in the language/macro/whatever dialogs, we also had Import/Export buttons? This would allow them to select a language, export it to a file of their choice, and share it. And users could trivially import them?

I think to a large degree that kinda simplifies the equation a lot. Sure, it's not quite as trivial as "drop file here", but it's pretty simple while avoiding any potential quirks that come with dealing with multiple files that could conflict with each other.

Thoughts?

tksoh commented 3 years ago

I just want to make sure we get all the little details right. Are there any other examples of editors that:

  1. have languages (or macros, whatever)

Notepad++

  2. let you create/edit/delete them in the UI
  3. store them as individual files that can be also managed on disc?

I am not sure how notepad++ handles them.

A thought. What if in the language/macro/whatever dialogs, we also had Import/Export buttons? This would allow them to select a language, export it to a file of their choice, and share it. And users could trivially import them?

I think to a large degree that kinda simplifies the equation a lot. Sure, it's not quite as trivial as "drop file here", but it's pretty simple while avoiding any potential quirks that come with dealing with multiple files that could conflict with each other.

The goal here is to make it really easy & simple to share this stuff. Anything that can accomplish the goal is fine by me :-) If we manage to cook up a nice & intuitive UI to handle them, it might actually be better than having to worry about those potential corner cases. NEdit is, after all, a GUI-based app.

That said, the way nedit5 breaks up the language mode and highlighting patterns into different dialogs can confuse me even after 20 years of using it. Maybe we should take care of that too.

tksoh commented 3 years ago

One other thing we should probably discuss now is the list of new languages we plan to add as default, perhaps by calling for contribution from users.

I counted, and nedit, hence NG, only supports about 30 lang modes. I think we should start working on adding the new ones that are considered famous these days. I will start the list here, please add your suggestions:

tksoh commented 3 years ago

  2. let you create/edit/delete them in the UI
  3. store them as individual files that can be also managed on disc?

I am not sure how notepad++ handles them.

I just took a quick look at notepad++, and it does have the support to create new language patterns, and store them in a folder. I have not gone further to try it out yet though.

[Screenshot: Notepad++, 2020-12-28]

eteran commented 3 years ago

I think we should start working on adding the new ones that are considered famous these days. I will start the list here, please add your suggestions:

Absolutely agreed.

tksoh commented 3 years ago

I googled the 'top 20 programming languages in 2020' and made this list that includes the lang modes already supported by nedit marked by '(*)':

tksoh commented 3 years ago

@eteran one other thing that I have noticed for quite a long time is that nedit-5 fails to detect the makefiles in its own source code, probably due partly to the 200-character scan limit. I think this detection algorithm needs to be improved, or at least the limit should be raised significantly (memory should not be a problem for NG's users).

eteran commented 3 years ago

@tksoh awesome work! This list will be very helpful in building this out.

Regarding the detection of Makefiles, the reason is twofold.

  1. There is no really good regex for detecting a Makefile based on its contents. So for this particular case, we don't even have one to attempt! We just go by name.

  2. nedit-5's Makefiles are of the form Makefile.<platform>, and the fallback from using regexes is... extensions. So there isn't a good way to do that either since, aside from .gmk files, Makefiles traditionally don't have a consistent file extension.

The solution is to move from extensions to globbing (which is what .editorconfig does!). That way, you could write rules that match, for example, Makefile.* and Makefile, and that would cover basically 99% of Makefile detection use cases.

The challenge is of course backward compatibility, but this one isn't terrible.

Switching from extensions to globbing is trivial enough that we can code in an automagical upgrade path.

If they have any languages defined that use extensions, then we just turn any of the form .whatever into *.whatever and we're good to go, users won't even know that they've been upgraded. (I believe nedit-5 did this with a regex tweak at some point too).
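A quick sketch of that upgrade and of glob matching, using Python's `fnmatch` module as a stand-in for whatever glob implementation NG would use. Both helper names are hypothetical:

```python
from fnmatch import fnmatchcase


def upgrade_extension(entry):
    """Turn a legacy '.whatever' extension into the glob '*.whatever'."""
    return "*" + entry if entry.startswith(".") else entry


def matches(filename, patterns):
    """True if the file name matches any of the glob patterns."""
    return any(fnmatchcase(filename, p) for p in patterns)


# Two name-based globs plus one upgraded legacy extension.
makefile_patterns = ["Makefile", "Makefile.*", upgrade_extension(".gmk")]

for name in ("Makefile", "Makefile.linux", "rules.gmk", "main.c"):
    print(name, matches(name, makefile_patterns))
```

Entries that are already globs pass through `upgrade_extension` unchanged, so running the upgrade twice is harmless.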

I think that's a good enough "quality of life" that it's worth doing :-).

tksoh commented 3 years ago

Just FYI, I am not sure how they do it, but Linux's 'file' util is able to detect the makefiles in nedit 5.

eteran commented 3 years ago

Right, Linux's file uses a library called "libmagic" (we probably could too...). If it does indeed identify makefiles, that may mean that I overstated when I said, "There is no really good regex for detecting a Makefile based on its contents" :-) Unless of course libmagic just reports that it's a makefile if it sees "makefile" in the name ;-).

I'll have to investigate!

eteran commented 3 years ago

Looks like it searches for patterns using the rules found here:

https://github.com/file/file/blob/master/magic/Magdir/make

but I can also say, it works VERY well, but isn't perfect. It misses Makefile.generic and just considers it "ASCII text" on my system :-) Looks like the CFLAGS that it WOULD have found is just too far down in the file, even for libmagic.

I do agree that we could probably up the regex match to like 1024 chars or something like that though to make our regex match more likely to find what it's looking for.
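To make the effect of the scan limit concrete, here is a toy example. The pattern is a made-up stand-in, not the rule nedit or libmagic actually uses, and the file contents are synthetic:

```python
import re


def recognizes(pattern, text, scan_limit):
    """Apply the recognition regex to only the first `scan_limit` characters."""
    return re.search(pattern, text[:scan_limit], flags=re.MULTILINE) is not None


# A file whose first telltale line sits after a 300-character comment banner.
text = "#" * 300 + "\nCFLAGS = -O2\n"
pattern = r"^CFLAGS\s*="

print(recognizes(pattern, text, 200))   # limit ends inside the banner
print(recognizes(pattern, text, 1024))  # limit reaches the CFLAGS line
```

The regex itself is unchanged between the two calls; only the amount of text it is allowed to see differs.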

tksoh commented 3 years ago

~/projects/nedit-git/makefiles$ file Makefile.linux
Makefile.linux: makefile script, ASCII text
~/projects/nedit-git/makefiles$ cp Makefile.linux a
~/projects/nedit-git/makefiles$ file a
a: makefile script, ASCII text

tksoh commented 3 years ago

Given the amount of RAM in modern linux systems, I think we can be a lot more generous than 2048, especially since it's just a temp buffer. Probably at least 10k, so we don't have to keep coming back to it.

eteran commented 3 years ago

Yea, I was originally gonna just say 4096 bytes because it's a nice even page of memory... but then I started doubting that because I started thinking "what if there's a regex with REALLY bad performance and it nukes load performance"... You may be right though, RAM is pretty abundant.

Side note, honestly, if we changed our languages to target a mime type, it would be TRIVIAL to default to just using libmagic. All you'd have to say is that these rules are for any file of type text/x-makefile.

If I can think of a smooth upgrade path for that, (and have good support for windows and macOS), then I may wanna just say "let libmagic do the work"

tksoh commented 3 years ago

If libmagic is available where Qt is, then it's obviously the better solution. But I wonder if a larger scan limit would take care of the problem too. Then we can spend our time working on other things first ;-)

eteran commented 3 years ago

Sadly, I don't think libmagic is included in any part of Qt to my knowledge, so we'd have to "bring it along" for some platforms. Not ideal.

But I agree, we can start with the simple things like raising the scan limit from 200 characters for sure. That's about as trivial a change as it gets.

anjohnson commented 3 years ago

Language modes are useful for other text file formats too:

eteran commented 3 years ago

Of course, JSON was in @tksoh 's list, and I think that we should do our best to support any of the popular markups as well. We do this already by supporting things like XML, HTML, CSS, etc..

tksoh commented 3 years ago

@anjohnson I added your suggestions onto the list in my original comment, so it's easier to keep track.

tksoh commented 3 years ago

@eteran Since you have started putting together the list for 6.0 milestone, I think we should give this lang mode some thought. Obviously this new import/export mechanism will be a post 6.0 feature, but we should perhaps consider consolidating patterns.yaml into languages.yaml? That way we only need to worry about one file rather than two when exporting/importing, which makes things a lot more straightforward. The format of the two yaml looks fairly "compatible".

Also, I feel the lang mode lists should always be sorted alphabetically, since we would naturally look for the lang mode in that order should nedit fail to detect it automatically.

eteran commented 3 years ago

So there are a few things here.

Regarding the merging of YAML files. While NG doesn't have a TON of users right now, it does have some. So I'm hesitant to implement things that would either break current configs or require yet another automatic config file migration path for the code to take. (It already looks for stuff in the .ini file and will convert that to YAMLs, since the nedit-import tool does a fairly 1:1 conversion from .neditrc to .config/nedit-ng/config.ini.)

I'm not against it, but I'll have to think about it.

Regarding the ordering, it's not necessarily obvious, but the ability to re-order the languages actually has a purpose!

Multiple languages can have conflicting rules for file extensions, for example, both C and C++ might want .h files...

In both NEdit5 and NG, the ordering is also a prioritization list; that is, the first language wins. So you may have a language definition for C that covers .c and .h files, and a C++ definition that covers .cpp, .hpp, and .h. If the user puts C++ first in the list, then C++ will get used for .h files, but .c will still go to C. So changing that would be a fairly large change. Definitely post 6.0, and definitely not as straightforward as it might seem.
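The "first language wins" behaviour can be sketched in a few lines. This is illustrative only; the function and the list-of-pairs representation are assumptions, not NG's data structures:

```python
def pick_language(filename, languages):
    """Return the first language in list order whose extensions match."""
    for name, extensions in languages:
        if filename.endswith(tuple(extensions)):
            return name
    return None


c_first = [("C", [".c", ".h"]), ("C++", [".cpp", ".hpp", ".h"])]
cpp_first = list(reversed(c_first))

print(pick_language("util.h", c_first))    # C comes first in the list
print(pick_language("util.h", cpp_first))  # C++ comes first in the list
print(pick_language("main.c", cpp_first))  # only C claims .c
```

Sorting the list alphabetically would silently hand .h files to C, which is why the user-controlled order cannot simply be dropped.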

eteran commented 3 years ago

But yes, I definitely would like to see extended language support. Even making the C and C++ languages have updated patterns for the newer versions of the language would be awesome. But, I probably consider that to be something I'll need community help with :-)

tksoh commented 3 years ago

Regarding the merging of YAML files. While NG doesn't have a TON of users right now, it does have some

Actually, this is a good reason to do it now rather than later.

If we intend to pursue this feature, it might as well be now. Otherwise, more users are going to have to go through the same trouble. Unless there's a good reason to keep them separated.

Just my thought.

eteran commented 3 years ago

Yea, I'm kinda on the fence about how to proceed. Do I just make the change and deal with the fallout of bug reports? Do I make it so NG automatically converts? Honestly, not sure. I'm not looking to annoy the existing users ;-)

anjohnson commented 3 years ago

So nedit-ng and the nedit-import tool currently generate separate patterns.yaml and languages.yaml files. There's also the separate indent.yaml file which holds the smart indent macros in it for each language, but I'm not sure what should happen with that if you combine the other two. It should probably be discussed at the same time though.

I can see 3 sources of indentation settings:

  1. If an .editorconfig file exists when a file is opened it should control the appropriate settings.
  2. A language mode might have settings specific for that language (e.g. Python, Makefiles).
  3. Personal preferences, with the question whether personal preferences could/should be kept per-language?

Some languages (e.g. Go) have one standardized layout, and in that case maybe even the smart indent macros should belong in with the language patterns. Others (e.g. Makefiles, Python) have some rules but leave room for personal preferences as well, while for most of the rest it's all personal or per-project preference.

eteran commented 3 years ago

@anjohnson good points all around. Generally, you've got the hierarchy right:

  1. .editorconfig will have the highest priority since it is specific to the project currently open.
  2. Language modes will be next in the list, because they are specific to the kind of file you have open, regardless of the location.
  3. Finally, we fall back on the default editor settings that the user has chosen as a matter of general preference.
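That lookup order maps naturally onto a layered dictionary. A sketch using Python's `ChainMap`; the setting names are borrowed from EditorConfig's `indent_style` and `indent_size` properties, and the values are invented:

```python
from collections import ChainMap


def effective_settings(editorconfig, language_mode, defaults):
    """Layer the three sources; earlier layers shadow later ones."""
    return ChainMap(editorconfig, language_mode, defaults)


defaults = {"indent_style": "space", "indent_size": 4}
makefile_mode = {"indent_style": "tab"}  # language-specific setting
project = {"indent_size": 8}             # from an .editorconfig file

settings = effective_settings(project, makefile_mode, defaults)
print(settings["indent_style"], settings["indent_size"])
```

Each setting is resolved independently, so a project can pin the indent size while the language mode still decides tabs versus spaces.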

We hadn't even discussed smart indent in this thread, and we should. Because it is indeed, also language-specific.

I think if we do this, we should go all the way and get it right once. I wish I had these thoughts the first time I implemented the YAML stuff. I think I may, for at least one release, support both versions of the YAML stuff just to avoid breaking things for users.

The part I really dislike though, is if we convert, do we delete user config files? I'm not a huge fan of doing so, but if we are willing to zero them out because we've migrated the location... same difference, right?

tksoh commented 3 years ago

I think if we do this, we should go all the way and get it right once. I wish I had these thoughts the first time I implemented the YAML stuff. I think I may, for at least one release, support both versions of the YAML stuff just to avoid breaking things for users.

There's no such thing as "the best solution". Technology evolves. Some years from now, the seemingly perfect implementation now will be rendered outdated too. Just like the old nedit.rc used to be good in the old days.

The part I really dislike though, is if we convert, do we delete user config files? I'm not a huge fan of doing so, but if we are willing to zero them out because we've migrated the location... same difference, right?

No offence, but if we get too hung up about changing old config files, then nothing will move forward :-P

Migration is the word. New stuff takes over from old stuff. It's a natural thing. NG is not even in beta stage (is it?), so people should know things can change rather drastically. As long as they don't lose their work, I don't think they will get upset. If this is really a concern, NG can always back up the old files somewhere before replacing them, so they can be restored if necessary. Or users can even go back to the old NG ;-)
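A minimal sketch of "back up, then replace", assuming a `.bak` copy next to the original is acceptable. The helper name and the backup naming scheme are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def backup_then_replace(path, new_text):
    """Copy `path` to `<name>.bak` beside it, then overwrite `path`."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the user's old file recoverable
    path.write_text(new_text)
    return backup


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "patterns.yaml"
    config.write_text("old contents\n")
    backup = backup_then_replace(config, "migrated contents\n")
    backup_name = backup.name
    old_text = backup.read_text()
    new_text = config.read_text()

print(backup_name, repr(old_text), repr(new_text))
```

Nothing is deleted in this scheme, which sidesteps the worry about removing user config files during a migration.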

Just my thought.