github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

Modula-2 file extension '.def' is missing #6451

Open trijezdci opened 1 year ago

trijezdci commented 1 year ago

https://github.com/github-linguist/linguist/blob/d352058f979caab69b5c0876cd4992ce43a29132/lib/linguist/languages.yml#L4222

Modula-2 keeps interface and implementation in separate files.

Interfaces (historically but incorrectly called definition modules) have file extension '.def'. Implementation and program modules have file extension '.mod'.

This is similar to C's '.h' and '.c' files, but unlike in C, the modules are NOT preprocessor includes. They are separate compilation units, and it is entirely impossible to write Modula-2 libraries without '.def' files. This is not a matter of choice or style; it is an absolute requirement.

Not supporting '.def' as a file extension for Modula-2 means that the entire syntax highlighting for Modula-2 is completely broken as it is 50% incomplete.

So, can you please add recognition for the '.def' file extension.

I had reported this many years ago, but apparently nothing has happened since and I can't even find the report now. It would be nice if this wasn't just swept under the carpet again. It should be rather easy to add.

Thank you in advance.

lildude commented 1 year ago

So, can you please add recognition for the '.def' file extension.

I had reported this many years ago, but apparently nothing has happened since and I can't even find the report now.

I can with a very easy search: https://github.com/github-linguist/linguist/issues/3657 😉 It was closed automatically back when we used that bot. That issue also went way into the weeds, all outside of Linguist's scope.

Much as in my response then, the same applies now: Linguist is a community-driven project, so in short: if you want it, submit a PR to add support, or implement an override and wait patiently until someone else does.

You can find details for adding support in the CONTRIBUTING.md file.

One thing to keep in mind is that .def is a very generic extension likely to be used by many languages. We can add support, but it will probably require a very precise heuristic (aka regex) for identifying Modula-2 files, to reduce the chances of misclassifying other languages.
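To make the idea of a precise heuristic concrete, here is a minimal sketch in Python. It is not Linguist's heuristic format or code, just an illustration of the kind of regex involved. It assumes a Modula-2 '.def' file can be recognised by a DEFINITION MODULE header preceded only by whitespace and (* ... *) comments. The names MODULA2_DEF_HEADER and looks_like_modula2_def are hypothetical, and the second sample (a linker module-definition file) is only one example of another '.def' user.

```python
import re

# Hypothetical pattern: optional whitespace and (* ... *) comments, followed by
# the header "DEFINITION MODULE <Name>;". Nested comments are not handled.
MODULA2_DEF_HEADER = re.compile(
    r"\A(?:\s|\(\*.*?\*\))*DEFINITION\s+MODULE\s+[A-Za-z][A-Za-z0-9]*\s*;",
    re.DOTALL,
)


def looks_like_modula2_def(source: str) -> bool:
    """Return True if the text starts like a Modula-2 definition module."""
    return MODULA2_DEF_HEADER.match(source) is not None


modula2_sample = (
    "(* Stack interface *)\n"
    "DEFINITION MODULE Stack;\n"
    "PROCEDURE Push(x: INTEGER);\n"
    "END Stack.\n"
)
linker_def_sample = "LIBRARY mylib\nEXPORTS\n    my_function\n"

print(looks_like_modula2_def(modula2_sample))     # True
print(looks_like_modula2_def(linker_def_sample))  # False
```

Anchoring the pattern at the start of the file keeps it strict: a '.def' file from another language that merely mentions the words DEFINITION MODULE somewhere in its body would not match.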

If you're not prepared to add support yourself, please add the "Feature request" issue template to the OP of this issue and fill it in, as you should have used it when opening this issue.

trijezdci commented 1 year ago

I have forked the Sublime Text files that Linguist appears to be using for Modula-2, and I have made several corrections so that this reflects the classic Modula-2 Language originally published by Prof. Niklaus Wirth at ETH Zurich in his book "Programming in Modula-2", published by Springer Verlag. This book contains the language report and the language dialect it describes is known as PIM Modula-2, where PIM is a shorthand for the title of the book.

https://github.com/trijezdci/Sublime-Modula-2

However, both .def and .mod are already in those files, so this will not fix the issue of not recognising the file extension.

I am assuming there is at least one other place (within the Linguist repo) where .def needs to be added.

I can do that if you can confirm where this needs to be added.

As for the argument that .def might be used by other languages, I don't think that is justification not to support .def for Modula-2 because like I mentioned, it is not a style choice, it is ESSENTIAL, if it isn't there, then Modula-2 isn't supported, it is that simple. If the language is to be supported, then .def must be supported.

Besides, I have implemented and contributed multi-dialect Modula-2 support for/to vi and VIM where the disambiguation needed to be done for .mod because there are other languages that use .mod, but there was none that used .def.

I am quite happy to add disambiguation though because this would likely make it possible to support multiple dialects as it used to be when Github (and Bitbucket) still used Pygments. I had contributed multi-dialect Modula-2 support to Pygments which used (and still uses) a comment at the beginning of a source file with a dialect tag to tell the renderer for which dialect the file should be rendered. I did the same for Modula-2 support in vi/VIM and the maintainer of GNU Modula-2 did the same for Emacs.
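As a rough illustration of the dialect-tag approach, the following Python sketch reads a tag from a comment at the very start of a source file. The tag shape used here, (*!m2pim*), and the dialect names are assumptions for illustration only, so check the Pygments documentation for the exact syntax. DIALECT_TAG and dialect_from_tag are hypothetical names, not part of any existing tool.

```python
import re
from typing import Optional

# Assumed tag shape: a comment such as (*!m2pim*) as the first thing in the file.
DIALECT_TAG = re.compile(r"\A\s*\(\*!([A-Za-z0-9+]+)\*\)")


def dialect_from_tag(source: str) -> Optional[str]:
    """Return the dialect named by a leading tag comment, or None if absent."""
    match = DIALECT_TAG.match(source)
    return match.group(1).lower() if match else None


tagged = "(*!m2pim*)\nDEFINITION MODULE Stack;\nEND Stack.\n"
untagged = "DEFINITION MODULE Stack;\nEND Stack.\n"

print(dialect_from_tag(tagged))
print(dialect_from_tag(untagged))
```

A renderer could fall back to a default dialect when the function returns None, so untagged files would still be highlighted.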

However, I need somebody who understands Linguist to point me to where such disambiguation code is to be added and where the feature is documented so I can read up on how to do this. I will also need to know how to add multiple grammars for the same language.

regards benjamin

trijezdci commented 1 year ago

If you can replace the current incorrect Sublime Text definitions for M2 with the correct forked one I made, then I would also like to know a little more about what features in the Sublime Text definitions are actually having an impact on Linguist. For example there are some definitions in there that are quite obviously for text completion, and I doubt that Linguist uses that information at all. There are probably other such things that are only relevant for editors, not for rendering and thus probably ignored by Linguist. If we can identify them, then I would like to remove all of those.

trijezdci commented 1 year ago

One more thing on file extensions. In classic Modula-2 and also in ISO Modula-2, the interface files were incorrectly called definition modules, even though, being interfaces, they contain declarations, while the implementation files contain the corresponding definitions. Unfortunately, the mistake in nomenclature is already in the syntax, as the interface files start with DEFINITION MODULE. It's counter-intuitive, but that's also a reason why you can't just change the file extension to something else, which is of course easier to do in a compiler than changing syntax.

The incorrect nomenclature along with its syntax has been corrected in the 2010 revision of Modula-2 where the interfaces are called interface modules and their syntax is INTERFACE MODULE. However, since most Modula-2 users are accustomed to the file naming, the .def file extension remains supported.

lildude commented 1 year ago

However, both .def and .mod are already in those files, so this will not fix the issue of not recognising the file extension.

I am assuming there is at least one other place (within the Linguist repo) where .def needs to be added.

Yup. The appropriate language within the languages.yml file.

As for the argument that .def might be used by other languages, I don't think that is justification not to support .def for Modula-2 because like I mentioned, it is not a style choice, it is ESSENTIAL, if it isn't there, then Modula-2 isn't supported, it is that simple. If the language is to be supported, then .def must be supported.

I agree that it might be essential. However, we need to differentiate between Modula-2's use and everything else; you wouldn't want your .def files identified as Ruby, for example, just because someone else said this extension is used by Ruby. The same applies the other way: if we add .def to Modula-2 without any form of heuristic to limit it to this language, or without adding it to other languages at the same time so the classifier can make a guess based on the samples, EVERY .def file will be identified as this language. This would be moot if the extension were unique to the language, but I fear .def probably isn't, hence the requirement.

It's important to keep in mind that Linguist analyses files in isolation. It takes no account, nor should it, of other files in the repo or the directory structure, as a repo or gist can legitimately contain only a single file and people would want that to be identified as correctly as possible.
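The two points above can be sketched as a toy classifier in Python. This is not how Linguist is implemented; it only shows the shape of the problem under stated assumptions. The extension table, the placeholder language "Other", the heuristic rule, and the function name classify are all hypothetical. Note that the function sees a single file name and its content, and nothing else from the repository.

```python
import os
import re
from typing import Callable, Dict, List, Optional

# Hypothetical table: ".def" is claimed by Modula-2 and by a placeholder language.
CANDIDATES: Dict[str, List[str]] = {".def": ["Modula-2", "Other"]}

# Hypothetical content rules used to choose between competing candidates.
HEURISTICS: Dict[str, Callable[[str], bool]] = {
    "Modula-2": lambda text: re.search(
        r"^\s*DEFINITION\s+MODULE\s+\w+\s*;", text, re.MULTILINE
    ) is not None,
}


def classify(filename: str, content: str) -> Optional[str]:
    """Pick a language from one file's name and content, with no other context."""
    extension = os.path.splitext(filename)[1].lower()
    candidates = CANDIDATES.get(extension, [])
    if len(candidates) == 1:
        # A single claimant wins outright, whatever the content says.
        return candidates[0]
    for language in candidates:
        rule = HEURISTICS.get(language)
        if rule is not None and rule(content):
            return language
    return None  # undecided; a real tool might fall back to a statistical guess


print(classify("Stack.def", "DEFINITION MODULE Stack;\nEND Stack.\n"))
print(classify("mylib.def", "LIBRARY mylib\nEXPORTS\n    my_function\n"))
```

If the table listed Modula-2 as the only claimant of ".def", the single-candidate branch would label every '.def' file as Modula-2 regardless of content, which is the misclassification risk described above.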

I am quite happy to add disambiguation though because this would likely make it possible to support multiple dialects as it used to be when Github (and Bitbucket) still used Pygments. I had contributed multi-dialect Modula-2 support to Pygments which used (and still uses) a comment at the beginning of a source file with a dialect tag to tell the renderer for which dialect the file should be rendered. I did the same for Modula-2 support in vi/VIM and the maintainer of GNU Modula-2 did the same for Emacs.

However, I need somebody who understands Linguist to point me to where such disambiguation code is to be added and where the feature is documented so I can read up on how to do this. I will also need to know how to add multiple grammars for the same language.

See my comments in the discussion you started.

If you can replace the current incorrect Sublime Text definitions for M2 with the correct forked one I made, then I would also like to know a little more about what features in the Sublime Text definitions are actually having an impact on Linguist.

We only use the syntax highlighting parts of a grammar, and as I mentioned in the discussion, these need to be TextMate-compatible grammars, which Sublime 2 so happens to implement. Also mentioned: how to write and maintain TextMate-compatible grammars is outside the scope of Linguist, though @Alhadis is quite the expert so may be able to offer tips and help. TextMate has its own documentation (though it was a bit poor the last time I looked), as does VS Code.