Closed Faq closed 9 years ago
If all Delphi code must be marked as Pascal, please change all references to Linux on GitHub to UNIX.
Is there a simple way to differentiate Pascal and Delphi based on the code source that could be used to make a heuristic?
A "Delphi project" requires a .dpr or .dproj (newer versions of Delphi), which is a "make" file for a Delphi project. I believe either command line or GUI programs should have one of these in the same folder as other .pas source code files.
All Delphi GUI programs would have one or more .dfm (Delphi form) files, but a Delphi command line utility would not.
So perhaps a directory having .pas files, but no .dpr/.dproj or no .dfm files could be labeled "Pascal" but otherwise could be labeled as a "Delphi" project if these files were present.
Here is some additional info about Delphi-related file types and source control: http://stackoverflow.com/questions/438414/delphi-file-types
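The directory rule proposed above could be sketched roughly as follows. This is only an illustration in Python, not something Linguist does; the function name `classify_directory` and the marker set are assumptions made for the example, and the sketch treats the presence of any .dpr, .dproj or .dfm file as enough to call the directory Delphi.

```python
from pathlib import PurePosixPath

# Extensions the comment above treats as Delphi project markers (assumption:
# any one of them is sufficient).
DELPHI_MARKERS = {".dpr", ".dproj", ".dfm"}


def classify_directory(filenames):
    """Return "Delphi", "Pascal", or None for a flat list of file names."""
    # Compare extensions case-insensitively, since Unit1.PAS is common too.
    suffixes = {PurePosixPath(name).suffix.lower() for name in filenames}
    if suffixes & DELPHI_MARKERS:
        return "Delphi"
    if ".pas" in suffixes:
        return "Pascal"
    return None  # no Pascal-family files in this directory
```

For example, `classify_directory(["Main.dpr", "Unit1.pas", "Unit1.dfm"])` gives `"Delphi"`, while a directory holding only `hello.pas` gives `"Pascal"`. A Free Pascal project that happens to ship a .dpr file would still be labelled Delphi by this rule.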
When the heuristics fail, why not provide a way for the owner to specify the type from a controlled list?
I don't think linguist is capable, for the moment, of working on a folder rather than on a file. Thus, a heuristic would have to be something in the code.
When the heuristics fail, why not provide a way for the owner to specify the type from a controlled list?
I don't know but it's a recurring question, maybe @arfon could explain...
If the label "Pascal" is attached to the whole repository, I'd argue it should reflect the whole repository, not just a few files.
I'd rather see "unknown" on my Delphi repository instead of "Pascal". There are many flavors of Pascal: somewhere I have UCSD Pascal and TurboPascal examples. If allowed, I'd label them "UCSD Pascal" and "TurboPascal", not just "Pascal", if I ever put them on GitHub.
Why not let the person that knows put the correct label on the repository instead of using a forced heuristic? A machine learning algorithm could use these known classifications to look for misclassified ones.
I was surprised I didn't get to classify the Delphi repository when I uploaded it. I'd argue the "Pascal" label is somewhere between "misleading" and "wrong".
@pchaigno @EarlGlynn I think some heuristics would work best here.
Are there any good ways to disambiguate between the two languages by syntax or file extensions?
I gave a description of Delphi-related file extensions above. The link discussed Delphi file extensions for source control, which could be used to classify a set of files in a repository as a Delphi project.
We may need to agree to disagree on approach here. Perhaps heuristics can often classify individual files, but I don't understand why a heuristic classifying a repository is better than the person developing the code.
Since Delphi was introduced, ~1995, I don't remember ever searching for "Pascal" to find something that is "Delphi". Delphi grew out of TurboPascal, not simply "Pascal."
Why is the author not qualified to make the classification? Why isn't the heuristic applied only when the author fails to specify one?
I don't understand why the category assigned to the repository for my GitHub page is "JavaScript". I won't be publishing anything about JavaScript there, yet I don't see how to change the assignment.
Module imports such as uses SysUtils; are one way to tell Turbo Pascal, Free Pascal and Delphi apart from other Pascal dialects. However, I don't think you can reliably differentiate between Free Pascal ("FP"), Turbo Pascal ("TP") and Delphi themselves when operating on a single .pas file; a lot of FP code is valid TP/Delphi code and vice versa.
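As a rough illustration of looking at module imports, the sketch below pulls unit names out of `uses` clauses with a regular expression. It is written in Python for brevity, the helper name `imported_units` is hypothetical, and it does not strip comments or string literals, so a stray "uses" in a comment would confuse it.

```python
import re

# A uses clause runs from the keyword "uses" to the next semicolon.
# Pascal keywords are case-insensitive, and the clause may span lines.
USES_CLAUSE = re.compile(r"\buses\b(.*?);", re.IGNORECASE | re.DOTALL)


def imported_units(source):
    """Return the unit names listed in every uses clause of `source`."""
    units = []
    for clause in USES_CLAUSE.findall(source):
        for part in clause.split(","):
            tokens = part.split()
            if tokens:
                # Keep only the unit name; a .dpr entry such as
                # "Unit1 in 'Unit1.pas'" has extra tokens after it.
                units.append(tokens[0])
    return units
```

Checking the result for a unit such as SysUtils would then separate the TP/FP/Delphi family from sources with no such imports, but, as noted above, it would not separate the members of that family from one another.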
One difference between FP and Delphi that you could leverage is the encoding for Unicode strings: Delphi uses UTF-16 while Free Pascal uses UTF-8. You could detect the encoding using character frequency analysis, for which there are existing libraries. Of course, files without any Unicode strings would remain ambiguous.
Personally, I wouldn't mind if TP/FP/Delphi code was identified as a single category distinct from plain Pascal, although I can't think of a good name for it. Edit: Actually, I can: "Object Pascal".
We have a couple of different ways to override language detection now in Linguist which is probably as good as we can do right now. Please take a look at these over here: https://github.com/github/linguist#overrides
I really do not understand this thread. Why are there summaries by language of GitHub repositories when you refuse to put the right label on the repository? You even refuse to let the authors put the right labels on the repository. This "resolution" simply does not make any sense to me.
Still nothing on this? arfon's suggestion was tried, but does not work. If I add
*.pas linguist-language=Delphi
to the .gitattributes file, the language changes from Pascal to Component Pascal, which is a completely different language, neither Pascal nor Object Pascal/Delphi.
It's funny how people use GitHub statistics to argue that Delphi does not show up in the top languages used today, when GitHub doesn't even recognize the existence of the language.
Would it be so hard to make it work correctly? I don't mind having to add a .gitattributes file, if it would only work!
@nunopicado - this is because delphi is listed as an alias for Component Pascal.
Probably the best thing we can do right now is to list Delphi as a language in languages.yml but not add any extensions, as we still don't have a way to reliably identify a Delphi project on a per-file basis.
Doing this would allow the overrides to work (*.pas linguist-language=Delphi) but little else :-\
Thanks @arfon, for your reply.
I do think Delphi should be separated from Component Pascal, the same for Object Pascal. Delphi and Object Pascal could be marked as alias to one another (one as a language, one as an alias, I don't really mind which is which). Even though there are different flavours of Object Pascal, at least it fits in the description.
There are some file extensions which are exclusive to Delphi. Those could be added. The problem is with .pas files, which is a common extension.
It was mentioned that there must be something in the code to differentiate standard Pascal from Delphi/Object Pascal. Well, I guess there is.
Standard Pascal is not object oriented, so there are no classes. Delphi, on the other hand, is heavily object oriented, so there are probably not many .pas files that lack the keyword Class.
Would that be enough to create a rule for linguist?
Standard Pascal is not object oriented, so there are no classes. Delphi, on the other hand, is heavily object oriented, so there are probably not many .pas files that lack the keyword Class.
Very possibly! Basically we need to be able to write a heuristic (a regular expression) that is applied on a per-file-extension basis. Here's a good example for .es, which is used by JavaScript and Erlang: https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L139-L145
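To make the idea concrete, here is a minimal sketch of what a class-keyword check could look like as a regular expression. It is written in Python rather than in Linguist's own Ruby heuristics format, the names `CLASS_DECL` and `looks_like_object_pascal` are made up for the example, and the pattern is an assumption about what a useful rule might be, not a tested Linguist heuristic.

```python
import re

# Matches a type declaration such as "TFoo = class" or "TFoo = class(TBar)"
# at the start of a line. Pascal keywords are case-insensitive.
CLASS_DECL = re.compile(r"^\s*\w+\s*=\s*class\b", re.IGNORECASE | re.MULTILINE)


def looks_like_object_pascal(source):
    """Return True if `source` contains something shaped like a class declaration."""
    return CLASS_DECL.search(source) is not None
```

Requiring the `Name = class` shape, rather than the bare word, avoids matching "class" inside a string or an identifier. The sketch still ignores comments and generic declarations such as `TList<T> = class`, and because Free Pascal accepts the same syntax it can at best suggest "Object Pascal", not Delphi specifically.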
@nunopicado - Would you be interested in attempting this in a Pull Request?
Strange as it may seem, I never used regular expressions! :) But I'll give it a try. I'll check your example and try to create something that can differentiate .pas files. I'll get back to you on this. Thank you @arfon! ;)
Strange as it may seem, I never used regular expressions! :)
Welcome to the dark side 😉
I'll check your example and try to create something that can differentiate .pas files. I'll get back to you on this.
👍 thanks. We can definitely help you get this polished up.
:+1:
Still unpatched. Seen as C++ and Java when it's clearly FPC sources for SDL.
@JazzMaster Could you open a separate issue with the appropriate details so that we can look into a fix?
It seems all Delphi projects show as Pascal now. Examples: https://github.com/Faq/TXTGenerator https://github.com/Gurux/Gurux.DLMS.Delphi