github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

improve Prolog vs. IDL detection (.pro files) #5053

Open slayoo opened 4 years ago

slayoo commented 4 years ago

Apparently some .pro files are detected as Prolog, while others are given the IDL label. The latter refers to the IDL/GDL/PV-WAVE language family.

In the GDL project, all .pro files are IDL source code, yet GitHub places them in both categories:

Would be great to improve consistency (i.e., so that all are detected as IDL).

Likely relevant helper info: https://github.com/blackducksoftware/ohcount/blob/master/src/parsers/idl_pvwave.rl

HTH, Sylwester (originally reported by @EdwardEisenhauer)

lildude commented 4 years ago

The .pro extension is associated with quite a few different languages and thus relies upon the heuristic:

https://github.com/github/linguist/blob/1df78c248cafa6b414651673846e26e388710df9/lib/linguist/heuristics.yml#L378-L391

... and samples to identify the language based on the content. So, to make things more consistent, we'd need to improve the heuristics and add a few more representative samples.

Please feel free to open a PR to help improve things.

slayoo commented 4 years ago

So the current rules are as follows:

- extensions: ['.pro']
  rules:
  - language: Proguard
    pattern: '^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)'
  - language: Prolog
    pattern: '^[^\[#]+:-'
  - language: INI
    pattern: 'last_client='
  - language: QMake
    and:
    - pattern: HEADERS
    - pattern: SOURCES
  - language: IDL
    pattern: '^\s*function[ \w,]+$'
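
To see how these rules play out on a given file, here is a rough Python sketch that applies the five patterns above in order and returns the first language whose patterns all match. It is not Linguist's actual code (Linguist is written in Ruby); the `PRO_RULES` table and `classify_pro` helper are hypothetical names made up for illustration, and the first-match-wins ordering is an assumption about how the rules are evaluated. `re.MULTILINE` is used because Ruby's `^` and `$` match at line boundaries by default.

```python
import re

# Rules transcribed from the heuristics.yml excerpt above, in the same order.
# Each entry is (language, [patterns that must ALL match somewhere in the file]).
# PRO_RULES and classify_pro are illustrative names, not part of Linguist.
PRO_RULES = [
    ("Proguard", [r"^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)"]),
    ("Prolog", [r"^[^\[#]+:-"]),
    ("INI", [r"last_client="]),
    ("QMake", [r"HEADERS", r"SOURCES"]),
    ("IDL", [r"^\s*function[ \w,]+$"]),
]


def classify_pro(content):
    """Return the first language whose patterns all match, or None.

    Assumption: rules are tried top to bottom and the first match wins.
    """
    for language, patterns in PRO_RULES:
        if all(re.search(p, content, re.MULTILINE) for p in patterns):
            return language
    return None  # no heuristic matched


prolog_src = "parent(tom, bob).\nancestor(X, Y) :- parent(X, Y).\n"
idl_function_src = "function add_one, x\n  return, x + 1\nend\n"
idl_procedure_src = "pro hello\n  print, 'hi'\nend\n"

print(classify_pro(prolog_src))
print(classify_pro(idl_function_src))
print(classify_pro(idl_procedure_src))
```

With these made-up samples, the IDL file that defines a lowercase `function` is matched by the IDL rule, while the one that only contains a `pro` procedure matches none of the five rules, so under this sketch the heuristic alone would not settle its language.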
slayoo commented 4 years ago

Some notes: