github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

My repository misclassified as Roff #3613

Closed kmruehl closed 7 years ago

kmruehl commented 7 years ago

My repository's language is tagged as 'Roff', but it's a set of MATLAB scripts and Simulink files: https://github.com/WEC-Sim/WEC-Sim

Same thing with our applications repo: https://github.com/WEC-Sim/WEC-Sim_Applications

Alhadis commented 7 years ago

👋 Hey @kmruehl,

The misclassification is occurring because of these files, which use a file extension associated with Roff documents (*.3). You can mark .3 files as text with a .gitattributes file:

*.3 linguist-language=text

Bear in mind you might need to make a manual change to a .3 file to force a reindexing of the repo's language stats.

kmruehl commented 7 years ago

Thank you, I'll edit this.

Alhadis commented 7 years ago

Hi @kmruehl,

I noticed linguist-language=text didn't work to fix the classification. This might be because Linguist recognises text as a data-type language, not a markup or programming-type language. Only the latter two types are considered when generating usage statistics (so you'll never see a repository classified as text or XML, because those are data languages).

However, I should have recommended you use the linguist-documentation attribute instead (my fault, sorry!)

*.3 linguist-documentation

Be sure not to include this in your .gitattributes file, though:

$ cat .gitattributes

kmruehl commented 7 years ago

@Alhadis Thanks, I edited the .gitattributes file accordingly, but the repository is still being misclassified.

lildude commented 7 years ago

I've just checked your repo, and it's still showing Roff because of the .1 files under tutorials/BEMIO/WAMIT/:

[...]
Roff:
tutorials/BEMIO/WAMIT/Coer_Comp/coer_comp.1
tutorials/BEMIO/WAMIT/Cubes/cubes.1
tutorials/BEMIO/WAMIT/Cylinder/cyl.1
tutorials/BEMIO/WAMIT/Ellipsoid/ellipsoid.1
tutorials/BEMIO/WAMIT/OSWEC/oswec.1
tutorials/BEMIO/WAMIT/RM3/rm3.1
tutorials/BEMIO/WAMIT/Sphere/sphere.1
tutorials/BEMIO/WAMIT/WEC3/wec3.1

You'll need to add a *.1 linguist-documentation line to your .gitattributes file too.

Testing locally, doing so results in:

96.67%  Matlab
2.86%   Python
0.43%   C
0.04%   HTML

kmruehl commented 7 years ago

Thank you! This resolved the issue. Much appreciated.
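For anyone landing here with the same symptom, the fix in this thread can be approximated with a short Python sketch. It is not part of Linguist; it walks a checkout, collects files whose extension is a single digit, and prints one linguist-documentation rule per extension. The function names are made up for illustration, and the .1 to .9 range is an assumption: this thread only confirms that .1 and .3 are treated as Roff.

```python
import os
import re

# Assumption: single-digit extensions (.1 through .9) are the man-page style
# suffixes attributed to Roff; this thread only confirms .1 and .3.
NUMERIC_EXT = re.compile(r"^\.[1-9]$")


def find_numeric_extensions(root):
    """Map each single-digit extension under root to the files using it."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if NUMERIC_EXT.match(ext):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.setdefault(ext, []).append(rel.replace(os.sep, "/"))
    return found


def gitattributes_lines(found):
    """Build one linguist-documentation rule per extension found."""
    return [f"*{ext} linguist-documentation" for ext in sorted(found)]


if __name__ == "__main__":
    hits = find_numeric_extensions(".")
    for ext, paths in sorted(hits.items()):
        print(f"{ext}: {len(paths)} file(s), e.g. {paths[0]}")
    # Candidate rules to review before pasting into .gitattributes.
    print("\n".join(gitattributes_lines(hits)))
```

Run it from the repository root and review the printed rules before adding them. Only the rule lines belong in .gitattributes, not any shell prompt or command.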

FermiParadox commented 2 years ago

In my case

I unintentionally created files whose names contained a dot, so everything after the dot was treated as a file extension. I also assume the number of lines in those files contributes to the misclassification.

E.g. instead of my_file_name_0_5ms, I was creating my_file_name_0.5ms. Note the .5ms.

Solution

Use a tool that searches all file extensions (and counts file lines) to detect the problematic files.
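A minimal sketch of such a tool in Python, in case it saves someone time. The function name extension_stats is invented for this example. It reports bytes alongside line counts because, as far as I know, Linguist's percentages are based on file size rather than line count, so treat the line figures as a rough guide only.

```python
import os
from collections import defaultdict


def extension_stats(root):
    """Return {extension: [file_count, line_count, byte_count]} under root."""
    stats = defaultdict(lambda: [0, 0, 0])
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # Everything from the last dot onward counts as the extension,
            # which is how my_file_name_0.5ms ends up with ".5ms".
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                with open(os.path.join(dirpath, name), "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: leave it out of the tally
            entry = stats[ext]
            entry[0] += 1
            entry[1] += data.count(b"\n")  # newline count approximates lines
            entry[2] += len(data)
    return dict(stats)


if __name__ == "__main__":
    # Largest extensions by byte count first.
    rows = sorted(extension_stats(".").items(), key=lambda kv: -kv[1][2])
    for ext, (files, lines, size) in rows:
        print(f"{ext:>12}  {files:6d} files  {lines:8d} lines  {size:10d} bytes")
```

Unexpected extensions near the top of the listing (such as .5ms in my case) are the ones worth renaming or covering in .gitattributes.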

Edm1795 commented 2 years ago

Wouldn't it be easier, and thus better, for the language to always be specified manually by the programmers themselves? I mean, who writes a program and doesn't know which language they are using? The real problem is not the .1 or .3 files; it is the automatic language analyzer. So why not just let programmers set it manually?

Nixinova commented 2 years ago

so why not just let programmers set it manually?

You can do that if you want, using linguist-language=whatever in a .gitattributes file. Having defaults is a no-brainer, though, because not everyone will bother to specify a language, and syntax highlighting is important.