Closed kmruehl closed 7 years ago
👋 Hey @kmruehl,
The misclassification is occurring because of these files, which use a file extension associated with Roff documents (*.3). You can mark .3 files as text with a .gitattributes file:
*.3 linguist-language=text
Bear in mind you might need to make a manual change to a .3 file to force a reindexing of the repo's language stats.
Thank you, I'll edit this.
Hi @kmruehl,
I noticed linguist-language=text didn't work to fix the classification. This might be because Linguist recognises text as a data-type language, not a markup or programming-type language. Only the latter two types are considered when generating usage statistics (so you'll never see a repository classified as text or XML, because those are data languages).
However, I should have recommended you use the linguist-documentation attribute instead (my fault, sorry!):
*.3 linguist-documentation
Be sure not to include this in your .gitattributes file, though:
$ cat .gitattributes
@Alhadis thanks, I edited the .gitattributes file accordingly, but the repository is still being misclassified.
I've just checked your repo, and it's still showing Roff because of the .1 files under tutorials/BEMIO/WAMIT/:
[...]
Roff:
tutorials/BEMIO/WAMIT/Coer_Comp/coer_comp.1
tutorials/BEMIO/WAMIT/Cubes/cubes.1
tutorials/BEMIO/WAMIT/Cylinder/cyl.1
tutorials/BEMIO/WAMIT/Ellipsoid/ellipsoid.1
tutorials/BEMIO/WAMIT/OSWEC/oswec.1
tutorials/BEMIO/WAMIT/RM3/rm3.1
tutorials/BEMIO/WAMIT/Sphere/sphere.1
tutorials/BEMIO/WAMIT/WEC3/wec3.1
You'll need to add a *.1 linguist-documentation line to your .gitattributes file too.
Testing locally, doing so results in:
96.67% Matlab
2.86% Python
0.43% C
0.04% HTML
Thank you! This resolved the issue. Much appreciated.
I unintentionally created files whose names contained a dot, so everything after the dot was treated as a file extension. I also assume the number of lines in those files contributes to the misclassification.
E.g. instead of my_file_name_0_5ms, I was creating my_file_name_0.5ms. Note the .5ms.
Use a tool that searches all file extensions (and counts file lines) to detect the problematic files.
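As a rough sketch of such a tool (not something from Linguist itself), the Python snippet below walks a directory tree and tallies file count and line count per extension. The function name extension_stats and the skip_dirs parameter are made up for this example, and line count is only a rough proxy for how much each extension contributes to the language stats.

```python
from collections import Counter
from pathlib import Path


def extension_stats(root, skip_dirs=(".git",)):
    """Return {extension: (file_count, total_lines)} for files under root.

    Hypothetical helper for illustration; directories named in skip_dirs
    are ignored wherever they appear in the tree.
    """
    root = Path(root)
    files = Counter()
    lines = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        # Path.suffix is everything from the last dot onward, so
        # "my_file_name_0.5ms" reports ".5ms" and "cubes.1" reports ".1".
        ext = path.suffix.lower() or "(none)"
        files[ext] += 1
        # Read in binary mode so non-text files do not raise decode errors.
        with path.open("rb") as handle:
            lines[ext] += sum(1 for _ in handle)
    return {ext: (files[ext], lines[ext]) for ext in files}
```

Running extension_stats(".") from the repository root and looking for unexpected entries such as .1, .3, or .5ms should point to the files worth covering in .gitattributes (or renaming).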
Wouldn't it be easier, and thus way better, for the language type to always be specified manually by the programmers themselves? I mean, who writes a program and doesn't know which language they are using? The real problem is not the .1 or .3 files; the real problem is the automatic language analyzer, so why not just let programmers set it manually?
so why not just let programmers set it manually?
You can do that if you want using linguist-language=whatever in .gitattributes files.
Having defaults is a no-brainer, though, because not everyone is going to bother specifying that, and having syntax highlighting is important.
My repository's language is tagged as 'Roff', but it's a set of MATLAB scripts and Simulink files: https://github.com/WEC-Sim/WEC-Sim
Same thing with our applications repo: https://github.com/WEC-Sim/WEC-Sim_Applications