github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License
12.29k stars, 4.26k forks

Invalid language detection in the repository Shell vs C #4192

Closed ClnViewer closed 6 years ago

ClnViewer commented 6 years ago

The language detection for my repository is invalid: Shell instead of C.

The language mapping for my repository is incorrect. The repository contains 137 *.c files and 13 autotools files with *.sh extensions, yet it is classified as Shell. Adding a .gitattributes file did not help.

At the same time, the language statistics bar of the repository shows: Shell 43%, C 39.7%, Makefile 18.6%, M4 0.5%


Update: Linguist counts all the autotools files (configure*, libtool, etc.) in the statistics :(

URL of the affected repository:

https://github.com/ClnViewer/LibWchar2

Last modified on:

2018-07-11

Expected language:

C

Detected language:

Shell

pchaigno commented 6 years ago

Please read How Linguist works.

lildude commented 6 years ago

@pchaigno responded whilst I was writing this, but the specific part of the document he linked, which you've acknowledged you've read but probably missed 😉, is:

The percentages are calculated based on the bytes of code for each language as reported by the List Languages API.

Your repo has ever-so-slightly more Shell than C:

Shell 310 KB
C 273 KB
Makefile 135 KB
M4 3.36 KB

... hence the first language in the language bar, and thus it being used to indicate that your repo is predominantly Shell.
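To make the byte-based weighting concrete, here is a minimal Python sketch that turns the sizes quoted above into percentages and picks the largest language. It is an illustration only, not Linguist's implementation: the function names are hypothetical, and the inputs are the rounded KB figures from this comment, so the results may differ slightly from what the language bar shows.

```python
def language_percentages(bytes_by_language):
    """Return each language's share of the total size, rounded to 0.1%."""
    total = sum(bytes_by_language.values())
    if total == 0:
        return {}
    return {
        lang: round(100 * size / total, 1)
        for lang, size in bytes_by_language.items()
    }


def primary_language(bytes_by_language):
    """Return the language with the largest size (hypothetical helper)."""
    return max(bytes_by_language, key=bytes_by_language.get)


# Sizes in KB, copied from the comment above (already rounded).
sizes_kb = {"Shell": 310, "C": 273, "Makefile": 135, "M4": 3.36}

shares = language_percentages(sizes_kb)
for lang, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{lang}: {pct}%")
print("Primary:", primary_language(sizes_kb))
```

The number of files plays no part in this calculation, which is why 137 C files can still be outweighed by a handful of large shell scripts.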

Your overrides don't appear to have much of an effect because you're not targeting any of the files that make up the Shell total:

autogen.sh
build.sh
config.status
libtool
test/createtest.sh

libtool is the largest of these at ~252 KB which is pretty much most of the 310 KB total for Shell.

The only way you'll be able to adjust this weighting is to write more C 😉, mark one of these files incorrectly as C, or start to mark some of the Shell files as generated or vendored. Some are clearly generated so that may do the trick.
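As a rough sketch of the "mark as generated" option, the Python snippet below prints .gitattributes lines using the linguist-generated attribute. Which files are actually generated is an assumption here (config.status and libtool are typical configure outputs); the helper name is hypothetical, and the paths are assumed to contain no spaces.

```python
def gitattributes_lines(paths, attribute="linguist-generated"):
    """Build one .gitattributes override line per path (hypothetical helper)."""
    return [f"{path} {attribute}=true" for path in paths]


# Assumption: these two files from the list above are produced by configure.
assumed_generated = ["config.status", "libtool"]

# Paste the printed lines into the repository's .gitattributes file.
print("\n".join(gitattributes_lines(assumed_generated)))
```

Passing attribute="linguist-vendored" instead would produce the vendored variant of the same override.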

pchaigno commented 6 years ago

@lildude I'm starting to think we should have a thorough FAQ for users to read instead of the README (which is quite long). Most issues reported are very similar. Once we've made that easier on the user, I'd be in favor of automatically closing any issue that's missing the checkboxes...

lildude commented 6 years ago

Yeah, I was thinking like that. Not sure how best to do it ATM.

ClnViewer commented 6 years ago

Thanks, but in my opinion it would be more reasonable to exclude all the reserved autotools file names from indexing; they are present in almost every C/C++ project. As for the file sizes, that detail is completely lost in the text.

pchaigno commented 6 years ago

in my opinion it would be more reasonable to exclude all the reserved autotools file names from indexing; they are present in almost every C/C++ project.

You did write those tools, right? Or were they generated?

ClnViewer commented 6 years ago

Of course they were generated, as they probably are in all other projects using autotools.

pchaigno commented 6 years ago

In that case, you could probably add them to generated.rb.

pchaigno commented 6 years ago

Closing. We'd welcome a pull request to automatically recognize these files as generated.
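For anyone considering that pull request, the idea could look roughly like the Python sketch below: check the first few lines of a file for a "generated by" header. This is not code from generated.rb (which is Ruby), and the marker strings are assumptions based on typical autotools output; they would need to be verified against real configure, config.status and libtool files.

```python
# Assumed header markers; verify against real autotools output before relying on them.
GENERATED_MARKERS = (
    "Generated by GNU Autoconf",
    "Generated by configure",
    "Generated automatically by config.status",
)


def looks_autotools_generated(text, max_lines=10):
    """Return True if one of the first max_lines lines contains a known marker."""
    head = text.splitlines()[:max_lines]
    return any(marker in line for line in head for marker in GENERATED_MARKERS)


# Hypothetical file contents, for illustration only.
sample_generated = "#! /bin/sh\n# Generated by configure.\n# Run this file to recreate the configuration.\n"
sample_handwritten = "#!/bin/sh\nset -e\nautoreconf --install\n"

print(looks_autotools_generated(sample_generated))
print(looks_autotools_generated(sample_handwritten))
```

Limiting the check to the top of the file keeps it cheap and avoids matching a marker string that merely appears somewhere deep inside a hand-written script.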