anuraghazra / github-readme-stats

:zap: Dynamically generated stats for your github readmes
https://github-readme-stats.vercel.app
MIT License
68.41k stars, 22.43k forks

Top Languages Card not working properly #136

Closed fluorspar20 closed 4 years ago

fluorspar20 commented 4 years ago

Describe the bug: I have quite a few repos whose top language is JavaScript. However, the Top Languages card doesn't show JavaScript in the list at all.

Expected behavior: It should show JS as one of the top languages.

Screenshots / Live demo link: [Screenshot_2020-07-21 fluorspar20 - Overview(1)]

rjoydip-zz commented 4 years ago

Same with my profile.

github-readme-stats/top-langs is not showing all of my languages, and the percentages are off as well.

But github-profile-languages is more appropriate.

filiptronicek commented 4 years ago

Sure, but I think the GitHub GraphQL API just takes the raw LOC count from all your repos and does the analysis from that. Not 100% sure though.

filiptronicek commented 4 years ago

Ok, I was mistaken, it should just fetch the top language of the repo. @anuraghazra WDYT about this? https://github.com/anuraghazra/github-readme-stats/blob/dc3e9a59f2ce202d9fa51ac489640b4e520d750f/src/fetchTopLanguages.js#L5-L25

anuraghazra commented 4 years ago

Same with my profile.

github-readme-stats/top-langs not showing all languages as well as percentages.

But github-profile-languages is more appropriate.

It's working on my side. Which browser are you using?

not working

live demo

rjoydip-zz commented 4 years ago

Working on my side, which browser you are using?

Chrome. But the query below returns Rust and Go as well.

query {
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100) {
      nodes {
        languages(first: 1) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}
anuraghazra commented 4 years ago

Ok, I was mistaken, it should just fetch the top language of the repo. @anuraghazra WDYT about this?

Yup it should fetch the correct langs.

filiptronicek commented 4 years ago
Mine is also weird 😕 @anuraghazra

[Images: actual (pie chart) vs. our implementation]
anuraghazra commented 4 years ago

NOTE: Keep in mind the 100-repo maximum. It also gets the totalSize (in bytes) to calculate how many bytes you have written in each language.

filiptronicek commented 4 years ago

Is that how we want it to be? Is there not a better implementation?

NOTE: Consider the 100 max repos & also it get's the totalSize (in bytes) to calculate how many bytes you have written with the language.

anuraghazra commented 4 years ago

Is that how we want it to be? Is there not a better implementation?

NOTE: Consider the 100 max repos & also it get's the totalSize (in bytes) to calculate how many bytes you have written with the language.

That's how GitHub calculates it, and it's all fetched from GitHub's API, so there's no way the data is wrong. Maybe the data processing is wrong on my side. I have to do some experiments.
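A minimal sketch of the byte-based processing described here, not the project's actual code. It assumes the response has the `repositories.nodes[].languages.edges[]` shape returned by the query earlier in this thread. The function name `sumLanguageSizes` and the sample data are hypothetical.

```javascript
// Sum language sizes (in bytes) across all repositories in a response.
// `nodes` is assumed to mirror `data.user.repositories.nodes`.
function sumLanguageSizes(nodes) {
  const totals = {};
  for (const repo of nodes) {
    for (const edge of repo.languages.edges) {
      const name = edge.node.name;
      if (!totals[name]) {
        totals[name] = { name, color: edge.node.color, size: 0 };
      }
      totals[name].size += edge.size;
    }
  }
  // Largest total first.
  return Object.values(totals).sort((a, b) => b.size - a.size);
}

// Hypothetical sample data, not taken from any real profile.
const sampleNodes = [
  { languages: { edges: [{ size: 5000, node: { name: "JavaScript", color: "#f1e05a" } }] } },
  { languages: { edges: [{ size: 196, node: { name: "Rust", color: "#dea584" } }] } },
  { languages: { edges: [{ size: 3000, node: { name: "JavaScript", color: "#f1e05a" } }] } },
  { languages: { edges: [] } }, // a repo with no detected language
];

console.log(sumLanguageSizes(sampleNodes));
```

With sizes summed this way, a language that appears once with a tiny byte count ends up far down the sorted list.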

anuraghazra commented 4 years ago

I'll look into this tomorrow.

anuraghazra commented 4 years ago

Working on my side, which browser you are using?

Chrome. But below code is showing rust & go lang as well.

query {
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100) {
      nodes {
        languages(first: 1) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}

Hi @rjoydip yes, but as you can see


              "edges": [
                {
                  "size": 196,
                  "node": {
                    "color": "#dea584",
                    "name": "Rust"
                  }
                }
              ]

There is only one Rust entry in those 100 results, and its size is 196 bytes. I think this is why it's not showing.

anuraghazra commented 4 years ago

Maybe if I change the GQL query to fetch 5 langs from each repo it would be better, because for now I'm only selecting one language from each repo.

query {
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100) {
      nodes {
        languages(first: 5) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}
rjoydip-zz commented 4 years ago

Maybe if i change the gql query to fetch 5 langs from a certain repo then it would be better because for now i'm just only selecting one language from each repo.

user(login: "rjoydip") {
    repositories(isFork: false, first: 100) {
      nodes {
        languages(first: 5) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }

@anuraghazra Yes, I saw the same thing. It would be better to make isFork and the languages count dynamic, as variables.

anuraghazra commented 4 years ago

Not dynamic. Making it max 5 or 10 would do the job; a repo can't have too many languages anyway.

And isFork should always be false, since I don't want to count forked repos. For example, if anyone forked reactjs, they would have a lot of JS code.
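As a rough sketch of that idea: a request body where only the login is a GraphQL variable, while the per-repo language limit is a fixed constant and forks are always excluded. `LANGS_PER_REPO` and `buildLangQuery` are illustrative names, not part of the project.

```javascript
// Fixed upper bound on languages fetched per repo (assumed value of 10).
const LANGS_PER_REPO = 10;

// Build a GraphQL request body; `login` is passed as a query variable.
function buildLangQuery(login) {
  const query = `
    query userLangs($login: String!) {
      user(login: $login) {
        repositories(isFork: false, first: 100) {
          nodes {
            languages(first: ${LANGS_PER_REPO}) {
              edges {
                size
                node {
                  color
                  name
                }
              }
            }
          }
        }
      }
    }`;
  return { query, variables: { login } };
}

const body = buildLangQuery("rjoydip");
console.log(JSON.stringify(body.variables));
```

Keeping `isFork: false` hard-coded in the query text means callers cannot accidentally include forks.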

filiptronicek commented 4 years ago

I am of the opinion that we should count forks too. It's something that GitHub also does, and forks also exist because people use them for projects they work on themselves.

I am not sure if GitHub provides this, I am not exactly an expert on their v4 API, but the extensions that provide the same solution must be querying it somehow, I'll look into that.

filiptronicek commented 4 years ago

Just a link for some info on another solution: https://github.com/freyamade/github-user-languages

@anuraghazra

rjoydip-zz commented 4 years ago

FYI...

{
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        name
        updatedAt
        languages(first: 5, orderBy: {field: SIZE, direction: DESC}) {
          nodes {
            name
          }
        }
        primaryLanguage {
          name
        }
      }
    }
  }
}
filiptronicek commented 4 years ago

Useful, thanks. For this, I think it would be enough to skip languages and use just the primary one. / cc: @anuraghazra @rjoydip

{
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        primaryLanguage {
          name
        }
      }
    }
  }
}

That gives out something like this:

{
  "data": {
    "user": {
      "repositories": {
        "nodes": [
          {
            "primaryLanguage": {
              "name": "TypeScript"
            }
          },
          {
            "primaryLanguage": {
              "name": "TypeScript"
            }
          },
          {
            "primaryLanguage": {
              "name": "Java"
            }
          },
          {
            "primaryLanguage": null
          }
        ]
      }
    }
  }
}
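
A minimal sketch of how such a response could be tallied per primary language. The function name `countPrimaryLanguages` is hypothetical, and the sample nodes copy the response shown above, including the `null` entry for a repo with no detected language.

```javascript
// Count repositories per primary language, skipping repos where
// GitHub reports no primary language (primaryLanguage is null).
function countPrimaryLanguages(nodes) {
  const counts = {};
  for (const repo of nodes) {
    if (!repo.primaryLanguage) continue;
    const name = repo.primaryLanguage.name;
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

const responseNodes = [
  { primaryLanguage: { name: "TypeScript" } },
  { primaryLanguage: { name: "TypeScript" } },
  { primaryLanguage: { name: "Java" } },
  { primaryLanguage: null },
];

console.log(countPrimaryLanguages(responseNodes)); // { TypeScript: 2, Java: 1 }
```
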
stemount commented 4 years ago

Useful, thanks, for this, I think it would be enough to not use the languages, just the primary one. / cc: @anuraghazra @rjoydip

{
  user(login: "rjoydip") {
    repositories(isFork: false, first: 100, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        primaryLanguage {
          name
        }
      }
    }
  }
}

That gives out something like this:

{
  "data": {
    "user": {
      "repositories": {
        "nodes": [
          {
            "primaryLanguage": {
              "name": "TypeScript"
            }
          },
          {
            "primaryLanguage": {
              "name": "TypeScript"
            }
          },
          {
            "primaryLanguage": {
              "name": "Java"
            }
          },
          {
            "primaryLanguage": null
          }
    ]
}

I think this would be good for a "most used language" widget, as it is currently an approximation over many repos.

For example, I could have one repo that is just an Express app serving mostly HTML, but it would say 100% TypeScript.

filiptronicek commented 4 years ago

Useful, thanks, for this, I think it would be enough to not use the languages, just the primary one. / cc: @anuraghazra @rjoydip

I think this is would be good for "most used language widget" as it is currently an approximation of many repos.

for example I could one repo that is just an express app just serving mostly HTML, but it would say 100% typescript.

Does this mean, that HTML isn't considered a language in this analysis? I am a bit confused. Can you give me an example repo?

anuraghazra commented 4 years ago

I don't think we can effectively do language analysis. For example, if someone uploaded node_modules to their GitHub, their JavaScript would be 100% no matter what. Same as #153.

filiptronicek commented 4 years ago

I don't think we can effectively do language analysis, for example take a scenario if someone uploaded node_modules to their github then their javascript would be 100% no matter what. same as #153

True, but nobody should ever do that (upload their node_modules). If they do, they cannot then be angry at our code, which assumes industry best practices. It also affects GitHub's own analysis.

anuraghazra commented 4 years ago

I don't think we can effectively do language analysis, for example take a scenario if someone uploaded node_modules to their github then their javascript would be 100% no matter what. same as #153

True, but nobody should ever do that (upload their node_modules), if they do, they cannot be then angry at our code, which considers Industry best practices and it also affects GitHub's own analysis.

Yup, but that's not my point. There are a lot of scenarios where we cannot evaluate code correctly, and there is no perfect way to do that. Take my website's GitHub repo as an example: it has a develop branch and a master branch, and the master branch holds all the static HTML generated by Gatsby, so there are huge amounts of metadata, JSON data, and JavaScript files.

https://github.com/anuraghazra/anuraghazra.github.io/tree/master

tobiasvl commented 4 years ago

I also came here because I saw discrepancies between this tool and my Chrome extension-generated pie chart (linked above).

As an example, this tool says my top language is JavaScript with 46.21% and Lua is second with 29.29%, while the pie chart says I have 5 JavaScript repos and 15 Lua repos. However, if I do a search for JavaScript repos, I only get 1. Not sure what to make of that; presumably this tool counts LOC while the pie chart counts top language per repo, but perhaps the pie chart counts forks too, since it comes up with 5 and not 1?

By the way, not sure if it's relevant, but organization profile pages (like https://github.com/github) actually list the top 5 languages in the org's repos (without bars or percentages or anything fancy). It looks like those are just top languages, not LOC. I might be mistaken though.

SSARCandy commented 4 years ago

I have also noticed the language analysis is off. I created a pull request about it, #204, which fetches the top language in each repo by adding the orderBy: {field: SIZE, direction: DESC} constraint.

tobiasvl commented 4 years ago

Hmm. Something seems to have changed, but it doesn't seem fixed. For me, Python was my fifth language before, and now it's been replaced by Java... Some other changes in percentages too, but Python dropping off the list shouldn't have happened since it's my second largest language (behind Lua) in the Chrome pie chart. (JavaScript is still my top language here, erroneously).

tobiasvl commented 4 years ago

Oh, and if I hide JavaScript and Java, my card just displays the other three languages. It's not fetching two other languages to replace the ones I hide. I'm not sure if this is a separate bug with hiding, or if it's related to it not fetching my languages correctly... Maybe it thinks I only have five languages total?

anuraghazra commented 4 years ago

Oh, and if I hide JavaScript and Java, my card just displays the other three languages. It's not fetching two other languages to replace the ones I hide. I'm not sure if this is a separate bug with hiding, or if it's related to it not fetching my languages correctly... Maybe it thinks I only have five languages total?

It fetches all the langs, calculates the top langs, picks the top 5, and then you just hide langs from those top 5.
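A minimal sketch of that order of operations (sort, take the top 5, then hide), which explains why hidden languages are not replaced. The function name `topLangsWithHide` and the sizes are hypothetical; this is not the project's actual code.

```javascript
// Pick the top `limit` languages by size first, then drop hidden ones.
// Because hiding happens after the cut, nothing is promoted to fill the gap.
function topLangsWithHide(langs, hide = [], limit = 5) {
  const hidden = new Set(hide.map((name) => name.toLowerCase()));
  return [...langs]
    .sort((a, b) => b.size - a.size)
    .slice(0, limit)
    .filter((lang) => !hidden.has(lang.name.toLowerCase()));
}

// Hypothetical sizes in bytes.
const langs = [
  { name: "JavaScript", size: 900 },
  { name: "Lua", size: 800 },
  { name: "Assembly", size: 700 },
  { name: "C", size: 600 },
  { name: "Java", size: 500 },
  { name: "Python", size: 400 },
  { name: "Ruby", size: 300 },
];

const visible = topLangsWithHide(langs, ["JavaScript", "Java"]);
console.log(visible.map((lang) => lang.name)); // [ 'Lua', 'Assembly', 'C' ]
```

Filtering before slicing instead would promote Python and Ruby into the list, which is the behavior expected in the report above.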

tobiasvl commented 4 years ago

Ah OK... My mistake then, I misunderstood the "hide" feature.

wopian commented 4 years ago

Python is shown as 52.59% for me, but my only interaction with Python is a single fork (open PR) of one of those Awesome README lists that uses a total of 2 Python scripts to run tests in an otherwise language-less repository.

JavaScript is way down at 7.63% despite 15-22 of 51 public repositories being JavaScript and where over 90% of my stars come from and half of the commits to my own repositories.

[Images: live card and static card]

[Image] https://profile-summary-for-github.com/user/wopian

NikhilCodes commented 4 years ago

I'm getting inaccurate percentages per language. For instance, consider Dart and Python.

[Image: ss1] This image is from http://ionicabizau.github.io/github-profile-languages/

[Image] This image is from the github-readme-stats API.

The two show different results for Python and Dart.

Any solutions?

anuraghazra commented 4 years ago

I'm getting inaccurate percentages per languages, For instance consider dart and python ss1 This image is from http://ionicabizau.github.io/github-profile-languages/ Now this image is from the github-readme-stats api, but both have different results for Python and dart.

Any solutions?

This is because we are only counting the first language (the one with the most code bytes) instead of all the languages in the repo.

languages(first: 1, orderBy: {field: SIZE, direction: DESC}) {
NikhilCodes commented 4 years ago

I'm getting inaccurate percentages per languages, For instance consider dart and python ss1 This image is from http://ionicabizau.github.io/github-profile-languages/ Now this image is from the github-readme-stats api, but both have different results for Python and dart. Any solutions?

This is because we are only calculating the first (which has the most code bytes) language instead of all the languages in the repo

languages(first: 1, orderBy: {field: SIZE, direction: DESC}) {

Even if that were the case, the repos that contain only Python code and those containing 90%+ Dart code should have at least similar stats. And I'm pretty sure that the Dart percentage in total should be at least 20%.

I have 10 Dart repos and 7 Python repos, so it seems impossible for Python to get 99% and Dart to get 0.41%.

Not to mention that Dart generally has a larger source size than Python.

tobiasvl commented 4 years ago

This is because we are only calculating the first (which has the most code bytes) language instead of all the languages in the repo

This simply can't be what actually happens. Like I demonstrated above I only have one repo with JavaScript as the top language, and yet github-readme-stats says it's my top language, far above languages that are the top (only) language in 15 repos each or something.

anuraghazra commented 4 years ago

@tobiasvl I was experimenting with the API and the code. Are you satisfied with these stats?

tobais_lang_stat

anuraghazra commented 4 years ago

And i also like the suggestion of @stemount

I think this is would be good for "most used language widget" as it is currently an approximation of many repos.

I think the "Top Languages" labeling is misleading, it should be "Most used languages"

anuraghazra commented 4 years ago

@NikhilCodes I don't know what's wrong with your profile, but I've checked other users' stats with the fix I'm working on and they are all fine except yours.

nikhil_lang_stat

Btw, I've checked the GraphQL request and it seems you do have a very, very large Python repo. This repo is so huge in bytes that it completely outweighs your Dart stats, so I think the stats are totally fine.

{
            "nameWithOwner": "NikhilCodes/VirtualBLU",
            "isFork": false,
            "languages": {
              "edges": [
                {
                  "size": 78537442,
                  "node": {
                    "color": "#3572A5",
                    "name": "Python"
                  }
                }
              ]
            }
          },

The updated GQL query looks like this:

query {
  user(login: "NikhilCodes") {
    repositories(ownerAffiliations: OWNER, isFork: false, first: 100) {
      nodes {
        nameWithOwner
        isFork
        languages(first: 10, orderBy: {field: SIZE, direction: DESC}) {
          edges {
            size
            node {
              color
              name
            }
          }
        }
      }
    }
  }
}
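
To illustrate how one very large repository can swamp everything else, here is a small sketch. The Python figure is the 78537442 bytes reported for NikhilCodes/VirtualBLU above. The Dart total (ten repos of 30000 bytes each) is purely an assumption for illustration, as is the function name `languageShares`.

```javascript
// Convert per-language byte totals into percentage shares.
function languageShares(totals) {
  const sum = Object.values(totals).reduce((acc, size) => acc + size, 0);
  const shares = {};
  for (const [name, size] of Object.entries(totals)) {
    shares[name] = (size / sum) * 100;
  }
  return shares;
}

const totals = {
  Python: 78537442, // size reported in the response above
  Dart: 10 * 30000, // ASSUMED: ten Dart repos of 30000 bytes each
};

const shares = languageShares(totals);
console.log(shares.Python.toFixed(2), shares.Dart.toFixed(2));
```

Under these assumptions Dart lands below 1% even with ten repositories, because shares are weighted by bytes rather than by repository count.
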
tobiasvl commented 4 years ago

@tobiasvl i was experimenting with the api & the code, Are you satisfied with these stats?

tobais_lang_stat

Much better! Thank you.

However, I believe there's something wrong with your "tie-break" algorithm, so to speak. The Chrome extension linked earlier lists these as my top languages:

  1. Lua (15 repos)
  2. Python (13 repos)
  3. JavaScript (5 repos)
  4. Assembly (5 repos)
  5. HTML (5 repos)
  6. CSS (5 repos)
  7. Ruby (4 repos)
  8. C (4 repos)

And then a bunch of languages, including Java, with 1 repo each.

As you'll notice, your new code lists Assembly as my third language and then C below that. It seems that it's simply ignoring my HTML and CSS repos, because I have as many of them as I have Assembly repos! That would also explain why Java is on the list at all even though I only have one Java repo – it's ignoring my 4 Ruby repos because I already have 4 C repos on the list.

So, to sum it up, I think your patched code is getting there, but if there are several languages with the same repo count, it only displays one of them and ignores the rest.

anuraghazra commented 4 years ago

@tobiasvl It is not about how many repos you have. You could have 100 JS repos with 10 bytes of code each and 1 Python repo with 20000 bytes of code, and Python would be at the top in that case.

So, to sum it up, I think your patched code is getting there, but if there are several languages with the same repo count, it only displays one of them and ignores the rest.

As you can see, I changed the GQL query to fetch 10 languages in every repo, and I'm counting all of them.
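The scenario above can be sketched as follows: 100 JavaScript repos of 10 bytes each against a single Python repo of 20000 bytes, ranked once by bytes and once by repo count. The helper name `rankBy` and the flat repo shape are illustrative assumptions.

```javascript
// 100 JS repos with 10 bytes each, plus one Python repo with 20000 bytes.
const repos = [
  ...Array.from({ length: 100 }, () => ({ language: "JavaScript", size: 10 })),
  { language: "Python", size: 20000 },
];

// Rank languages by a per-repo weight: bytes, or 1 to count repos.
function rankBy(repoList, weight) {
  const totals = {};
  for (const repo of repoList) {
    totals[repo.language] = (totals[repo.language] || 0) + weight(repo);
  }
  return Object.entries(totals).sort((a, b) => b[1] - a[1]);
}

const byBytes = rankBy(repos, (repo) => repo.size);
const byRepoCount = rankBy(repos, () => 1);

console.log(byBytes);     // [ [ 'Python', 20000 ], [ 'JavaScript', 1000 ] ]
console.log(byRepoCount); // [ [ 'JavaScript', 100 ], [ 'Python', 1 ] ]
```

The two rankings disagree, which is the same kind of discrepancy reported when comparing this card against tools that count repos per language.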

tobiasvl commented 4 years ago

Hmm, OK, thanks. If it's intentional then I'm definitely fine with this. I don't want HTML and CSS on my list anyway 😅

anuraghazra commented 4 years ago

Also @tobiasvl

I believe there's something wrong with your "tie-break" algorithm

Actually, there is no special algorithm in play here. I'm just sorting the data coming from GitHub's API and picking the top languages by size.

tobiasvl commented 4 years ago

I see, thanks. Ship it! :shipit:

anuraghazra commented 4 years ago

Oh my god. I just realized this: why was everyone comparing the stats with http://ionicabizau.github.io/github-profile-languages/ ?

I just checked their source code, and they are calculating "how MANY languages a user has in their profile", while github-readme-stats is calculating the "MOST used languages in a user's profile".

They have a package called gh-polyglot which does this: https://github.com/IonicaBizau/node-gh-polyglot/blob/master/lib/index.js#L102-L106

VictorNS69 commented 4 years ago

Hi! First of all, thanks for the job you are doing with this repo!

I have a question, that I think fits here.

I have around 15 Python repositories, but Python has no stats in "most used languages". Here you can see the live card: [image]

Also the static card: [image]

Maybe Python is not my top language by size, but it seems strange for it not to have any percentage at all.

anuraghazra commented 4 years ago

Maybe Python is not my "Top language with the most size", but seems strange to not have any %

@VictorNS69 You have so much code in Java and ASP that Python did not make it onto the list.

tobiasvl commented 4 years ago

Yes, I think it's pertinent to point out that strangely, the percentages seem to be within the top 5. You don't have 1.69% TeX in all your repos, but it makes up 1.69% of all the code within your top 5 languages. (Unless I've misunderstood.)
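If the percentages are indeed computed only over the displayed languages, as suggested here, the effect can be sketched like this. The function name `percentages` and all sizes are hypothetical.

```javascript
// Compute percentage shares over either the top `limit` languages
// or, when no limit is given, over every language.
function percentages(langs, limit) {
  const sorted = [...langs].sort((a, b) => b.size - a.size);
  const shown = limit ? sorted.slice(0, limit) : sorted;
  const total = shown.reduce((sum, lang) => sum + lang.size, 0);
  return shown.map((lang) => ({
    name: lang.name,
    percent: (lang.size / total) * 100,
  }));
}

// Hypothetical byte totals for six languages.
const allLangs = [
  { name: "Java", size: 5000 },
  { name: "ASP", size: 3000 },
  { name: "HTML", size: 1000 },
  { name: "CSS", size: 500 },
  { name: "TeX", size: 300 },
  { name: "Python", size: 200 },
];

const withinTop5 = percentages(allLangs, 5);
const overAll = percentages(allLangs);

const texTop5 = withinTop5.find((lang) => lang.name === "TeX").percent;
const texAll = overAll.find((lang) => lang.name === "TeX").percent;
console.log(texTop5.toFixed(2), texAll.toFixed(2));
```

Normalizing over only the top 5 makes every displayed share slightly larger than its share of all code, and a sixth language like Python never appears at all.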

anuraghazra commented 4 years ago

Yes, I think it's pertinent to point out that strangely, the percentages seem to be within the top 5. You don't have 1.69% TeX in all your repos, but it makes up 1.69% of all the code within your top 5 languages. (Unless I've misunderstood.)

He has one repo with a good amount of TeX https://github.com/VictorNS69/Apuntes-Ciber

abdullasirajudeen commented 4 years ago

My profile does not show any top languages; the card is empty. [GitHub Stats link]

anuraghazra commented 4 years ago

@abdullasirajudeen That's because you don't have anything to count. You have 5 repos: 4 of them are forks, which won't be counted, and one is your readme repo, which does not have any code.