github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

FYI: Temporary change in language and extension popularity assessment #5756

Open lildude opened 2 years ago

lildude commented 2 years ago

GitHub's Search is struggling at the moment, so all Search requests are being heavily restricted, making it almost impossible to count the number of unique :user/:repo combinations via the likes of Harvester or the API.
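For readers unfamiliar with the count being described, here is a minimal sketch of what "unique :user/:repo combinations" means in practice. It assumes each search hit is a dict carrying a `repository.full_name` field of the form `user/repo`; the sample hits and the `unique_repos` helper are hypothetical and not part of Harvester or Linguist.

```python
def unique_repos(hits):
    """Return the distinct 'user/repo' names found in a list of search hits.

    Assumption: every hit looks like {"repository": {"full_name": "user/repo"}}.
    """
    return {hit["repository"]["full_name"] for hit in hits}


# Hypothetical hits: three matching files spread over two repositories.
hits = [
    {"path": "src/a.foo", "repository": {"full_name": "alice/project"}},
    {"path": "src/b.foo", "repository": {"full_name": "alice/project"}},
    {"path": "lib/c.foo", "repository": {"full_name": "bob/tool"}},
]

print(len(unique_repos(hits)))  # 2
```

Many files in one repository still count once, which is why a raw hit count overstates how widely an extension is used.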

Search is in the process of being rewritten, with the Tech Preview available at https://cs.github.com/ (please tinker with it and send GitHub feedback). However, it isn't accessible via the API yet and doesn't quite meet our needs for determining current usage. So for the foreseeable future I'll be using my judgment to determine popularity, until the new Search gains the functionality we need and/or the restrictions are lifted (or we can come up with other qualifying criteria).

I know this is subjective and open to debate so the loose rules I'll be using are along the lines of:

If particular users are showing a high proportion of the results, I'll manually filter out those users using -user:<username> to reduce their impact on my assessment.
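A rough sketch of that filtering step, for anyone preparing their own query before opening a PR. The `-user:<username>` qualifier comes from the comment above; everything else is an assumption for illustration: the helper names, the sample repositories, and especially the `max_share` threshold, since no numeric cut-off is stated in this issue.

```python
from collections import Counter


def dominant_users(repo_names, max_share=0.2):
    """List owners whose repos exceed max_share of all results.

    repo_names: iterable of 'user/repo' strings.
    max_share: hypothetical threshold, not an official Linguist rule.
    """
    owners = Counter(name.split("/", 1)[0] for name in repo_names)
    total = sum(owners.values())
    return sorted(user for user, n in owners.items() if n / total > max_share)


def exclude_users(query, users):
    """Append a -user:<username> qualifier to the query for each user."""
    return " ".join([query] + [f"-user:{user}" for user in users])


# Hypothetical result set in which one user owns most of the repositories.
repos = ["alice/a", "alice/b", "alice/c", "bob/x", "carol/y"]
noisy = dominant_users(repos, max_share=0.5)
print(noisy)                               # ['alice']
print(exclude_users("path:*.foo", noisy))  # path:*.foo -user:alice
```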

I know this isn't ideal, but I think it's the best option for the moment. I'm open to suggestions too. On the plus side, it does mean a lot more PRs are likely to be merged 😁.

I'll be going back through older PRs in the next week or two, re-assessing them based on these notes and merging any that satisfy them.

Alhadis commented 2 years ago

> Search is in the process of being rewritten

Might be a good time to request a "search by extension/filename" feature to simplify the task of adding new languages to GitHub... 😉

lildude commented 2 years ago

> Might be a good time to request a "search by extension/filename" feature to simplify the task of adding new languages to GitHub... 😉

I think we're covered already by a combination of scopes and more intuitive file path expressions:

[Screenshot: CleanShot 2022-02-03 at 09 26 58]

Alhadis commented 2 years ago

Wow, regular expressions will be supported? Now we're talking. 😀

Also, I tried to access https://cs.github.com/ but it simply redirected me to my activity feed (i.e., https://github.com/). Is it staff-only or something?

lildude commented 2 years ago

Nope. You need to be invited. Join the waitlist at https://cs.github.com/about

Alhadis commented 2 years ago

Done. Hopefully this'll make Harvester's rewrite less intimidating. 😅

elimisteve commented 2 years ago

> On the plus side, it does mean a lot more PRs are likely to be merged 😁.

That's what really matters anyway -- yay! 🎉

runarorama commented 1 year ago

I want to submit support for Unison (https://unison-lang.org), but there are zero GitHub repositories with Unison code in them since Unison is an image-based language and can't really use Git and doesn't have source files. I estimate that Unison has roughly 2000 users total.

Is it worth trying to submit?

lildude commented 1 year ago

> but there are zero GitHub repositories with Unison code in them since Unison is an image-based language and can't really use Git and doesn't have source files.

@runarorama If what you say is true, you've answered your own question... how and why add support for a language that can't actually be found, analysed or even viewed on GitHub? 🤔

As an aside, this question really should have been a new discussion.

mawildoer commented 6 months ago

One perhaps stupid question (I'm sorry if I missed this!) but how should we (as a language's creators) find the ~5k lines of code/200 repos?

I believe we're getting to the right point (based on the telemetry we do have), but we're finding it really hard to turn the repos up based on keywords.

I also blindly attempted to see if I could declare a language that GitHub would process somewhat generically for the sake of marking repos, to no avail. https://github.com/atopile/spin-servo-drive/blob/main/.gitattributes

lildude commented 6 months ago

> One perhaps stupid question (I'm sorry if I missed this!) but how should we (as a language's creators) find the ~5k lines of code/200 repos?

Use GitHub's Search. This is the only way we assess popularity: we use the search URL offered in the PR template, plus any further customisations you make to it. The more precise you make the query, the better. Note that we do not count lines of code; we never have and never will.

> I believe we're getting to the right point (based on the telemetry we do have), but we're finding it really hard to turn the repos up based on keywords.

The new search is pretty good now and you can use regular expressions too.
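As a rough illustration, a regex-based query can be turned into a shareable search URL like this. The `search_url` helper, the `.foo` extension, and the example query are hypothetical, and the `path:/regex/` and `NOT is:fork` qualifiers are written as I understand the new code search syntax, so check them against the search documentation and the PR template before relying on them.

```python
from urllib.parse import urlencode


def search_url(query):
    """Build a github.com code search URL for the given query string."""
    return "https://github.com/search?" + urlencode({"type": "code", "q": query})


# Hypothetical query: non-fork files whose path ends in ".foo".
query = r"NOT is:fork path:/\.foo$/"
print(search_url(query))
```

Anchoring the regex on the end of the path keeps the query precise, so files that merely contain `.foo` somewhere in a directory name are not counted.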

> I also blindly attempted to see if I could declare a language that GitHub would process somewhat generically for the sake of marking repos, to no avail. https://github.com/atopile/spin-servo-drive/blob/main/.gitattributes

This is expected. This has been discussed at length in other issues and will not be discussed in this issue.