badges / shields

Concise, consistent, and legible badges in SVG and raster format
https://shields.io
Creative Commons Zero v1.0 Universal
23.92k stars 5.51k forks

GitHub Occurrences Badge #4068

Open cloewen8 opened 5 years ago

cloewen8 commented 5 years ago

:clipboard: Description

A badge for GitHub that counts the occurrences of a sequence in a file.

For example, given /github/occurrences/badges/shields/README.md/badge, the badge would show 22 (the sequence "badge" occurs in README.md 22 times).

Optimally, sequence should be a regular expression (escape sequences, word boundaries, character classes).
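To make the request concrete, here is a minimal sketch of the counting step in Python. The function name `count_occurrences` and the sample text are made up for illustration; this is not how Shields implements anything, only one way the count could be computed once the file contents are in hand.

```python
import re


def count_occurrences(text: str, pattern: str) -> int:
    """Count non-overlapping matches of a regular expression in text.

    `count_occurrences` is a hypothetical helper, not part of Shields.
    """
    return sum(1 for _ in re.finditer(pattern, text))


sample = "A badge is a badge; badges are not counted twice."

# With word boundaries, "badges" does not match.
print(count_occurrences(sample, r"\bbadge\b"))  # prints 2

# As a plain sequence, the "badge" inside "badges" matches too.
print(count_occurrences(sample, "badge"))  # prints 3
```

The two calls show why a regular expression is more useful than a fixed string here: word boundaries change the count.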

:link: Data

The data required for this can be retrieved from https://raw.githubusercontent.com/:user/:repo/:branch/:path. It only requires authentication for private repositories. Unfortunately I don't know of any official documentation for this endpoint, only that it is the destination when pressing "Raw" on a file on GitHub.
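As a rough illustration of the URL template above, the following sketch fills in the placeholders. The helper name `raw_url` is an assumption for this example, and the code only builds the URL; it does not fetch anything or handle authentication.

```python
from urllib.parse import quote


def raw_url(user: str, repo: str, branch: str, path: str) -> str:
    """Fill in the raw.githubusercontent.com template for a public file.

    `raw_url` is a hypothetical helper. Slashes are kept as path
    separators; other unsafe characters are percent-encoded.
    """
    return "https://raw.githubusercontent.com/" + quote(
        f"{user}/{repo}/{branch}/{path}"
    )


print(raw_url("badges", "shields", "master", "README.md"))
# prints https://raw.githubusercontent.com/badges/shields/master/README.md
```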

Additional processing is required and would need to be limited.

:microphone: Motivation

I personally want to use it to count the number of facts in a text file (each fact is on its own line). A badge for counting lines would work, but being able to count anything opens the door to a lot more uses.

paulmelnikow commented 5 years ago

Hi, thanks for your request. We have something similar which searches for files within a repo that match a specific pattern, using the GitHub Search API, however it’s not able to do this.

I like the idea of a dynamic text badge, and can see doing lines or string matching, however I feel like for a lot of things you’d want a regex (and not sure we should run arbitrary regexes, since they can be crafted in a way that they use a large amount of compute resource).

Can you share a link to the file? Sometimes seeing the specific case really solidifies why a feature should exist. It might also surface a creative way to use what is already there!

cloewen8 commented 5 years ago

I absolutely agree that arbitrary regex (or any user submitted code) should not be blindly trusted! For computation, a timeout can be used. I know regex101.com uses this strategy.

Here is where I want to use it: https://github.com/cloewen8/dolphin-fact/blob/master/README.md Currently, I'm using /github/size, which isn't very helpful but is better than nothing. I want to use the badge as a form of progress counter, to get people interested as the project grows.

paulmelnikow commented 5 years ago

Huh, since it's a list, what would you think about using YAML instead? That way you could use the Dynamic YAML badge and a JSONPath expression.

All you'd have to do is prefix each line with a -, and then in your app, you could either strip off the leading - or use a proper YAML parser (we use js-yaml, which is great).
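The "strip off the leading -" option could look roughly like this. It is a sketch under a strong assumption: the file is a flat list of plain one-line items. It is not a YAML parser, and the name `facts_from_yaml_list` is made up for this example.

```python
def facts_from_yaml_list(text: str) -> list[str]:
    """Read a flat YAML-style list by stripping the leading "- ".

    Hypothetical helper: it ignores any line that is not a list item
    and does not understand quoting, nesting, or multi-line values.
    """
    facts = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            facts.append(stripped[2:].strip())
    return facts


example = "- first fact\n- second fact\n\n- third fact\n"
print(facts_from_yaml_list(example))
# prints ['first fact', 'second fact', 'third fact']
```

For anything beyond a flat list, a real YAML parser is the safer choice, as suggested above.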

cloewen8 commented 5 years ago

YAML is definitely an option for my use case. I wouldn't consider it optimal over a simple text file or CSV file though. Would creating this or a similar badge be an option? Depending on what is required, I would be willing to just create it.

paulmelnikow commented 5 years ago

I'd be 👍 on adding a badge to count lines either in an arbitrary URL or in a file on GitHub. Would you be interested in working on that?
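For reference, the line-counting part of such a badge is small. The sketch below assumes the file contents have already been fetched as text; `count_lines` and its `skip_blank` parameter are hypothetical names, and whether blank lines should count is a design choice the thread does not settle.

```python
def count_lines(text: str, skip_blank: bool = True) -> int:
    """Count lines in text, optionally ignoring blank ones.

    Hypothetical helper for illustration only.
    """
    lines = text.splitlines()
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return len(lines)


print(count_lines("a\n\nb\n"))                    # prints 2
print(count_lines("a\n\nb\n", skip_blank=False))  # prints 3
```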

The GitHub version is a little more complicated because, to support auth, we use the Contents API.

This is the helper function that fetches file contents from GitHub repos. It parses JSON but could be refactored to obtain the contents as text.

https://github.com/badges/shields/blob/90f8ffce9e8340c2444cfc473531887256ebe568/services/github/github-common-fetch.js#L27-L60

Here's the existing GitHub badge for the package.json version, which is the closest badge we have to GitHub file line count.

https://github.com/badges/shields/blob/90f8ffce9e8340c2444cfc473531887256ebe568/services/github/github-package-json.service.js#L24-L79

The "any URL" version (which could be used with a raw.githubusercontent.com URL) would be simpler. The osslifecycle badge could be adapted pretty readily for this:

https://github.com/badges/shields/blob/90f8ffce9e8340c2444cfc473531887256ebe568/services/osslifecycle/osslifecycle.service.js#L1-L107

And here's our tutorial: https://github.com/badges/shields/blob/master/doc/TUTORIAL.md

calebcartwright commented 2 years ago

Confess I'm still not sure I'm following after reading through a couple times, but curious whether this is a case that would be better suited to our Endpoint Badge?

cloewen8 commented 2 years ago

I feel like this is general-purpose enough to be its own badge. That certainly is an option, but would require hosting the endpoint, which may be too much setup for users.

> Would you be interested in working on that?

Assuming this is still a wanted feature, I could definitely implement it now.

calebcartwright commented 2 years ago

> but would require hosting the endpoint, which may be too much setup for users.

This is a common and understandable intuition, but one which I tend to think is an incorrect assumption. With services like Runkit (linked in our Endpoint docs) there's 0 hosting concerns and 0 costs, users can quite literally just chuck a bit of code up there and be off and running.
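To show how little an endpoint has to do, here is a sketch of the JSON body such a service could return for a line count. It assumes the Endpoint badge's documented response fields (`schemaVersion`, `label`, `message`, and the optional `color`). The function name, the default label, and the color are illustrative. RunKit itself runs Node.js, so this Python version only illustrates the payload, not a RunKit deployment.

```python
import json


def endpoint_payload(text: str, label: str = "facts") -> dict:
    """Build an Endpoint-badge style response from a file's contents.

    Hypothetical helper: counts non-blank lines and reports the total
    as the badge message.
    """
    count = sum(1 for line in text.splitlines() if line.strip())
    return {
        "schemaVersion": 1,
        "label": label,
        "message": str(count),
        "color": "blue",
    }


print(json.dumps(endpoint_payload("one\ntwo\nthree\n")))
# prints {"schemaVersion": 1, "label": "facts", "message": "3", "color": "blue"}
```

A hosted function would fetch the file, call something like this, and return the JSON; the badge URL then points at that function.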

While there's certainly a case to be made that your goal is something others might be interested in (though worth mentioning that we've not had any other requests nor has our community been upvoting/requesting this particular ask), I'm not convinced any implementation would actually be sufficiently general purpose. I also have some reservations about the notion of processing any arbitrary file on our prod servers as an implementation mechanism to achieve the goal.

I'm not inherently opposed to having this as a native badge, so I'd be happy to have my skepticism and concerns proven wrong if you're feeling sufficiently motivated to submit a PR! However, I do think the Endpoint is both the easiest and fastest approach, and is also something we could reference in places like Awesome Badges to highlight the pattern in case any future users are interested in something similar.

cloewen8 commented 2 years ago

I no longer need this, and I've forgotten why I needed it to begin with. My motivation for implementing it was its simplicity and the initial responses. If it would be more trouble than it's worth, I'd be happy to use the Endpoint badge with RunKit instead when needed (can't believe I ever missed this, great service).