badges / shields

Concise, consistent, and legible badges in SVG and raster format
https://shields.io
Creative Commons Zero v1.0 Universal

Add ability to format bytes as metric (e.g: kilobyte) or IEC (e.g: kibibyte) format #10437

Open chris48s opened 1 month ago

chris48s commented 1 month ago

:clipboard: Description

Shields has a number of size badges. In general, there are a couple of patterns for how these work:

  1. In some cases, the upstream data source reports an already-formatted number. For example, with the spiget badge, the API returns an already-formatted size and units.
curl "https://api.spiget.org/v2/resources/771" -s | jq '.file.size, .file.sizeUnit'

That's what goes on the badge. In these situations the number on the badge is always going to match what is shown on the upstream service.

  2. The other pattern is that the upstream API reports a raw number of bytes and we format it. For example, with the bundlephobia badge, the API returns a raw number of bytes.
curl "https://bundlephobia.com/api/size?package=react" -s | jq .size

In the second case, we always format the raw number of bytes for display using pretty-bytes, which exclusively uses metric (e.g: kilobyte) units.
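To make the difference concrete, here is a minimal shell sketch that formats one raw byte count both ways, using awk for the floating-point arithmetic. Metric units step up by powers of 1000 (kB, MB, GB), IEC units by powers of 1024 (KiB, MiB, GiB). The `format_bytes` name and its `metric`/`iec` argument are hypothetical, and the fixed one-decimal precision is an assumption. It does not reproduce the exact output of pretty-bytes or byte-size.

```shell
# Hypothetical helper (not from Shields, pretty-bytes or byte-size):
# format a raw byte count in metric (base 1000) or IEC (base 1024)
# units, always to one decimal place.
# Usage: format_bytes <bytes> <metric|iec>
format_bytes() {
  awk -v bytes="$1" -v mode="$2" 'BEGIN {
    if (mode == "iec") {
      base = 1024
      n = split("B KiB MiB GiB TiB", units, " ")
    } else {
      base = 1000
      n = split("B kB MB GB TB", units, " ")
    }
    value = bytes
    i = 1
    # Move up one unit while the rounded value would reach the next unit,
    # so we print "1.0 MB" rather than "1000.0 kB"
    while (i < n && sprintf("%.1f", value) + 0 >= base) {
      value /= base
      i++
    }
    if (i == 1) {
      printf "%d %s\n", value, units[i]   # whole bytes: no decimals
    } else {
      printf "%.1f %s\n", value, units[i]
    }
  }'
}

# The same raw value, formatted both ways
format_bytes 6553 metric   # 6.6 kB
format_bytes 6553 iec      # 6.4 KiB
```

Both outputs describe exactly the same number of bytes. Only the divisor and the unit label change.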

As a general principle, we want shields to be "consistent". This means more than one thing. It can involve maintaining consistency both "horizontally" across our suite of badges (e.g: the NPM license badge should work the same way as the PyPI license badge) and "vertically" with the upstream data sources (i.e: if the registry says the latest version of your package is v2.4.1, so should we).

Always using pretty-bytes achieves horizontal consistency, but can lead to vertical inconsistency. If the upstream data provider uses IEC (e.g: kibibyte) units, the size we report on the badge may be slightly different. For example, the bundlephobia badge "disagrees" with the bundlephobia website.

In my view, vertical consistency is more important in this case.

For badges where we format a raw number of bytes, I think we should switch from pretty-bytes to another formatting library like byte-size which allows formatting using both metric and IEC units.

We should review each of the size badges where we format a raw number of bytes instead of receiving an already formatted number. There are not a huge number of these. Where the upstream data provider uses IEC units, we should switch to using that.

We could also consider exposing a param allowing the user to specify the formatting, but I think matching the upstream by default is the most important thing to do here.

This is something that has been discussed several times before (albeit in less detail):

calebcartwright commented 1 month ago

not to start a giant yak shave here, but a minor point of frustration for me over the years has been the lack of a consistent, holistic, and well-defined posture on what we mean by consistency.

as such, i really like the vertical vs. horizontal framing, but i also feel like determining which to prioritize in a given case has historically been ad hoc and temporal; i.e. it's largely been driven by the in-the-moment thinking of a given maintainer at a particular point in time, and is something that can change later on.

e.g. in this particular case, let's imagine i've got multiple size badges in the same readme/experience context. if we've prioritized vertical consistency, there'll be a mix of kilo- and kibi-formatted numbers on those badges beside each other

and i think the conversation to be had is how we weigh that horizontal inconsistency against that vertical consistency, and then furthermore how this fits in with general mission statements to the effect of "we're not trying to match the upstream provider's look/feel" on other aspects of the badge

chris48s commented 1 month ago

I don't have a completely clear answer for this, but I feel like a reasonable starting point for a rule of thumb is: The text or value on the badge should be consistent with the upstream service (vertical consistency). The styling and formatting should be the same across badges (horizontal consistency).

So following that logic, we shouldn't tell you your code coverage is 92% if your coverage tool says it is 93%. We shouldn't say the latest version of your package is 2.0.0-beta3 if the registry says it is 1.24.1. Simultaneously, it is valid for us to say "90% coverage is green" regardless of whether that 90% came from coveralls or codecov, or to say "pre-releases are orange, stable releases are blue" even if some platforms relate colours to versions in some other way.

You may be able to provide a counter-example where we currently violate that principle one way or another.

I think size specifically sits in a kind of grey area where you could make a reasonable argument that the way we format a raw number of bytes is a value/text concern or a format/styling concern. I think I'd land on value/text though.

One extra thing I will point out in this particular case is that, given there are some size badges where we present a value formatted by the upstream service, we can't necessarily achieve complete horizontal consistency here. Even if we always use metric when we get a raw number of bytes, there may still be some badges where we are displaying IEC because that's what the API gives us.

calebcartwright commented 1 month ago

You may be able to provide a counter-example where we currently violate that principle one way or another.

Yeah I think there are a few, which is the thrust behind why I bring this up in the first place. For clarity, I'm fully supportive of providing a user-facing ability to control the size formatting, but I'd like to get to a better place on the holistic vertical vs. horizontal framing before we go down the path of changing any defaults.

Some examples that come to mind where we've opted for horizontal consistency over vertical consistency are adding prefixes/suffixes (e.g. v on version badges), showing code coverage at different levels of precision (e.g. 92.51% vs 93%), download counts, and pipeline status wording.

chris48s commented 1 month ago

Yeah all good examples - thanks.

I feel like always adding v on version badges and decimal precision/rounding are stylistic/presentation concerns. They don't change the meaning of the value. Similarly with metric(). Maybe the upstream says 32,204 downloads whereas we present that as 32k downloads, or whatever.

Changing the labels on build status (e.g: the upstream says "success" and we change it to "passing" or whatever) feels like we're being more editorial there and potentially changing the meaning. It is probably the case where we've gone furthest down the road of changing the actual values to pursue horizontal over vertical. Always been a bit on the fence about that tbh.

I'm still not quite sure how we boil it down into a rule.

I think also if we say that changing the precision of a number (e.g: from 92.51% to 93% or from 32,204 to 32k) is a stylistic concern, then that also lends weight to the argument that choosing to format as metric or IEC bytes is also a stylistic concern.

@PyvesB - I wonder if you have any thoughts on this issue. Perhaps another cook will help make sense of this broth :man_cook:

calebcartwright commented 1 month ago

They don't change the meaning of the value

I think that's fair, but I'd also posit that we aren't changing the meaning/size by using one format over the other either; the number of bytes is ultimately the same

chris48s commented 1 month ago

I'd also posit that we aren't changing the meaning/size by using one format over the other

Yeah. This is what I was getting at when I said

If we say that changing the precision of a number (e.g: from 92.51% to 93% or from 32,204 to 32k) is a stylistic concern, then that also lends weight to the argument that choosing to format as metric or IEC bytes is also a stylistic concern

PyvesB commented 4 weeks ago

I like the vertical vs. horizontal framing, and the idea of a "don't change the meaning of the value" rule.

I'd also posit that we aren't changing the meaning/size by using one format over the other either; the number of bytes is ultimately the same

In this case, I'd argue that we are partly changing the meaning in the sense that the understanding of whoever is viewing the badge may be different. Some people, or some domains, will picture a kilobyte as 1024 bytes. The fact that we report 6.6 kB whereas upstream reports 6.4 kB is confusing and open to multiple interpretations. It feels to me that we'd be changing the meaning of the value in this case.
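As a worked illustration (the byte count here is hypothetical, picked only because it reproduces the two figures quoted above), the same raw value gives a different displayed number depending on whether a kilobyte is taken as 1000 or 1024 bytes:

```latex
\frac{6553\,\text{B}}{1000\,\text{B/kB}} = 6.553\,\text{kB} \approx 6.6\,\text{kB}
\qquad
\frac{6553\,\text{B}}{1024\,\text{B/KiB}} \approx 6.399\,\text{KiB} \approx 6.4\,\text{KiB}
```

Conversely, a reader who assumes 1 kB = 1024 B and sees "6.6 kB" would picture about 6.6 × 1024 ≈ 6758 bytes, roughly 200 bytes more than the actual value.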

chris48s commented 3 weeks ago

In this case, I'd argue that we are partly changing the meaning

I think I'm inclined to agree. I think we do also have to take into account user expectations.

I've also looked at what format the upstream service uses for all the badges where we currently format a number of bytes (I've ignored the ones where we just present an already-formatted string from the API). Here's what I found:

Service                     Format
Crates.io                   IEC
Bundlephobia                IEC
GitHub                      IEC
Steam                       metric
Docker                      IEC
NPM                         metric
Visual Studio App Center    unknown
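Read as configuration, the table boils down to a per-service choice of unit system. A minimal shell sketch, assuming hypothetical service keys and a hypothetical `unit_system` helper (neither comes from the Shields codebase). Anything unknown, such as Visual Studio App Center, falls back to metric, which is what these badges use today via pretty-bytes:

```shell
# Hypothetical lookup: which system of units each upstream uses when it
# displays sizes, per the table of findings.
# Unknown services fall back to metric, the current default.
unit_system() {
  case "$1" in
    crates.io|bundlephobia|github|docker) echo "iec" ;;
    steam|npm) echo "metric" ;;
    *) echo "metric" ;;  # e.g. Visual Studio App Center: unknown, leave as-is
  esac
}

for service in crates.io npm visual-studio-app-center; do
  echo "$service: $(unit_system "$service")"
done
```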

I think my proposal is we:

For Visual Studio App Center, I reckon we just leave it as is for now if we can't find out. I can't seem to see an example of a public project on there.

PyvesB commented 3 weeks ago

Allow changing the format with a url param (the user can change it if they want)

Are we really helping any form of consistency by adding this possibility? I feel that it may be confusing for users to see the same badge on various repos, but with different meanings.

calebcartwright commented 3 weeks ago

Hate to say it but I strongly disagree :grimacing:

Perhaps I'm being overly pedantic but I think there's a fundamental difference between actually changing the meaning versus there being an opportunity for users to get confused or misunderstand.

From my perspective, changing the meaning would be something like receiving a temperature of 0 degrees Celsius but displaying it explicitly as 0 degrees Fahrenheit or 0 kelvin.

I also don't think it's our job to address the cognitive gaps some users have relative to the long-established question of "how many bytes are actually in X", and if we're going to open that door then there are probably several others that get opened automatically in doing so.

I'm still stuck with the same hurdles I enumerated earlier:

I don't think we've consistently applied the same rules/logic in determining whether to prioritize vertical or horizontal consistency across the project, and I think we should, or at a minimum agree on a general policy/procedure in how those should be determined.

Reiterating some of my previous examples, we very much prioritized horizontal consistency for pipeline badge status messages, and changing words (with the associated connotations and ambiguities) is much more likely to change meaning, and almost certainly creates surface for there to be a discrepancy between the message on a Shields-provided badge and the corresponding value where it's displayed (e.g. a pipeline status badge on a readme in the repo of a source control platform where those pipelines run).

We also take a lot of liberties with download counts in the spirit of horizontal consistency, at the expense of vertical consistency and which contradicts numbers reported in package management/distribution systems. It's something we've had users report and complain about before, but which we dismissed for reasons centered around horizontal consistency.

If a user in our community asked me to square those cases with prioritizing vertical consistency for size badges, I would not be able to give them an answer.

I think we do also have to take into account user expectations.

I'd agree user expectations should be a factor, but I'd also ask: are we confident we know what the general user expectation is? We've heard from a handful of people who have reached out because current behavior doesn't align with their own expectations, but are we sure the expectations of those few accurately represent the broader user base?


Again I don't actually have a strong opinion one way or the other about whether we should prioritize horizontal vs. vertical consistency on the size badges. However, unless I'm just being dense or missing something, I feel like the reasonings put forward to prioritize vertical in this case are contradictory (or at least have incompatibilities) with the other cases where we've prioritized horizontal consistency

chris48s commented 3 weeks ago

we very much prioritized horizontal consistency for pipeline badge status messages, and changing words (with the associated connotations and ambiguities) is much more likely to change meaning

Agree with this. Build status badges are definitely a place where we are being quite editorial.

We also take a lot of liberties with download counts in the spirit of horizontal consistency, at the expense of vertical consistency and which contradicts numbers reported in package management/distribution systems. It's something we've had users report and complain about before, but which we dismissed for reasons centered around horizontal consistency

On this one, are you just talking about rounding, or are there other things here? If so, can you give an example. I'm not sure I'm with you on this one.

calebcartwright commented 2 weeks ago

On this one, are you just talking about rounding, or are there other things here? If so, can you give an example. I'm not sure I'm with you on this one.

Yes I'm referring to the rounding. Our editorial decision to round was done in the name of horizontal consistency, e.g.

https://github.com/badges/shields/issues/742

These seem to work against the spirit of the badge design, whose goal is to provide a consistent, clean, and understandable experience across various services and projects. It makes it harder for someone looking at the readme to understand.

A theme which has been repeated many times over the years, albeit not always consistently.

I share this example with download counts because in my opinion it includes the same characteristics that we've been discussing here:

  1. The download/star/etc. counts don't match (some) user expectations; we've had lots of complaints/requests over the years, some of which have made the case that the badge essentially changes the meaning of, or misrepresents, the download/star counts
  2. The badge values differ from one of the platforms (e.g. package registry) where they are displayed (e.g. some registries display exact values, not the rounded number Shields decides to put on the badges in the name of horizontal consistency)

And I'm saying I don't see how that's any different than what's been discussed on this thread wrt size badges.

So I come back to us needing to be able to provide a cohesive and consistent explanation for why we'd do horizontal in some cases and vertical in others.

calebcartwright commented 2 weeks ago

I'll also add that I'm just digging in on my belief that we need to agree to a model or procedure for how we want to handle this horizontal vs vertical as a project.

It's fine if whatever model we decide makes sense in 2024 is different than it was at the start of the project, and it's fine if there are existing categories of badges that don't currently match that model (we could always review them, identify outliers, and decide whether to make changes or leave them as-is, noting them explicitly as exceptions to that model that we decided to maintain).

I think a project that so heavily emphasizes consistency should be consistent in how and where it applies consistency :grin:

chris48s commented 2 weeks ago

OK thanks for clarifying. I probably won't have a chance to come back to this for a few days, but I will reply when I get a chance.

chris48s commented 1 week ago

Sorry it has taken a while to reply on this. I think the process of reflecting on it has been helpful to tease concepts apart a bit though.

I think there is an important difference between the concepts we're comparing here: Precision and Units.

With download counts, the upstream sometimes presents a precise count and we round it. So for example:

...and so on, but we're still fundamentally talking about "number of downloads" (the units).
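To illustrate a precision-only change, here is a minimal shell sketch of rounding a precise count. The `round_count` helper and its thresholds are assumptions for illustration, not Shields' actual metric() implementation. The point is that the output is still a number of downloads; only the precision differs.

```shell
# Hypothetical helper: round a precise count for display.
# Only the precision changes; the unit ("downloads") does not.
round_count() {
  awk -v n="$1" 'BEGIN {
    if (n < 1000) { printf "%d\n", n; exit }
    k = sprintf("%.0f", n / 1000)
    # Switch to millions if rounding to thousands would show 1000k
    if (k + 0 < 1000) { printf "%sk\n", k; exit }
    printf "%.1fM\n", n / 1000000
  }'
}

round_count 32204     # 32k
round_count 1234567   # 1.2M
```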

If we're going to say "let's treat download counts, coverage percentages, and file sizes consistently", the analogous concept would be the precision. So maybe we say "we always round file sizes to the nearest whole number across all services". Then the upstream might say "771.3 KiB" and we round to "771 KiB", or the upstream might even choose to present that as "0.75 MiB" and we present it as "771 KiB". That's a stylistic concern, but it stays within the upstream's convention of using the metric or IEC system of units. That would be a decision that aligns the comparable concept across downloads, coverage and sizes IMO.
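The arithmetic behind that example, taking 771.3 KiB as the starting point (the byte count below is derived from it, not from a real upstream response):

```latex
771.3\,\text{KiB} \times 1024\,\text{B/KiB} \approx 789\,811\,\text{B}
\qquad
\frac{789\,811\,\text{B}}{1024^2\,\text{B/MiB}} \approx 0.7532\,\text{MiB} \approx 0.75\,\text{MiB}
\qquad
\frac{789\,811\,\text{B}}{1024\,\text{B/KiB}} \approx 771.3\,\text{KiB} \approx 771\,\text{KiB}
```

This is also why the rounding would need to start from the raw byte count: 0.75 MiB taken literally is 768 KiB, so working from the upstream's already-rounded string would land on a different whole number.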

Here is a starting point suggestion for a more concrete "rule" we could write down and apply here to express that concept:

When presenting numbers: We prioritise horizontal consistency when it comes to the precision. We prioritise vertical consistency when it comes to the system of units

So a contrived example of this might be: If we had a badge that shows the weight of a thing or a distance, we match the upstream's convention when it comes to metric vs imperial units.

Some better shields examples might be:

For a lot of our badges, we're really talking about "number of downloads", "number of likes", "number of forks", etc., so there isn't really any wiggle room on the units there. It is just a count. For the sake of argument: if we were to say that is our rule/position, can you think of any counter-examples of badges we currently have where we violate that principle? Can you suggest anything we would have to review or change?

I guess another question here is: Do we think that is a good/useful rule?

In terms of being more general than that, I don't think we can define a single "grand unified theory of everything" which implies both how we present a file size and also how we present a software license, for example. They just have different considerations. There's an extent to which we will need to consider things by categories.

calebcartwright commented 1 week ago

Here is a starting point suggestion for a more concrete "rule" we could write down and apply here to express that concept:

I guess another question here is: Do we think that is a good/useful rule?

Yes, I think that'd be a reasonable rule and is the type of articulation I'm saying that I think we need

I don't think we can define a single "grand unified theory of everything"

I'm not proposing anything so grandiose, and I don't think that what I've been suggesting will entail a level of complexity that borders on the unification of gravity and quantum mechanics.

The rule you've suggested covers "what"/"how" we do it, but I still feel like we're missing a "why", and the thrust of my concern is that I don't think we have any consistency in our "why"

why do we want to prioritize vertical consistency for numerical units, while simultaneously prioritizing horizontal consistency for pipeline status badges?

and again, yes I fully recognize that we're talking about numbers in one camp and words in another, but if the reasoning for prioritizing vertical consistency for numerical units is based on consistency with upstream platforms and user expectations, then why do we ignore/deprioritize those same factors in pipeline status badges where we prioritize horizontal consistency?

more generally, why is matching upstream service display weighed so heavily for numerical units, but completely irrelevant in other cases?

calebcartwright commented 1 week ago

expanding on this, i think the "why" for standardizing precision and having horizontal consistency there is because a core part of our ethos is badges being clean, clear, & concise

e.g.

is objectively, i'd argue, far more clean, clear, and concise than

the package registries that display download counts have full control over their respective UIs and have the flexibility to choose a precision that works best there, whereas our badges can and do get used in multiple places and need to match our goals everywhere.

that makes sense to me, and feels like logically sound reasoning (the "why") behind horizontal consistency for precision. it also has the added benefit that, where multiple instances of our badges are used side by side (e.g. a monorepo that's pushing multiple packages to different registries), they are also consistent with each other

chris48s commented 1 week ago

So the principle we've documented and on some level agreed that supports this would be:

https://github.com/badges/shields/blob/026b45e07d0d5720fe57f5c8015ef6a01bfd7fa9/doc/input-validation.md?plain=1#L31

Upstreams like registries on some level set community norms. In a situation where there is ambiguity or different conventions in use, going with what the upstream does is often a good call. Particularly where it affects meaning as opposed to stylistic concerns. Tbh, I'd say it is what we often default to doing just by virtue of the fact that the data we have to work with is what the upstream chooses to provide via an API in the form they choose to provide it.

This specific case is an unusual situation because it is an area where the upstreams can be "opinionated" one way or the other (metric/IEC) in user-facing contexts, but then they tend to (not always, but frequently) expose the data in an "unopinionated" way (bytes) via the API. The more I think about this, the more I'm convinced that this is quite a unique situation that is hard to map onto anything else. I can't really think of anything else we deal with which has this property that there are two (or more) different systems in common use stemming from the same basic building block (in this case, bytes). Anything else where there are units in play, you'd have to go out of your way to explicitly convert them, or we're dealing with something where there is no wiggle room on the units like "number of stars". So I think in that sense this particular data point has quite a unique property and that is why it is difficult to map it exactly on to another concept we deal with.

With the file sizes, I think this has led us to kind of sleep-walking to the place we're in now. 7 years ago a drive-by contributor installed a library that only does metric bytes without much thought about that (there is no discussion of which library to use or which system of units in that PR). Then we just kinda used it because it was already installed. Then we retrofitted "consistency" as a justification for not changing things later. I'm not sure it is a result of someone having really thought this through and made a conscious decision that we always use metric bytes in the name of consistency.


why do we want to prioritize vertical consistency for numerical units, while simultaneously prioritizing horizontal consistency for pipeline status badges?

Just to be clear on this point: I'm not trying to square these things. If the outcome of this is we decide we should also stop changing the words on build badges, I'd be :+1: on that.

As you correctly say in an earlier post

changing words (with the associated connotations and ambiguities) is much more likely to change meaning, and almost certainly creates surface for there to be a discrepancy between the message on a Shields-provided badge and the corresponding value where it's displayed (e.g. a pipeline status badge on a readme in the repo of a source control platform where those pipelines run)

I agree and that's a good reason to look back and question this decision too IMO.


In general, obviously there is some need for some level of both horizontal and vertical consistency and arguments for both. Taking it to either extreme is probably going to be unhelpful and I think in most cases we're striking a reasonable balance but there are obviously grey areas. In some ways it is easier to say something like "we prioritise horizontal consistency above all else" or "we always prioritise vertical consistency" because that's a straightforward statement to write down, but the lack of nuance in adopting one of those extreme positions would ultimately make the project worse.

chris48s commented 1 day ago

A point came out of the work I was doing on #9916 reviewing all the colour scales we use. It is way off the original topic of metric vs IEC bytes, but very relevant to the topic of horizontal consistency.

There are places where we're defining, for example, a 3 point good --> bad scale that is brightgreen/yellow/red and other places where we're defining a 3 point good --> bad scale that is brightgreen/orange/red and there is not really any good reason why they are different, and there are similar variations for 4 and 5-point scales. That feels like something that is not horizontally consistent but should be. There's definitely room to standardise that stuff way more, and colours are an area where I think there is full agreement that this is formatting/display and should be internally consistent.
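As a sketch of what standardising could look like, here is a single shared 3-point good --> bad scale that every badge would call instead of choosing its own colours. The `three_point_scale` name, the threshold arguments, and the choice of yellow (rather than orange) for the middle step are assumptions for illustration. awk is used so that decimal values such as coverage percentages compare correctly.

```shell
# Hypothetical shared 3-point good --> bad colour scale.
# Usage: three_point_scale <value> <good_threshold> <ok_threshold>
# Higher values are better; thresholds are inclusive.
three_point_scale() {
  awk -v v="$1" -v good="$2" -v ok="$3" 'BEGIN {
    if (v >= good)    print "brightgreen"
    else if (v >= ok) print "yellow"   # one middle colour, used everywhere
    else              print "red"
  }'
}

three_point_scale 92.51 90 75   # brightgreen
three_point_scale 80 90 75      # yellow
three_point_scale 60 90 75      # red
```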