avoidwork / filesize.js

JavaScript library to generate a human readable String describing the file size
https://filesizejs.com
BSD 3-Clause "New" or "Revised" License

SI standard should be default #137

Closed getsnoopy closed 3 years ago

getsnoopy commented 3 years ago

I noticed that the project defaults to base-2 calculations using the JEDEC "standard", which is arguably the worst of both worlds.

Firstly and most importantly, 96% of the world (and science within the US) uses the metric system (SI) regularly, where the prefixes kilo-, mega-, etc. all have universal and precise meanings: thousand, million, etc. As such, most of the world understands what these prefixes mean and would expect them to mean the same thing when it comes to bits and bytes, and rightly so: because that is indeed what they mean. JEDEC, however, calls for the unit names and symbols to be represented as if they are SI-based, but actually defines them differently (e.g., kilobyte and KB mean 1024 bytes, etc.). This is deceptive, and has been the basis for much confusion and even lawsuits.

To fix this once and for all, the IEC has come up with different names and symbols for the base-2 units. Most operating systems, wisely, switched everything over to SI units. In fact, about 70% of the devices of the world use SI prefixes for everything but RAM by default; Windows is the only unfortunate holdout. Since most people in the world are familiar with the SI and are not in the esoteric field of memory in computer science, they expect SI units. So it's a bit ironic to refer to the package as a way "to get a human readable file size" when it only caters to 30% of the humans of the world.

To add to the confusion, regardless of JEDEC or SI, most contexts where bits are used relate to networking, where the prefixes almost always carry the SI definition (base 10) anyway! The package currently calculates bits using base 2 by default as well, which can be considered a bug, since that is inappropriate in most use cases.
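To illustrate the point about bits, here is a minimal sketch of how the same link speed reads under base 10 and base 2. The `formatBits` helper and its symbol lists are hypothetical, written only for this example; they are not part of filesize.js.

```javascript
// Hypothetical helper for illustration only; not part of filesize.js.
// Formats a bit count using decimal (base 10) or binary (base 2) steps.
function formatBits(bits, base) {
  const step = base === 2 ? 1024 : 1000;
  const symbols = base === 2
    ? ["bit", "Kibit", "Mibit", "Gibit"]
    : ["bit", "kbit", "Mbit", "Gbit"];
  let value = bits;
  let index = 0;
  // Divide by the step until the value fits under the current prefix.
  while (value >= step && index < symbols.length - 1) {
    value /= step;
    index += 1;
  }
  // Round to two decimals and drop trailing zeros.
  return `${Number(value.toFixed(2))} ${symbols[index]}`;
}

const linkSpeed = 100e6; // a "100 megabit" link, in bits per second
console.log(formatBits(linkSpeed, 10)); // 100 Mbit
console.log(formatBits(linkSpeed, 2));  // 95.37 Mibit
```

A base-2 calculation turns the advertised 100 megabits into roughly 95.37, which is why a binary default is surprising for networking figures.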

Please change the default to base 10 and remove JEDEC completely; if people want to use base-2, they have the option to do so with the IEC standard, but it at least wouldn't lie to them about what they're seeing. I understand that this would be backwards-incompatible, so bumping to another major version is to be expected, but I think it's important and worth it.

avoidwork commented 3 years ago

2.0.0 was SI in 2013.

avoidwork commented 3 years ago

My primary use case is memory, so it works just great for me.

getsnoopy commented 3 years ago

Sure, but why not default to IEC at least? JEDEC is just misleading.

avoidwork commented 3 years ago

JEDEC is not misleading; memory and files on disk are still base 2. Marketing is base 10, to make it easier for humans to communicate general statements about "things".

IEC came later in the code than the JEDEC units; that's the only reason.

getsnoopy commented 3 years ago

Files on disk are, but most people do not read or report file sizes on disk; they report actual file sizes, which are base 10.

Regardless, my point is not that base 2 is not common, but that JEDEC uses base 10 names and symbols for base 2, which is misleading. "MB" means megabyte to anyone reading it, but defining it as 1024² bytes doesn't serve anyone. "MiB", on the other hand, makes it clear what is meant. So if you prefer base 2 units as the default, why not prefer ones that aren't misleading?
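The difference between the three conventions can be sketched as follows. This is a hedged illustration, not the filesize.js API: `formatBytes` and the `STANDARDS` table are hypothetical names introduced for this example.

```javascript
// Hypothetical helper for illustration only; not the filesize.js API.
// Each convention pairs a step size with the symbols it uses.
const STANDARDS = {
  si:    { step: 1000, symbols: ["B", "kB", "MB", "GB"] },
  iec:   { step: 1024, symbols: ["B", "KiB", "MiB", "GiB"] },
  jedec: { step: 1024, symbols: ["B", "KB", "MB", "GB"] },
};

function formatBytes(bytes, standard) {
  const { step, symbols } = STANDARDS[standard];
  let value = bytes;
  let index = 0;
  // Divide by the step until the value fits under the current prefix.
  while (value >= step && index < symbols.length - 1) {
    value /= step;
    index += 1;
  }
  // Round to two decimals and drop trailing zeros.
  return `${Number(value.toFixed(2))} ${symbols[index]}`;
}

const size = 1048576; // 1024 * 1024 bytes
console.log(formatBytes(size, "si"));    // 1.05 MB
console.log(formatBytes(size, "iec"));   // 1 MiB
console.log(formatBytes(size, "jedec")); // 1 MB
```

Note that the SI and JEDEC outputs share the symbol "MB" while describing different quantities; only the IEC output is unambiguous about using base 2.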

avoidwork commented 3 years ago

JEDEC units are not base10 units, see https://en.wikipedia.org/wiki/JEDEC_memory_standards#Units_of_information

Base 10 (decimal) is expressed with a lowercase 'k' for kilo; anything that's base 2 uses an uppercase 'K'. See https://en.wikipedia.org/wiki/Kilobit

This is why I have no interest in changing the default: virtually no one knows what's what.

getsnoopy commented 3 years ago

> JEDEC units are not base10 units

That's exactly my point. JEDEC uses base 10 (i.e., SI) unit names and symbols to refer to base 2 quantities, which is misleading. The one exception is the symbol for kilo-, for which JEDEC uses an uppercase K. But very few people know that the SI symbol for kilo- is only ever lowercase, so few would read an uppercase K as indicating a binary interpretation of the prefix. You can see this when people write "Kg" or "KG" for kilogram, for example. For all other prefixes, JEDEC uses the same symbols as SI.

This doesn't change the fact that JEDEC keeps the unit names the same: KB for "kilobyte" (actually 1024 bytes, which is not a kilobyte), MB for "megabyte" (actually 1 048 576 bytes, which is not a megabyte), etc. Hence, my point about JEDEC simply being misleading.
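As a quick arithmetic check of how far the binary values drift from the decimal meaning of each prefix, the following sketch computes the relative gap between 1024^n and 1000^n. The `gapPercent` function is a hypothetical helper written for this example.

```javascript
// Hypothetical helper for illustration only.
// Returns how much larger 1024^power is than 1000^power, in percent.
function gapPercent(power) {
  return ((1024 ** power) / (1000 ** power) - 1) * 100;
}

const prefixes = ["kilo", "mega", "giga", "tera"];
prefixes.forEach((name, i) => {
  console.log(`${name}: ${gapPercent(i + 1).toFixed(1)}%`);
});
// kilo: 2.4%
// mega: 4.9%
// giga: 7.4%
// tera: 10.0%
```

The gap grows with each prefix, so the ambiguity matters more as sizes get larger.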

> virtually no one knows what's what

This is why I linked to the stats showing that 70% of the world's devices run operating systems that show proper SI units to their users. That is quite apart from the fact that almost all non-technical people in the world consistently expect SI (read: proper) interpretations of the unit names.

avoidwork commented 3 years ago

Now I understand where you're coming from; I have the same response when I hear "decimate" used incorrectly due to MCU.

I reopened the ticket; if you want, make a PR to change the default to IEC or SI. I don't have a strong opinion; I'm mainly interested in whichever change causes the least issue drama. I'd wager IEC is the change that people will accept.

getsnoopy commented 3 years ago

Sounds good. I will create one that will change the default to SI, since that's the most common (Node.js is not used for many RAM applications AFAIK). But that PR would depend on my other bit PR being merged, so could you please reopen that?

avoidwork commented 3 years ago

Re-opened; on sabbatical atm so I'll be slow to respond.

avoidwork commented 3 years ago

Released as 8.0.0