Open benaadams opened 7 years ago
I don't suppose a BitSize
would make sense?
If we did the proposal, it should include also the kibibyte, mebi, gibibyte, etc. series. These are very commonly needed in networking, SSD/HDD storage, and other bits.
Bonus: which is used varies on storage by platform (e.g. macOS uses gigibyte, Windows uses gigabyte).
There is ambigiuity between Kilobyte as 1000 bytes and 1024 bytes (though Kibibyte does exist as @NickCraver points out)
Would defining both types differentiate this enough?
Does-it include Qubit / KiloQubit ... too ? :sweat_smile:
What's the Format
value difference? (I assume just lack of domain knowledge from my side)
If you add the the `Kibi' variants, we can run it by API review in 1h or next week.
Do you have a namespace in mind? System.Numerics, System, anything else?
As @karelz pointed - these probably should be Kibi (1024) not Kilo (1000) prefixes. I believe kilo used to be 1024 but it was causing too much confusion
API review - early comments:
Just being devil's advocate. Have you considered fractional bytes, as in 55.1 bytes/sec or 700.3 bytes/file?
Not that I am against the pattern or anything, but "typed unit structures" are a fairly alien concept to the BCL. There are many such places where this kind of abstraction could be good from a usability perspective, but it's not something we usually do. This also won't mix well with the many call sites in our libraries which use int
s, long
s, IntPtr
s etc. to represent data sizes in byte units.
Have you considered fractional bytes, as in 55.1 bytes/sec or 700.3 bytes/file
Those are compound units, i.e. BytesPerSecond
.
I also believe that ByteSize is not a best name as byte size is simply 8 bits. Better name would be IMO DataSize or DataSizeUnitsConverter (or something along these lines).
The Format enum - I might be missing some context - perhaps it is being more widely used and I'm not aware of that but I'm seeing this for the first time and it its intention is not clear to me - I believe it should be something more verbose and more self describing.
Parse, TryParse, TryParseExact - is this for translating something like "10MB" to bytes or perhaps "10 MB" or maybe "10 megabytes" or maybe "10 megabytes and 20 kilobytes" - will this work with localized strings? IMO this is next level of abstraction and should be extracted from this review as I believe it opens too many questions and ambiguity.
I do not enjoy idea of Yottabytes being "final" unit and having different type - I'd rather change this to short and throw an Exception that higher numbers are not supported yet - this way we can extend this in case in couple of years some new technology will show up and Yottabytes hard drives will be accessible for everyone
TotalKilobytes - should this be BigInteger instead? Double is perhaps useful in many cases but feels like this should be some integer type
I also believe that ByteSize is not a best name as byte size is simply 8 bits. Better name would be IMO DataSize or DataSizeUnitsConverter (or something along these lines).
I think the whole point is to encode the units in the type. A DataSize
parameter is just as ambiguous as a ulong
parameter, at least at a glance.
The Format enum - I might be missing some context
Whether to interpret 1 KB
as decimal or binary (e.g. 1000 or 1024)
I do not enjoy idea of Yottabytes being "final" unit and having different type - I'd rather change this to short and throw an Exception that higher numbers are not supported yet
Turns out long can only support +/- 7 ExBiBytes, updating spec atm
Updated spec; notably changed to long for byte differences
This also won't mix well with the many call sites in our libraries which use ints, longs, IntPtrs etc. to represent data sizes in byte units.
@mellinoe should there be a ShortByteSize
which is an int
and a LongByteSize
which is a long
to provide this interoperablity?
ByteLength
a better name than ByteSize
?
ByteSizeF
?
What's the point of so many constructors? How will they be used?
Why did you use short
everywhere instead of long
?
The naming short KiloBytes
vs. double TotalKiloBytes
sounds weird. Is there any precedence to this pattern of having int vs. double differentiated with 'Total' prefix?
What's the point of so many constructors? How will they be used?
Removed; same can be achieved, probably more usefully via chaining as can specify units e.g.
new ByteSize(10) + ByteSize.FromKiloBytes(10).Add(ByteSize.FromMeBiBytes(10))
The naming short KiloBytes vs. double TotalKiloBytes sounds weird. Is there any precedence to this pattern of having int vs. double differentiated with 'Total' prefix?
As used in TimeSpan
e.g. Days
vs TotalDays
KiloBytes
is the modulus of its unit and can only return the range -1000
to 1000
e.g. 111,111,111 bytes
would return 111
TotalKiloBytes
is all the bytes expressed in kilobytes
e.g. 1,111,111 bytes
would return 1,111.111
KiloBytes is the modulus of its unit
Oh, I entirely missed that. I though you are providing utility function to give me KBs (even if there is millions of them - that's what 'Total&asterix;' gives me). You truly want it similar to TimeSpan
. Now the 'Total' prefix makes perfect sense.
I wonder: In TimeSpan
one expects trouble as the base is changing (1000, vs. 60 vs. 24). I wonder if this API won't be too confusing ... (or is it just me?)
Such modulus properties as TimeSpan.Hours
which can only be [-23, 23] and DateTime.Hour
which can only be [0, 23] use int
.
Mostly I think they are useful when custom formatting; though ToString
should probably handle that?
Happy to remove; will comment the block in suggested api
@JonHanna next you'll tell me people use int
rather than uint
for their loop array indexer; due to negative array sizes 😉
Updated to int
@tarekgh @mellinoe can you give feedback on delta between this and ready for review?
@karelz @mellinoe do you have any more comments or you think the proposal looks good?
Fine from my side. I think it will spark again the discussion about where to put these APIs -- the idea that's rolling around is that CoreFX might not be best place. We might need something like CoreFXExtensions (disclaimer: I just made up the name).
Not that I am against the pattern or anything, but "typed unit structures" are a fairly alien concept to the BCL. There are many such places where this kind of abstraction could be good from a usability perspective, but it's not something we usually do. This also won't mix well with the many call sites in our libraries which use ints, longs, IntPtrs etc. to represent data sizes in byte units.
^ I still think this is something to at least talk about, and I also have a few more thoughts.
Marshal.AllocHGlobal
take this as a parameter?ByteSize
but can hold negative values. What is the meaning of a negative byte size, and along with the first bullet point, how would such values be handled in the functions that are intended to use this type?It's called ByteSize but can hold negative values. What is the meaning of a negative byte size,
For diffs/deltas/not enough space: var change = fileSize1 - fileSize0;
how would such values be handled in the functions that are intended to use this type?
It may be valid, as above, if it needs to be > 0 by a test?
if (size <= ByteSize.Zero) throw new ArgumentOutOfRange(nameof(size))
API review:
Notes:
It should be struct, otherwise what is the point?
Haha, yeah - oversight 😄
As far as I'm aware, outside of disk capacity marketing, "1 kilobyte = 1000 bytes" is not used anywhere. Some programs accurately describe sizes as KiB
, but programs that say KB
generally still mean 1024 bytes. Case in point: Windows Explorer.
However you slice it, some people are going to find all those properties either ambiguous or redundant and will need to check the documentation (or hopefully, just the IntelliSense). I think this API could be made cleaner by omitting anything to do with 1000 byte multiples, and then biting the bullet and just calling the 1024 byte multiples kilo, mega, etc. even though that's not technically correct.
Additionally, I think the "modulus" properties are not useful, clutter the API surface, and lead to nonsensical behavior. In a TimeSpan
, it makes sense to say that the Hours
value is an integer between 0 and 23, because if it were more, that would count toward the next "place value" which is Days
. However, a size in bytes does not have place values. 3565158 bytes is not 3 MiB, 409 KiB, and 614 B
. I don't think anyone in history has used notation like that. 3565158 bytes is just 3.4 MiB
(or 3481.6 KiB
), and that's all the properties should expose.
(Also, what's up with the casing? KiloBytes
is weird, KiBiBytes
is worse.)
quick note, Parse functionality always a source of problems and I believe we should not repeat the same mistake. I think at least
public static TimeSpan Parse(string s);
public static ByteSize Parse(string input, IFormatProvider formatProvider);
should be removed and consumers should either use ParseExact or use TryParse/Exact.
other things, for formatting with cultures, does Windows support providing the needed unit names? (e.g. KB, GB...etc.). the framework doesn't carry the globalization data and always depends on the OS for that. so to be able to support that, we'll need to ensure we can get the needed data from the OS.
would be nice if you can add some code sample and expected results of executing such sample code.
These KiKiByte
names exist because hard disk manufacturers like to need statements in tiny letters about how big their products are. Just to save some 5, 7, 10, or 12.5% of money. Oh yeah and for Apple systems because they went with the deception.
For any programmer a Kilobyte has always been 1024 bytes, K, M, G, T in the context of bytes are powers of 1 << 10. That should be the default in any programming language. If ByteSize.FromKiloBytes(x)
will do a division by 1000 bytes imho creates confusion and makes me sad.
Shouldn't the KiKiBytes
units and names not just be removed and two types created which convert between the two instead, for example by a formatter if you do ToString()? The type names should indicate the type of KiloByte
that they define, with the prettiest, cleanest name reserved for the one that uses powers of 1024, imho.
Any update to share on this proposal ?
Like TimeSpan to remove ambiguity in sizes e.g. https://github.com/aspnet/KestrelHttpServer/issues/1247
Suggested API