carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/
Other
32.25k stars 1.47k forks source link

A Modest Proposal: Disallow use of arithmetic operators on unsigned types. #2106

Open mbabinski-at-google opened 2 years ago

mbabinski-at-google commented 2 years ago

The hazards of unsigned arithmetic in C/C++ are well known, and modern style guides generally discourage their use for anything except bitwise manipulation and cases where the extra bit of positive range is actually required. I propose that:

Open questions:

zygoloid commented 2 years ago

The use of arithmetic operators on unsigned types, expecting modulo semantics, appears in contexts such as random number generation, cryptography, and hashing, and those are domains in which we expect unsigned types to be used. We need to make those operations available by some syntax, and we expect it to be highly desirable in those domains to use the standard mathematical notation. As a consequence, we very likely want to have some type that provides modular arithmetic semantics as well as bitwise operators, whether that type is called uN or something else.

We did consider various options here, for example having three different families of types: signed (with no wrapping), unsigned (with no wrapping), and modular (with no division or ordering). Unfortunately, I don't think we have a write-up of that discussion, but the concern was that only two of those three families (signed and modular) would be used in practice, and that the unsigned-but-not-wrapping type would be hazardous because its invalid states (negative numbers) are so close to commonly-used states (such as zero).

We should certainly have a proposal that examines the possible models for integer type semantics and selects which ones to use.

mbabinski-at-google commented 2 years ago

Presumably, there would be nothing stopping an efficient implementation of modular integral types from being provided as class types in a ModularArithmetic module or some such? Dropping arithmetic operations from the builtins would make it clear to users that these are types with niche use cases, and shouldn't be reached for without thought just because you think you want to restrict to non-negative values.

Also, the fact that all three problem domains you listed are security-critical suggests to me that sacrificing brevity in favor of explicitly spelling out intended semantics would be a feature, not a bug =)

L4stR1t3s commented 2 years ago

This is a two-edged sword. Yes, disallowing specific operations can make code more safe in general. But it also means that you are blocking potential clever solutions.

For example, fast inverse square root was only possible because C/C++ allowed specific type casts, even if they were defined as undefined behaviour. So at the time, a commonly used calculation was made a lot faster, because of something that at first sight is a bad idea and should not be done.

I prefer giving the developer more options, and the added responsibilities that come with it.

I would also like to add that unsigned integers are often used as the size type for containers. Arithmetic operations are required if that would be the case for Carbon.

geoffromer commented 2 years ago

I would also like to add that unsigned integers are often used as the size type for containers. Arithmetic operations are required if that would be the case for Carbon.

I would be quite surprised if Carbon's standard containers had unsigned size types. In that use case modular arithmetic is rarely if ever useful, and it creates bugs and (IIUC) blocks loop optimizations. On 32 bit platforms the extra bit arguably has some benefit for some use cases, but I think not enough to justify the costs for everyone else.

OlaFosheimGrostad commented 2 years ago

Please consider modular and clamping operators instead of tying modular arithmetics to a base-2 type (you may want something else, like a prime). Operations on scalars ought to be compatible with operations on SIMD types.

Take a look at #1977 when evaluating what to do with unsigned types and modular expressions.

mbabinski-at-google commented 2 years ago

This is a two-edged sword. Yes, disallowing specific operations can make code more safe in general. But it also means that you are blocking potential clever solutions.

Note that the proposal is to remove arithmetic operators for unsigned types, not operations. The same operations with the same performance characteristics would be available under different spellings.

I would also like to add that unsigned integers are often used as the size type for containers. Arithmetic operations are required if that would be the case for Carbon.

Discouraging users from trying to use unsigned types for sizes is 90% of what motivated me to open this bug. There's not enough room in the margins of this comment to go into detail, but suffice to say, signed math models the mathematical properties of sizes much better than unsigned.

h3har commented 2 years ago

Current design includes operator overloading, so it appears there would be nothing stopping users from adding this functionality back in (as extensions to existing types[?] or as operations on their own types that might wrap unsigned integers).

Not trying to make an argument either way, just stating this fact as it appears relevant for consideration. If modular unsigned arithmetic was widespread in certain libraries, operator overloading should still allow mathematical syntax to be achieved with some extra programmer effort.

github-actions[bot] commented 1 year ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time. This issue is labeled inactive because the last activity was over 90 days ago.

netch80 commented 4 months ago

Three types look reasonable but with slightly another discerning: unsigned, signed, and bitfields. And, well, bitfields can't be '+'/'-'/'*'/etc. with another types until an explicit conversion to another type.

Separating modulo arithmetic seems worse because signed numbers can endure the quite similar modulo arithmetic but with their twofold range. If an operation is declared wrapped, signed and unsigned both keep least bits. Otherwise controlling overflow is valuable for unsigneds as well as signeds, and generic mechanism is needed. And here, what the mechanism is - per-type or per-operation specification - is the thing to ponder.