PixarAnimationStudios / OpenSubdiv

An Open-Source subdivision surface library.
graphics.pixar.com/opensubdiv
Other
2.89k stars 561 forks source link

Revisit use of int vs. unsigned for indices, sizes, counts, arities etc. #1222

Closed virtualritz closed 3 years ago

virtualritz commented 3 years ago

All these quantities are positive values. What is the rationale for using int over unsigned here?

On that note: I am not the only one wondering this. Quote from vtr/types.h:

 //  The convention throughout the OpenSubdiv code is to use "int" in most places,
 //  with "unsigned int" being limited to a few cases (why?).  [...]
virtualritz commented 3 years ago

I saw that some code sections rely on returning -1 (aka INDEX_INVALID) if an Index can't be found.

I reckon that works just as well if you return UINT_MAX to represent this (which is binary identical ofc.).

davidgyu commented 3 years ago

Those are all good points!

The current use of signed types was a consequence of following some existing interfaces (e.g. including hbr). Changing these to unsigned types is something that we would consider as part of a major API revision.

virtualritz commented 3 years ago

Changing these to unsigned types is something that we would consider as part of a major API revision.

Great!

The Rust binding I'm working on already uses unsigned and has checks in place so that if someone really tries to feed something bigger than INT_MAX the code on the Rust side panics.

On that note: how robust is the OSD code when someone feeds a negative index or an index outside the length of the target buffer to the API? Is this caught somehow or will it cause UB?

barfowl commented 3 years ago

To answer the most recent question, the OSD code is not robust to being given bad input. It is the caller's responsibility to validate their input first. This was a conscious decision to not add additional runtime overhead when the vast majority of callers are dealing with well-bounded indices.

barfowl commented 3 years ago

I had been working on a reply to the original question earlier and had to step away from it when David's reply came in. As David said, such a change would be considered as part of a "major" API revision, but its questionable whether we would make such a change.

Being the author of the comment cited in vtr/types.h, and having been a proponent of the use of unsigned indices, I thought I'd add some thoughts here...

At the time of that comment, the reason why the use of signed types persisted was largely due to two reasons: the use in other components like Hbr and simple inertia. I share your perspective and came from a background where I used unsigned types and UINT_MAX as INDEX_INVALID for similar purposes in a connected mesh representation. At the time (over five years ago now) we were making major changes for version 3.0 and were migrating away from Hbr, but the number of existing clients using 2.x could not be taken lightly. They were already going to have to make significant changes, and spreading the change to unsigned integers to more of the existing code base was going to amplify the effort for both us and them.

So we stayed with the use of signed integers. With the number of clients having grown considerably since then, the case of inertia hasn't gotten any weaker -- quite the opposite.

Since then I've also become aware of other perspectives on this issue...

The Google C++ Style Guide and others explicitly discourage use of unsigned integer types to imply or enforce positivity (see "On Unsigned Integers"). One of the main reasons for this is the bugs that arise from the use of unsigned arithmetic where negative values might be required (e.g. the difference between two uints), which is something we do a lot of with indices in some of the aggregate types in OpenSubdiv. (And along those lines, they report that the past decision to make size_t unsigned is widely viewed as a mistake within the standards body.) Given my experience with the unsigned types as proposed, I appreciate that perspective more now and am less enthusiastic about promoting the use of unsigned.

And having just entered that, it occurs to me that this alone could be enough to answer the question as to why we wouldn't revisit the issue, though it wasn't the original reason for the use of signed ints.

But the point I was intending to make was that the original inertia argument carries more weight now than it did in the past. Simply ask the question "what benefit would come with such a change to offset the major burden that would be imposed on all existing clients?".

Back to David's comment, we would reconsider it as part of a major API change -- when clients have to make major changes across the board -- but given the above, there's a good chance we would still stick with signed integers.

thomthom commented 3 years ago

I share the same experience with Barry. Having used unsigned for types for where negative values were not valid for the domain, but running into a number of scenario where arithmetic operations could cause overflows. This lead to more frequent checks and branching than desired whereas allowing negative values made it easier to catch and consequently easier to read the code.

That being said, when also dealing with the STL alongside a signed API you end up having to perform casts to avoid narrowing warnings.

Incidentally, I was watching some cppcon talks today and this particular one pertain to this discussion: https://www.youtube.com/watch?v=wvtFGa6XJDU

virtualritz commented 3 years ago

TLDR; In all the cases in the title of this issue feeding a negative number into the API is an error.

I am solely asking to convert the outside API here. No internal code needs to be touched. The only issue with new code/API would be someone, after the API change, having indices/sizes/etc. bigger than INT_MAX.

When you convert that to int you get UB. The likeliness with existing code is zero since that uses an int API. And how likely is that after a new API got released? Meshes with more than 2147483647 indices? If that were a concern you would move to long anyway. So we can disregard this point.

Inertia? There is no burden on existing clients. You can directly cast an int to unsigned w/o any overhead/penalty in C/C++ (try in godbolt). I.e. this is a no-op. So anyone using OpenSubdiv would just do a free typecast. Then again, writing (unsigned) or (unsigned*) a few times could be considered a burden by some. 😉

The cast will only explode in your face if your value was out of range. That means it must have been negative which is an error in the first place (see above).

It can't be out of range in the other direction. If the new API used unsigned and you have code written for a previous version using int you're casting from int (your old code) to unsigned (new API). Not the other way around. INT_MAX < UINT_MAX.

The Google C++ Style Guide and others explicitly discourage use of unsigned integer types to imply or enforce positivity.

A gem from that Google style guide:

Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point.

After two years of Rust I can only shake my head in disbelief about this. But then that's the same standards body that gave us a language that is broken beyond repair.

Although this is now beside the point of this issue which was about the outside API I just want to share some thoughts on writing code dealing with topological mesh operations that uses solely unsigned. I happen to have spent the last year writing such code. On a side project, in Rust. So I feel I can share some first hand experiences. The three most common 'bad' cases for unsigned integer subtractions are:

  1. Underflow that should have never happened. Aka: an error you should have caught.
  2. You need/expect a negative index but you forgot to do an explicit type cast before (polluting the entire API with int on the outset for this reason would be kinda ... crazy).
  3. You could reorder the operations to avoid the underflow but that wasn't obvious to you. Because ultimately your final index into some data is positive for sure.

But ofc. I understand that in C/C++ you are on your own here, mostly.

I.e. 1. is UB but will likely just underflow/wrap around, depending on compiler. 2. & 3. require explicitly paying more attention.

In Rust the compiler will do all this for you. Any unsigned subtraction that underflows will panic at runtime. And all type conversions/casts are explicit. Trying to do any math mixing types (i.e. w/o explicit type casts) will make the compiler refuse to cooperate (think -Werror baked in).

Regarding 3.: if you e.g. have something typical (this is actually taken from code of my own doing some topo stuff):

for i in 0..face.len()
    face[(i - 1 + face.len()) % face.len()] = ...

This will not work with unsigned as i - 1 == -1 in the first loop iteration. In C++ it will likely explode in your face in some interesting way. In Rust it will simply panic with a nice backtrace.

A simple reordering fixes this and alleviates any need for signed integers:

for i in 0..face.len()
    face[(i + face.len() - 1) % face.len()] = ...

On this page these two comments more or less have a long version of the above plus some more concerns regarding UB. https://www.learncpp.com/cpp-tutorial/unsigned-integers-and-why-to-avoid-them/comment-page-2/#comment-487024 https://www.learncpp.com/cpp-tutorial/unsigned-integers-and-why-to-avoid-them/comment-page-2/#comment-474548

For the OpenSubdiv Rust wrapper I'm working on everything is already u32 (aka unsigned). Every value except index arrays is checked for overflow when passed to OpenSubdiv. As for index arrays: I guess if you feed the lib negative indices in the topology all sorts of UB may happen? :)

And having just entered that, it occurs to me that this alone could be enough to answer the question as to why we wouldn't revisit the issue, though it wasn't the original reason for the use of signed ints.

I hear you. Closing this for good.

virtualritz commented 2 years ago

Just saw "Almost Always Unsigned" on HN so adding for reference.

Related discussion is here.