Iterator invalidation is a major source of UB/Security bugs that is not possible to prevent with the C++ stdlib due to the design of iterator pairs, and native pointers as iterators. We can address this at runtime in a much cheaper way (no atomic locks and global hash tables or anything) than with libc++ iterator debug mode, though it will be more invasive as a result.

I think the only reasonable way is a refcount in anything iterable.

Then chaining iterators needs to not add more references, either move the reference-holding, or just not add more references.

Welp, this is of course harder than it seemed at first glance. Here are some challenges.

Vec is-a SliceMut is-a Slice so they share storage. Vec needs to hold a counter, but Slice/SliceMut don't hold a counter, iterators they vend are invalidate by the Vec reallocing, not by the Slice. So we introduce a union IterRefCounter { usize; usize*; }; in Slice. Then a bunch of care and macros are needed to turn the IterRefCounter into other IterRefCounters when making more Slices depending if the method is in Slice/SliceMut or in Vec, as a different field is active and needs to be copied.

Maybe we could just use a usize (uptr in the future) and convert it to an actual pointer when handing it off to iterators, because Slice/SliceMut don't actually need to deref the pointer. That would simplify that one a bunch maybe.

Next, Vec<T>::last() or Slice<T>::last() among other functions return Option<const T&> which can be converted into an Iterator as a Once<const T&>. Now we have an iterator over one element of the Vec, but there's no connection back to the Vec, so it can be invalidated by the Vec reallocing. To get that would need Option<const T&> to hold a pointer to the Vec refcount, at which point you realize every pointer/reference is coming with this backpointer to the container, which sounds bad.

There's really no good way to keep the connection without having this reference in every type that can hold a reference and can be iterated. I really do not want to add a refcount into Option of reference, it is too core and that would be really unfortunate given so many references would not even be to a container.

Even adding a 3rd field to Slice feels kinda bad but that one seems truly mandatory given the relationship between slices and containers and iterators. Also a saving grace here is that Slices can be passed by reference since they part of the Vec itself, so there's no additional pointer going into registers in that case? Still more stuff on every Slice mutation or copy though.

Catching Iterator Invalidation

To resolve some of these challenges we need to understand what we are trying to achieve here and what we are not.

One form of memory unsafety is what I term "Aliasing x Mutability", which is where mutating values has side effects on references (pointers) to the same objects that lead to memory bugs. This is, in general, probably the hardest and least likely form of memory safety to be preventable in C++. Iterator invalidation is an important subset of this form of unsafety, and scoping it down does presents us some capability to do something. A key part of Subspace iterators is that they present places to place code to prevent or catch iterator invalidation.

Let's further break down iterator invalidation into three categories.

Invalidate before use

The first is you construct an iterator, invalidate it, and then use it. This is must more obvious in code:

auto v = Vec<i32>::with(1, 2, 3, 4);
auto it = v.iter();
v.push(5);  // Invalidates `it`.
for (i32 i: sus::move(it)) { println!("{}", i); }

While this type of bug is possible to write, it relies on the developer constructing an iterator early, then holding onto it for a while without using it while mutating the container it iterates over. Observing this bug is possible with local analysis unless the iterator is travelling far from the container, and can be resolved by giving access to the container and constructing the iterator at the place you intend to use it.

Invalidate during iteration

The second type is where you invalidate the iterator while iterating. This is the much more common problem because it can often require non-local analysis to find it happening. Inside a for loop, arbitrary code can be executed at great distance, and with side effects across the system including back to the container being iterated over. Moving the iterator construction or usage is not possible to resolve this type of issue. Whole program understanding is required to find and avoid these, which is the worst enemy of software developers trying to write secure software.

By focusing on the second type, we address the larger problem while scoping out some things that would be problematic to implement through codegen and types.

For example, Option<T> can be iterated, which is implemented through the Once iterator type. Containers can return an Option such as through Vec<T>::get(usize). So it can easily be the case that Once is iterating over a container, however we must observe that only the first type of invalidation is possible with Once. As it only returns a single element, once the first iteration step begins it can no longer be invalidated. The avoids the need for iterator invalidation protection to have to travel through Option<T> from Vec<T> to Once, which is crucial because Option<T> is an important vocabulary type which must be performant and small (as it's even possible to avoid the extra bool storage). And because Option<T> is used in many contexts that have nothing to do with containers, so adding cost here would be bad for many more uses than it would help.

Invalidate by destruction

If the container is destroyed before or during iteration, the iterator is invalidated. This is different from the type of invalidation caused by mutating the container and falls into the realm of lifetimes and Use-after-Free (or Use-after-Destruction). We consider this out of scope for iterator invalidation as it can and should be addressed along with all other forms of Use-after-Destruction.

For inline storage containers, moving the object invalidates iterators, which we can group in with other mutations, so it remains in scope.

Iterate by consuming

Iterating over containers can be done while consuming the container, when they satisfy the IntoIterator concept, which most any container should do.

auto v = Vec<i32>::with(1, 2, 3, 4);
for (i32 i : sus::move(v).into_iter()) { println("{}", i); }

Iterator invalidation when doing so through IntoIterator on the container is impossible. The container is owned by the Iterator in this case.

Narrowing scope

From the above we can see that while there are an enormous number of Iterator implementations (some grepping suggests 74) in Subspace already, we do not need nor want to introduce costs of invalidation tracking to every one.

We can exclude these iterators:

Once is out of scope as it can't be invalidated during iteration.
There are many iterators that are constructed from ways other than from containers, such as from a std::ranges::input_range (which can't participate in invalidation protection) with sus::iter::from_range(), from a generator function with sus::iter::from_generator, or from other inputs than a container such as sus::iter::successors() or sus::iter::repeat(), etc.
Iterators that are produced by methods on the Iterator can be omitted, the root iterator will perform the tracking if it's required, with one exception, by_ref().
The by_ref() iterator will hold a reference to another iterator, which may own the container. However the only invalidation possible in this case is Use-after-Destruction if the Iterator it is constructed from is destroyed. Otherwise mutating the original iterator does not invalidate the by_ref() iterator.
Iterators that own the data they iterate over, such as those produced through the IntoIterator concept.

Where do we need invalidation protection:

Reference-type iterators over containers, such as those produced by iter() or iter_mut(). These iterators refer to the container without owning it, and mutation of the container can leave them with an invalid pointer as they iterate over each item.
Iterators returned from view types such as Slice or SliceMut also necessarily do not own the underlying container they iterate over, and can have multiple iteration steps.

This still leaves a lot of iterators. iter(), iter_mut(), chunks(), chunks_by(), rchunks(), split(), split_inclusive(), splitn(), windows(), and a lot more. Additionally, iter() and iter_mut() will be needed extremely frequently when the container ownership can not end during iteration and cloning it is undesirable. This demonstrates that there's still a lot of value in preventing invalidation of these iterators.

The cost will be an extra pointer in each container as well as their view types, so in Vec and Slice for example. The benefit will be that you don't get a security bug from side-effects during iteration.

Catching vs Preventing

We will aim to catch invalidation when it happens and terminate the program. It is also possible in general to prevent invalidation, but that requires preventing "Aliasing x Mutability" which is outside the scope of a library, and requires something on the order of borrow checking.

So we will aim to notice when mutating a container will invalidate an iterator that may be iterating and terminate. This results in an actionable crash report (a cost on the developer) instead of a security vulnerability (a cost on the user alone).

https://github.com/chromium/subspace/pull/279 implements iterator invalidation for SliceIter and SliceMutIter. Next steps are:

There's an ASAN failure in Drain. Drain is complicated because it holds a Slice to a Vec and then needs to move the Vec. So it has to add this public API to vec to detach/attach the Slice. I'm not excited about it. And there's an ASAN bug.

[x] Figure out Drain ASAN bug.
[x] Consider Vec/Array/Slice traits, which can default to IteratorInvalidation=true. But then Drain could disable it for its SliceMutIterator, since it owns and encapsulates the Vec and never mutates it. Traits would allow controlling the Vec growth factor in the future too. And maybe the Allocator should be controlled through that?
[x] If traits work out, remove the #ifdefs around disabling Iterator invalidation. Instead, template the IterRef and IterRefCounter types on whether counting is enabled.
[x] Traits didn't work out, give Slice/SliceMut a way to be detached from invalidation tracking.
[x] All the other iterators returned from Slice/SliceMut (chunks, splits, windows, etc).

I don't like traits on Slice... they are fine on Vec where they can provide the Allocator and growth factor. Vec needs something for the Allocator regardless.

But on Slice, it means that Vec return Slice<T, Traits>, but other code is written with Slice. This is "fine" in the sense that behaviour degrades gracefully. If the Slice wants iterator invalidation but it's given nothing to track, then it just does nothing, much like a Slice to something other than a container.

But it means that the Slice<T, Traits> type must be convertible to Slice<T, OtherTraits> and methods that receive Slice needs to template the traits everywhere lest they force copies just to satisfy tries.

So, new idea, can we just disable iterator tracking on a Slice, and do that on the Slice (and its iterator) in Drain up front, and not need to worry about it thereafter?

Ok I don't like Traits on Vec either. Vec<T, Traits> needs to be comparable to Vec<T, OtherTraits>, okay no problem. But Vec<T, AllocatorOne> can't be assigned to Vec<T, AllocatorTwo>. Yet, the GROWTH_FACTOR part of the traits should be convertible/assignable.

I think the conclusion here is that template parameters make sense when they aren't meant to be compatible types, they are different in meaning. Growth factor doesn't meet that, so it does not fit in the type.

chromium / subspace

Prevent Iterator Invalidation #259