Feature request: mutation interface for transient iterators

voltrevo commented 5 years ago

UPDATE: I misread ::update in the docs before writing my thoughts below.

Perhaps T& immer::vector_transient<T>::iterator::mutable_dereference()? Also T& immer::vector_transient<T>::mutable_at(size_t index)?

These operations would copy if shared before providing the reference to maintain valid copy-on-write semantics.

I'm trying to implement scalar multiplication (and others) efficiently in a virtual machine. Mutating elements in-place makes a difference for e.g. scalar multiplying a vector of vectors.

This would require the caller being responsible for ensuring usage of the reference isn't improperly combined with a variety of immer operations though, so that could be a bit messy actually :/.

In general, I am concerned with achieving in-place mutation as often as possible in a variety of cases. Do you think immer would still be a practical choice for this?

How practical would it be to use the immer::detail constructs directly? (Understanding that these APIs may change, of course.)

Perhaps there could be raw_vector and raw_flex_vector exposing more of the internals (and footguns), and vector, flex_vector could be implemented as wrappers but expose incompatible APIs. Possibly a lot of work for a possibly narrow use case though.

voltrevo commented 5 years ago

Actually, I've just realized that ::update takes a function which receives a non-const reference to the element. The minor point about using an iterator stands though, since it should be a more efficient than doing a lookup from size_t every time. Perhaps overload ::update?

arximboldi commented 5 years ago

What algorithm are you thinking about? The problem I see with a mutating iterator is that it is only useful when you have something like this:

for (it = t.begin(); it != t.end(); ++t) {
    if (cond with probability < 1 / 2^B)
         *it = ...
}

If otherwise most elements do change (which is most of the time the case when mutating through an iterator) it is probably more efficient to simply build a new vector using a new transient vector and push_back.

Also, in the case shown above where we do preserve structural sharing:

for (it = t.begin(); it != t.end(); ++t) {
    if (cond with probability < 1 / 2^B)
         *it = ...
}

This interface is not gonna be more efficient than using update() unless elements are layed out such that positive elements are in continuous blocks (when a node is shared and we need to clone it internally, we need to do so for the whole path.)

So, what I am trying to say is simply that the use-case for a mutating transient vector iterator is actually very rare. I leave this open since of course, for the sake of completeness, it is nice to have. Even though it worries me that it might lead to people to pesimize their code by mistake (using the mutating transient to update a new vector instead of producing a new one).

arximboldi commented 5 years ago

I just realized I did not pay enough attention to your second comment. In the first case, I was assuming the iterator returning a proxy reference (something like vector bool) that overloads operator=. This leads to the potential performance pitfals I mentioned at the end of the comment, plus it prevents the thing from being a real iterator pre-C++17. However, I kind of like the idea of having an update() that takes an iterator as a "hint", this would allow to use the most efficient API for the rare but plausible pattern of updating vectors.

Note that in your vector multiplication example, you are updating every element, so the structural sharing argument does not apply. In most cases making a fully new vector is gonna be the most efficient. But arguably there are some cases where if you are operating on a temporary vector, you could avoid the allocations and part of the lookups, even if you are touching every element. This also justifies having this API.

Thanks for the suggestion! I'll leave the ticket open and try to address it soon. And as usual, PR's are welcome ;)

voltrevo commented 5 years ago

Cool :+1:

I think I might be missing something when thinking about the case of mutating most/all of the elements. Rather than preserving structural sharing, I want to exploit the case that there isn't any structural sharing (or very little structural sharing), so the transient has exclusive ownership, and can therefore update all elements in-place.

E.g.: When we're halfway through mapping x -> x^2 on [1, 2, 3, 4, 5, 6] the only vector that should exist is [1, 4, 9, 4, 5, 6] and not [1, 2, 3, 4, 5, 6], [1, 4, 9].

Does that make sense? The vector should know when it has exclusive ownership right?

arximboldi commented 5 years ago

Andrew Morris notifications@github.com writes:

Cool :+1:

I think I might be missing something when thinking about the case of mutating most/all of the elements. Rather than preserving structural sharing, I want to exploit the case that there isn't any structural sharing (or very little structural sharing), so the transient has exclusive ownership, and can therefore update all elements in-place.

E.g.: When we're halfway through mapping x -> x^2 on [1, 2, 3, 4, 5, 6] the only vector that should exist is [1, 4, 9, 4, 5, 6] and not [1, 2, 3, 4, 5, 6], [1, 4, 9].

Does that make sense? The vector should know when it has exclusive ownership right?

Yes, it does make sense and as I suggested in my last message, it is a legit use-case.

Cheers!

JP

-- ∿∿∿ https://sinusoid.al ∿∿∿ https://sinusoid.es

voltrevo commented 5 years ago

Ahh ok. I think we're in sync now, cheers.

arximboldi / immer

Feature request: mutation interface for transient iterators #70