andyferris / Freeze.jl

Experimental API for mutable/immutable collections
Other
3 stars 0 forks source link

Shape-frozen and value-frozen collections #1

Open tkf opened 4 years ago

tkf commented 4 years ago

Continuing discussion https://github.com/JuliaArrays/StaticArrays.jl/issues/745#issuecomment-613920126

If you are going to think about these things remember that append-only datasets are a thing and that we might want eg a vector that supports push! but not setindex!.

Yes, exactly my thought. It'd be great if we can freeze/thaw push!-ability and setindex!-ability separately.

For dictionaries I was thinking isinsertable, freezekeys and thawkeys could be the (orthogonal) “shape” interface.

FYI, I'm using internal "trait" functions like implements(f!, T) and possible(f!, args...) in BangBang.jl as I don't want to keep introducing new trait functions. A similar approach could be useful if you want to distinguish append-only, FIFO, LIFO, etc. But maybe this kind of generality is not appropriate at this stage of design.

As for the naming, I think it'd be nice if "shape freezing/thawing" for vectors and dictionaries can be done with a same interface. I was thinking something like ismutablevalue and ismutableindex but I think "keys" instead of "index" is also fine since keys is implemented for arrays.

andyferris commented 4 years ago

Thanks @tkf for #1! :)

I agree, there are a lot of possible behaviors a container could expose. I think these need to be thought through in detail. I like the genericness of implements - it's kinda cool. But for now I'm mostly curious about "completely frozen" or immutable.

One of my immediate goals here are to investigate the idea of having all the "functional" operations in Base like map, filter (and +, *) generically return immutable containers, without limitting people's ability to mutate stuff when they want to. Could we use the same generic code for map(f, ::Array) and map(f, ::SArray) and have both run at max speed? So the generic array code in base would follow a pattern of using the similar factory to create a mutable container, fill it up, and return a frozen version of it:

function f(a::AbstractArray)
    ...
    out = similar(a, newtype, newsize)
    f!(a)
    return freeze(a)
end

The user will normally not want to mutate the output of f(a) (in my experience they would typically perform another functional operation with it), but when they do want to they can call thaw on it.

The point is here is that construction of new immutable containers can be very challenging without mind-bending and compiler-heavy techniques (e.g. complex recursion or generated functions). Whereas filling the values incrementally with a mutating, procedural technique is usually very easy (it is usually easy to implement f!, even for static arrays).

Now we don't yet have proper support for immutable Array but it is something the core team have interest in so I believe it will happen eventually. And given the current capability of escape analysis, I believe freeze and thaw could frequently be no-ops (and bug-preventing defensive copies otherwise).

And finally you would know better than me (from BangBang.jl) that the hard bit here is actually computing newtype (and sometimes even newsize) meaning the "simple" algorithm above would tend to be a bit different, probably involving newtype = Union{} and f!! rather than f!. Or something like that. (Union{} is unfortunately slow, but maybe that can be fixed).

tkf commented 4 years ago

(Yay, I got #1! :smile:)

Hmm... Implementing f in terms of f! is not actually my main motivation here. I agree that's easy to write when it works and desirable for performance in many situations. But, if you start needing newtype, I think that's when it becomes inappropriate. I think the common wisdom is to avoid manually computing newtype and "widen" the container eltype as you need it [*]. Also, as you said, computing newsize is rather impossible in general without looking at actual values (e.g., filtering and flattening). Other reasons for me to prefer functional style is that it's easier to parallelize and auto-diff -friendly.

As you might have guessed, my preference for writing generic functions is to define f!! and derive f and f! from it.

[*] It does not mean I'm against return_type approach per se. I actually think that the API and usage guideline of return_type should be discussed more openly. Though I think using it correctly is actually harder and requires programmers' understanding of the API.

The point is here is that construction of new immutable containers can be very challenging without mind-bending and compiler-heavy techniques (e.g. complex recursion or generated functions).

I think the main obstacle here is not the data sink but rather the data source; it's just that the for loop based on iterate is not "compiler-friendly" [**] when it comes to small containers like StaticArray and Tuple, especially the heterogeneous ones (or rather I'd say any complex containers). I think it can be solved by lowering for loop to foldl rather than iterate. This is equivalent to writing recursive function but for loop syntax can provide a fake feeling of imperative programming. (Incidentally, I'm writing a package with a macro to do for-to-foldl transformation for showing this point.)

BTW, my interest on frozen collections is more about immutable shared keys of dictionaries and shared indices of subarrays and sparse arrays. (Also shape-immutable vectors for table columns.)

[**] "Friendly" in the sense that it dumps much more information to the compiler which has to work harder to consume it. So, yeah, it's still compiler-heavy.

tkf commented 4 years ago

OK so I committed the random functions I've been tweaking and put it in a repository https://github.com/tkf/Mutabilities.jl. It has similar freeze/thaw (which somehow I called melt) machinery but includes manual ownership handling with move!. I particularly like the definition melt(x) = meltvalue(move!(meltindex(x))) which copies the input only once (usually).