JuliaArrays / UnsafeArrays.jl

Stack-allocated pointer-based array views
Other
42 stars 5 forks source link

Feature request: mutable unsafe vectors #2

Open chethega opened 6 years ago

chethega commented 6 years ago

An unsafe vector could also provide push!, pop!, etc which grow/shrink the view, respectively, overwriting entries of the parent.

This would be used via uview = push!(uview, val). These can be significantly faster than base vector push!/pop!, since we can avoid both the ccall and the stupid second read-increment-store due to the fact that vectors store both length and number of rows (why, oh why).

One would need to also store a pointer_from_objref to the underlying vector, in order to not exceed its length, and resize it if necessary. See https://github.com/JuliaLang/julia/issues/24909.

oschulz commented 6 years ago

While it would, in theory, be nice to be able to grow an UnsafeVector, overwriting the parent array would be just a bit too unsafe ... :-)

pop! and popfirst! (which would be safe) are not possible either: They do, by definition, modify the vector, and UnsafeArray has to be immutable to be stack-allocated.

chethega commented 6 years ago

While it would, in theory, be nice to be able to grow an UnsafeVector, overwriting the parent array would be just a bit too unsafe ... :-)

Why? I have a vector [1 .... k ... k+ell ... N], with an unsafe view defined by [k... k+ell]. Writing into the unsafe view makes sense (overwriting the parent entry k+ell+1). Moving k or k+ell to the left or right makes sense (sliding window). Increasing ell, and overwriting the slot there makes sense. We would need the pointer_from_objref to the parent, because we do not want to corrupt memory (write beyond the bounds of the parent), and hence need to check the parent's length on every shift of the unsafe view. Shifting the view to the right and writing into the slot is functionally equivalent to push!.

pop! and popfirst! (which would be safe) are not possible either: They do, by definition, modify the vector, and UnsafeArray has to be immutable to be stack-allocated.

Obviously the API would need to differ. Just like immutability of Int64 does not prevent n+=1, you could do uview,val = uvpop(uview). You're right that the API is different enough from base that this should be named differently.

oschulz commented 6 years ago

Well, for the use case of sliding windows, you would just create many unsafe views, but they wouldn't be resized. And if you need to resize an unsafe view, you can just create a new one. But the UnsafeArrays themselves are very lightweight (on purpose), and have no information on what the size of the original array was - so they can't grow (even by returning a new UnsafeArray) in a safe way.

In fact, since an UnsafeArray can be created from a memory pointer and a size, there may not even be an original array, at least not one allocated by Julia. But the user does have that information, and can create additional new unsafe views as required directly.

oschulz commented 6 years ago

Actually, the calloc-trick presented by foobar_lv2 in a recent thread on the Julia Discourse may profit from a kind of mutable unsafe vector that knows it's maximum bounds. It would be quite different from UnsafeArray though, and wouldn't support stack-allocated views.

chethega commented 6 years ago

Ok, you're right; mutable is the way to go for this. foobar from discourse is actually me, btw. I tend to use new pseudonyms for every account I make.

oschulz commented 6 years ago

Ok, I think a mutable version of UnsafeArray that would support use cases like we discussed on discourse (oversized malloc/calloc, growing via page faults and then handing final array to Julia) would fit in here. I probably won't get to do it right away, though, have to finish ArraysOfArrays.jl first ...

oschulz commented 6 years ago

As for sliding window use cases, I believe the right approach is to simply create a lot of throw-away views, as @uviews make them very very cheap.

oschulz commented 6 years ago

This could actually mesh well with TemporaryArrays.jl (new package, still in design phase). The idea there is to provide reusable temporary arrays, to avoid allocation cost for temporary arrays (e.g. target array for boradcast!s, etc.) in tight loops. I originally intended those to be UnsafeArrays backed by Vector{UInt8} behind the scenes, but manual (c)alloc/free and a ResizeableUnsafeArray may be a better fit. For that use case, ResizeableUnsafeArray itself would have to be an immutable type (kind of a hybrid of UnsafeArray and ElasticArray) that holds a pointer to a mutable "storage allocation" object. Or possibly an ElasticArray wrapped around mutable ResizableUnsafeVector.