Open taktoa opened 6 years ago
I've been writing some mutable containers that you may want to add to impure-containers:

- EqSat.Internal.MHashMap (master, permalink) is a wrapper over Data.HashMap.Mutable.Basic that adds forM_, insertWith, length, member, null, freeze, and thaw.
- EqSat.Internal.MHashSet (master, permalink) is a newtype for MHashMap k () with a similar API to EqSat.Internal.MHashMap.
- EqSat.Internal.MStack (master, permalink) is a mutable stack that is contiguous in memory (as opposed to most stack implementations on Hackage, which tend to be something like newtype MStack s a = MStack (MutVar s [a])). This is pretty close to your Data.ArrayList.Generic module, except it has more functions (e.g. pop, thaw), better documentation, and it reallocates when the size of the underlying vector is double the number of actual elements in the stack (it checks each time you pop). This module is pretty close to being ready for inclusion in impure-containers. The only issue that isn't performance-related I can think of is that newWithCapacity is kind of useless since there is no way to pop without compacting.
- EqSat.Internal.MBitmap (master, permalink) is a monomorphic wrapper over the MutableBitmap type defined in foundation. It needs a little documentation / cleanup work, but it has some nice property tests using hedgehog.

Since not many people depend on impure-containers yet, perhaps it would be worth revisiting the module structure. I think it's nicer to have a less-nested module hierarchy, where the names at the leaf nodes are always the same as the type that module exports (so the module exporting MHashMap is called Foo.Bar.MHashMap). I think it would be nice to adopt a convention where the mutable version of a structure is called MFoo and the immutable version is called IFoo (MutFoo/ImmFoo would also be fine), though I can understand if you think that's not worth pursuing. I'm thinking that the renamed module hierarchy could look like this (Foo ↦ Bar means that the datatype currently defined as Foo should be renamed to Bar):
- ImpureContainers.MArrayListB: ArrayList specialized for Data.Vector.MVector and renamed to MArrayListB.
- ImpureContainers.MArrayListU: ArrayList specialized for Data.Vector.Unboxed.MVector and renamed to MArrayListU.
- ImpureContainers.MArrayListP: ArrayList specialized for Data.Vector.Primitive.MVector and renamed to MArrayListP.
- ImpureContainers.MArrayListS: ArrayList specialized for Data.Vector.Storable.MVector and renamed to MArrayListS.
- ImpureContainers.MArrayListG: ArrayList ↦ MArrayListG.
- ImpureContainers.IGraph: Graph ↦ IGraph, SomeGraph ↦ SomeIGraph.
- ImpureContainers.IGraph.IVertices: Vertices ↦ IVertices, vertices*
- ImpureContainers.MGraph: MGraph, insert*
- ImpureContainers.MGraph.MBVertices: MVertices ↦ MBVertices, verticesReplicate, verticesWrite, verticesRead.
- ImpureContainers.MGraph.MUVertices: MUVertices, verticesUReplicate, verticesUWrite, verticesURead.
- ImpureContainers.GraphTypes: reexports ImpureContainers.GraphTypes.*.*, IGraph, IVertices, MGraph, MBVertices, MUVertices, defines type MGraphIO = MGraph RealWorld and type MGraphST s = MGraph s.
- ImpureContainers.GraphTypes.Vertex: Vertex, vertexInt.
- ImpureContainers.GraphTypes.Size: Size, sizeInt.
- ImpureContainers.MHashMap: MHashMap and its functions.
- ImpureContainers.MHeapC: RawHeap ↦ MHeapC and its functions.
- ImpureContainers.MHeapD: Heap ↦ MHeapD and its functions.
- ImpureContainers.UnsafeMaybe: UnsafeMaybe and its functions.
- ImpureContainers.MMaybeArray: MutableMaybeArray ↦ MMaybeArray and its functions.
- ImpureContainers.BoolByte: BoolByte.
- ImpureContainers.MMaybeVar: MutMaybeVar ↦ MMaybeVar.
- ImpureContainers.MPrimArray: MutablePrimArray ↦ MPrimArray and its functions.
- ImpureContainers.ITrie: Trie ↦ ITrie and its functions.
- ImpureContainers.MTrie: MTrie and its functions.

I'd also like to adopt a convention (that you already seem to be adopting in some modules) where the user is expected to import these modules qualified, so we don't have to suffix the name of the type onto the function (e.g. newPrimArray just becomes new and the user is expected to use it as MPrimArray.new). If that sounds inconvenient to you, I'd also be willing to have a top-level ImpureContainers module that reexports all of the types and wrapper functions (so ImpureContainers.MPrimArray would export MPrimArray, new, etc. whereas ImpureContainers would export MPrimArray, newMPrimArray, etc.).

The MTrie type seems bad to me because it causes problems with the garbage collector (too many MutVars). One idea for a solution to this that I had is a Haskell implementation of malloc/free for a MutableByteArray#, so that you can keep the trie in one contiguous blob of memory for cache locality (and, if the trie is not too big, you could use Word32 offsets instead of pointers for compactness, though the lack of a Word32# means this may slow down some operations). You could also maybe use compact regions, though I suspect that it would be slower than a custom allocator.

First of all, some broader context around this library. As you can see by the commit history, I've not done much with this in the last two years. The only decent thing that this library offers is the Graph-related stuff (mostly because unlike every other graph library on Hackage, it allows you to keep track of the fact that a vertex belongs to a graph). The heap described as "model C" is pretty cool too, and fortunately, the highly-useful PrimArray got merged into primitive recently (not released to Hackage yet). I consider everything else trash (or nearly trash). As you point out, MTrie has disastrously bad performance. I used it once to build a trie that I needed to put several million entries into, and the garbage collector just ran 100% of the time. So, really bad.

But there's something else flawed with this whole approach to mutable data structures: it makes GHC generate suboptimal code. Let's look at an abbreviated, specialized version of your MStack type and the interface surrounding it:

data MStack s a
  = UnsafeMkMStack
      !(MutVar s Int) -- length
      !(MutVar s Int) -- vector capacity
      !(MutVar s (MutableArray s a))

new :: ST s (MStack s a)
push :: MStack s a -> a -> ST s ()
pop :: MStack s a -> b -> (a -> b) -> ST s b

Cool. What's bad is that we have to put everything behind a MutVar. If someone wants to use a stack like this in a tight loop, they might do this:

myFunc :: MStack s Int -> ST s ()
myFunc !s = go 1000 where
  go n = if n > 0
    then do
      foo <- pop s 42 id
      ... -- do something with foo, possibly pushing more onto the stack
      go (n - 1)
    else return ()
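
For anyone who wants to poke at this, here is a small runnable sketch of the reference-based interface above. It is an approximation under stated assumptions: STRef (base) stands in for MutVar, STArray (array package) stands in for MutableArray, the capacity is read from the array bounds rather than kept in its own reference, and sumPops and demo are hypothetical names added only to exercise the loop.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, getBounds, newArray_, readArray, writeArray)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Every field that changes lives behind its own mutable reference.
data MStack s a = MStack
  !(STRef s Int)               -- number of elements
  !(STRef s (STArray s Int a)) -- backing storage; capacity is its size

new :: ST s (MStack s a)
new = do
  lenRef <- newSTRef 0
  arr    <- newArray_ (0, 3) -- initial capacity of 4
  arrRef <- newSTRef arr
  return (MStack lenRef arrRef)

push :: MStack s a -> a -> ST s ()
push (MStack lenRef arrRef) x = do
  len <- readSTRef lenRef
  arr <- readSTRef arrRef
  (_, hi) <- getBounds arr
  let cap = hi + 1
  arr' <- if len < cap
    then return arr
    else do
      -- Full: double the capacity and copy the live elements over.
      bigger <- newArray_ (0, 2 * cap - 1)
      forM_ [0 .. len - 1] $ \i -> readArray arr i >>= writeArray bigger i
      writeSTRef arrRef bigger
      return bigger
  writeArray arr' len x
  writeSTRef lenRef (len + 1)

-- Returns the default when the stack is empty, otherwise f applied to the top.
pop :: MStack s a -> b -> (a -> b) -> ST s b
pop (MStack lenRef arrRef) def f = do
  len <- readSTRef lenRef
  if len == 0
    then return def
    else do
      arr <- readSTRef arrRef
      x <- readArray arr (len - 1)
      writeSTRef lenRef (len - 1)
      return (f x)

-- The tight loop: each iteration reads and writes the references in memory.
sumPops :: Int -> MStack s Int -> ST s Int
sumPops n0 s = go n0 0
  where
    go n !acc
      | n > 0 = do
          x <- pop s 0 id
          go (n - 1) (acc + x)
      | otherwise = return acc

demo :: Int
demo = runST $ do
  s <- new
  forM_ [1 .. 10] (push s)
  sumPops 1000 s
```

Here demo pushes 1 through 10 and then pops 1000 times, so it sums to 55; the pops past the tenth just return the default 0.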

Every call to push/pop causes readMutVar or writeMutVar to get called, which means that stuff gets written to memory. It's unfortunate because, normally, in a tight loop like this, GHC would end up generating a nice tight loop that just puts the arguments in registers. In fact, we can make it do this by making the interface less safe:

data MStack s a
  = UnsafeMkMStack
      !Int -- length
      !Int -- vector capacity
      !(MutableArray s a)

new :: ST s (MStack s a)
push :: MStack s a -> a -> ST s (MStack s a)
pop :: MStack s a -> b -> (a -> b) -> ST s (MStack s a, b)

Of course, now push and pop have an unenforceable invariant: the argument MStack must not be reused. Now, myFunc would become:

myFunc :: MStack s Int -> ST s (MStack s Int)
myFunc = go 1000 where
  go n !s1 = if n > 0
    then do
      (s2, foo) <- pop s1 42 id
      ... -- do something with foo, possibly pushing more onto the stack (yielding s3)
      go (n - 1) s3
    else return s1
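
A runnable sketch of this state-passing variant, again only an approximation: STArray (array package) stands in for MutableArray, and sumPops and demo are hypothetical names used to show the handle being threaded through a loop. Nothing stops a caller from reusing a stale handle, which is exactly the unenforced invariant described above.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (foldM, forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, newArray_, readArray, writeArray)

-- Length and capacity are plain strict fields; only the array is mutable.
data MStack s a = MStack
  !Int               -- length
  !Int               -- capacity
  !(STArray s Int a) -- backing storage

new :: ST s (MStack s a)
new = MStack 0 4 <$> newArray_ (0, 3)

-- Returns the new handle; the old one must not be used again.
push :: MStack s a -> a -> ST s (MStack s a)
push (MStack len cap arr) x
  | len < cap = do
      writeArray arr len x
      return (MStack (len + 1) cap arr)
  | otherwise = do
      -- Full: double the capacity and copy the live elements over.
      bigger <- newArray_ (0, 2 * cap - 1)
      forM_ [0 .. len - 1] $ \i -> readArray arr i >>= writeArray bigger i
      writeArray bigger len x
      return (MStack (len + 1) (2 * cap) bigger)

pop :: MStack s a -> b -> (a -> b) -> ST s (MStack s a, b)
pop s@(MStack len cap arr) def f
  | len == 0 = return (s, def)
  | otherwise = do
      x <- readArray arr (len - 1)
      return (MStack (len - 1) cap arr, f x)

-- The loop threads the current handle through every iteration.
sumPops :: Int -> MStack s Int -> ST s (MStack s Int, Int)
sumPops n0 s0 = go n0 s0 0
  where
    go n !s1 !acc
      | n > 0 = do
          (s2, x) <- pop s1 0 id
          go (n - 1) s2 (acc + x)
      | otherwise = return (s1, acc)

demo :: Int
demo = runST $ do
  s0 <- new
  s1 <- foldM push s0 [1 .. 10]
  (_, total) <- sumPops 1000 s1
  return total
```

As with the reference-based sketch, demo sums to 55. Whether GHC actually keeps the length and capacity in registers here depends on optimization flags and inlining, so check the generated Core before relying on it.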

And myFunc now has the whole "this is linear in the MStack argument" thing going on as well. Again, unenforced by the type system, and users are bound to mess up. But, the generated code is much better. By this point, you probably see where I'm going. The interface I really want depends on the successful implementation of the Linear Haskell proposal:

data Stack a = UnsafeStack
  !Int -- length
  !Int -- vector capacity
  !(MArray a)

new :: (Stack a ⊸ Unrestricted b) -> b
push :: Stack a ⊸ a -> Stack a
pop :: Stack a ⊸ b -> (a -> b) -> (Stack a, Unrestricted b)

It has Linear Haskell's tell-tale CPS jankery, but it's both fast and safe. And it doesn't have to concern itself with the PrimMonad weirdness. This is the interface I actually desire for all the mutable data structures I use in practice.

Aside from the fact that I want to destroy this package and build a linear-containers package once Linear Haskell lands, I agree with all of your suggested changes. The name changes sound good, and the additions to the API are good too. I don't mind breaking changes to the API since I'm pretty sure no one uses this package but us. I'm going to give you a commit bit to this repo, so feel free to do any of the things you described.

One more point I'll add about the whole linear-vs-ST thing. The linear implementation certainly will win if you have a single stack that you're just passing through the whole program (especially if it goes through a bunch of tail recursive functions like in the example above). Where it gets more interesting (and I have absolutely no real data on this, only thoughts) is where you have stacks that are a part of other data structures, and the other data structures are used in such a way that GHC isn't able to optimize away all the data constructor allocations like it could in the example I gave above. Now, modifications to a Stack will sometimes cost something at runtime, since an allocation actually happens for the data constructor. But, your ST implementation never allocates (actually, since you use MutVar#, which boxes its argument, it allocates for the I# data constructor every time you do anything, but if you used something like PrimRef from my deprecated prim-array package, it wouldn't). I wonder if there is any code for which the occasional allocations of the linear variant outweigh the bad CPU register utilization of the ST variant.

And of course, no discussion of this issue would be complete without mentioning the relevance of the frighteningly longstanding Mutable Constructor Fields Proposal.

Ah, yes, I was thinking that MADTs would improve things. I was frustrated by the boxed-ness of MutVar#. Is there a non-deprecated replacement for PrimRef, or would I just need to hack it together with a MutableByteArray#?

You should probably just copy the whole PrimRef module into impure-containers. There's nothing really wrong with it. It's just that a package named prim-array is sort of goofy now that the type is in primitive instead. And, of course, impure-containers rolls its own copy of PrimArray as well since I never got around to making it depend on prim-array.
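
For reference, the "hack it together" route is small. The sketch below is not the PrimRef module from prim-array; it is a hypothetical IntRef built on a one-element STUArray from the array package, which stores its elements unboxed in a byte array, so writes do not allocate an I# box the way MutVar# does. All names here (IntRef, newIntRef, readIntRef, writeIntRef, modifyIntRef, counterDemo) are made up for illustration.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)

-- A mutable Int cell stored unboxed in a one-element array.
newtype IntRef s = IntRef (STUArray s Int Int)

newIntRef :: Int -> ST s (IntRef s)
newIntRef x = IntRef <$> newArray (0, 0) x

readIntRef :: IntRef s -> ST s Int
readIntRef (IntRef arr) = readArray arr 0

writeIntRef :: IntRef s -> Int -> ST s ()
writeIntRef (IntRef arr) = writeArray arr 0

-- Read, apply f, write back.
modifyIntRef :: IntRef s -> (Int -> Int) -> ST s ()
modifyIntRef ref f = readIntRef ref >>= writeIntRef ref . f

-- Sums 1..100 through the reference.
counterDemo :: Int
counterDemo = runST $ do
  ref <- newIntRef 0
  forM_ [1 .. 100 :: Int] $ \i -> modifyIntRef ref (+ i)
  readIntRef ref
```

readArray and writeArray do bounds checks here; a production version would more likely use the primitive package's byte-array operations directly, as PrimRef does.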

I've been using impure-containers in my eqsat project, and I've come up with some issues that you may be interested in. I'm going to comment on this issue with each one, to avoid spamming you. Hackage is sorely lacking a good consistent package for mutable containers, and impure-containers is the best thing I've seen so far :-), so I'd like to see some improvements (and will likely write up PRs after I graduate on May 10).

cc: @chessai