Open Palmik opened 11 years ago
I fully anticipate that tables is about half the speed it could be given a more robust implementation than just deleting all the things and reinserting them. Your performance numbers seem to be about what I'd expect.
That is the trade-off for the nicer API.
We might be able to do much better if we replaced all of the Map
, IntMap
, and HashMap
machinery with something that could track what branches have changed, and/or allow for in place filtering.
Note: with
both selects and rebuilds, but when you use the Const
Applicative
, that reconstruction never happens it is in a function that gets thrown away.
instance Functor (Const a) where
fmap _ (Const a) = Const a
When you use ^.
to read from with
, that "reconstruction" never happens.
Another aspect that could lead to better performance would be to enable IntMap
and HashMap
primary keys.
Yeah, it seems that most of the time is spent creating the table from the list of 'selected' elements ((xs^.table)
in the go
functions), which is what I meant by "creating new table".
As a proof of concept, I have added lens based interface to data-store and, as expected, the performance sunk considerably indeed.
I have not yet measured the performance after the commit Taneb pushed to tables few hours ago.
By the way, data-store does not (yet) use (or makes it possible to use) Data.IntMap.IntMap
in case the key dimension type (type of the column) is Int
. So I do not think that is the main bottleneck.
Could there be a variant of with
that's a fold over rows rather than a lens on a new table? Then we could do e.g.
myTable ^.. with' Fst (==) 10
and receive a list rather than a new table, avoiding the overhead of creating a new table. Obviously this isn't as compositional as with
, but it could make sense for small queries and for the last of a chain of queries.
I don't have a particular objection to adding it, though it is a lot less compositional than the current approach.
One option might be to make a couple of RULES that would let you extract
myTable ^.. with Fst (==) 10 . rows
more efficiently.
Hi, I have created a benchmark that compares the performance of data-store and tables.
What I found is that tables seems to have much worse performance than data-store when querying. I think the problem might be that
with
et. al not only select the matching elements from the table, but also create new table from these elements (which usesinsert
, which usesdeleteCollisions
which, as theinsert-collisions
benchmark indicates, also seems to be underperfoming).The problem might also be caused by the benchamark (or bug in data-store, but the results of the benchmarks seem to be the same for both packages), so I encourage everyone to take a look. (For anyone wondering, the
NFData
instance for tables comes from here)I am open to any and all suggestions.