ekmett / tables

Deprecated because of
https://github.com/ekmett/tables/issues/15
Other
78 stars 13 forks source link

Performance issues with `with` et. al #8

Open Palmik opened 11 years ago

Palmik commented 11 years ago

Hi, I have created a benchmark that compares the performance of data-store and tables.

What I found is that tables seems to have much worse performance than data-store when querying. I think the problem might be that with et. al not only select the matching elements from the table, but also create new table from these elements (which uses insert, which uses deleteCollisions which, as the insert-collisions benchmark indicates, also seems to be underperfoming).

The problem might also be caused by the benchamark (or bug in data-store, but the results of the benchmarks seem to be the same for both packages), so I encourage everyone to take a look. (For anyone wondering, the NFData instance for tables comes from here)

I am open to any and all suggestions.

ekmett commented 11 years ago

I fully anticipate that tables is about half the speed it could be given a more robust implementation than just deleting all the things and reinserting them. Your performance numbers seem to be about what I'd expect.

That is the trade-off for the nicer API.

We might be able to do much better if we replaced all of the Map, IntMap, and HashMap machinery with something that could track what branches have changed, and/or allow for in place filtering.

Note: with both selects and rebuilds, but when you use the Const Applicative, that reconstruction never happens it is in a function that gets thrown away.

instance Functor (Const a) where
   fmap _ (Const a) = Const a

When you use ^. to read from with, that "reconstruction" never happens.

Another aspect that could lead to better performance would be to enable IntMap and HashMap primary keys.

Palmik commented 11 years ago

Yeah, it seems that most of the time is spent creating the table from the list of 'selected' elements ((xs^.table) in the go functions), which is what I meant by "creating new table".

As a proof of concept, I have added lens based interface to data-store and, as expected, the performance sunk considerably indeed.

I have not yet measured the performance after the commit Taneb pushed to tables few hours ago.

By the way, data-store does not (yet) use (or makes it possible to use) Data.IntMap.IntMap in case the key dimension type (type of the column) is Int. So I do not think that is the main bottleneck.

startling commented 11 years ago

Could there be a variant of with that's a fold over rows rather than a lens on a new table? Then we could do e.g.

myTable ^.. with' Fst (==) 10

and receive a list rather than a new table, avoiding the overhead of creating a new table. Obviously this isn't as compositional as with, but it could make sense for small queries and for the last of a chain of queries.

ekmett commented 11 years ago

I don't have a particular objection to adding it, though it is a lot less compositional than the current approach.

One option might be to make a couple of RULES that would let you extract

myTable ^.. with Fst (==) 10 . rows

more efficiently.