bwu-dart / bwu_datagrid

A data-grid Polymer element in Dart
MIT License
74 stars 26 forks source link

Suggestion: use bit set to keep track of selected rows #61

Open rkaneriya opened 10 years ago

rkaneriya commented 10 years ago

Great work on the datagrid!

I noticed that a list of ranges is currently being used to keep track of selected rows (in the row selection model) and is being converted to a list of row indices every time the getSelectedRows() method is called.

How about using a bit set to keep track of selected rows? Seems like it might be a more efficient option. Let me know what you think!

zoechi commented 10 years ago

Hi @rkaneriya, thanks for your interest in the Datagrid. Seems you investigated the code closely already. Currently I'm trying to make all features of of the original work. There is still some work to do. Only about 1/2 of the examples are implemented yet.

It would be a bit premature to do this kind of performance optimization. Performance is very important for the Datagrid and I wiil keep this issue open until it's time to take a stab at performance.

I didn't hear of a bitset for Dart yet. Do you have a link?

How you would implement this using a bitset, a bit for each cell in the grid? This wouldn't be very efficient. I think a list of ranges is not so bad. A range stores the row-number and column-number of the top/left and bottom/right cells of a range (4 int variables for each range no matter how big).

Even though selections can be set programmatically usually it's done manually by a user. Do you have a use case for huge number of selected ranges?

rkaneriya commented 10 years ago

Good point about performance. I was thinking of using a bit set primarily for multiple row selections -- not multiple cell selections -- but I agree that using ranges is still an efficient solution. Regardless, in case you're interested, here's a good bit set implementation in Dart: https://github.com/skalkin/bit_set.

Thanks again for your work on the Datagrid. I'll keep checking in to see your progress!

skalkin commented 10 years ago

Hi Günter,

I can give you a good use case for that.

We are very interested in having a high-performance grid for exploratory data analysis. It is common to have datasets consisting of millions of rows (or tens of thousands of columns - but fortunately not at the same time), so efficiency in all aspects of data visualization is paramount. One of the common techniques in data exploration is selecting an arbitrary number of records (rows) for either further processing, or to get an instant visual feedback of where those records fall on other viewers connected to the same dataset (scatterplots, histograms, etc). Since the rows could be selected by using pretty much any imaginable combination of criteria (based on values in certain columns, manually, etc), keeping ranges is not an efficient representation - bitsets are much more compact, have O(1) access complexity and perfect cache locality.

We can work around ranges and achieve the desired result by customizing the grid and doing one of the following:

I understand that bitset-based solution might not be the best representation in the case when user manually selects rows (although it is still very efficient one - would take ~100Kb for a million-rows dataset, so personally I'd be inclined to use this). A possible solution that would utilize the best of both worlds would be for the grid to access row selection state via an interface with two methods getSelected(int idx) and setSelected(int idx) and let us override the default implementation of this interface.

I also happen to be the author of the bit_set that @rkaneriya referred to, so we can make the necessary changes if needed :)

Thoughts?

Cheers, Andrew

zoechi commented 10 years ago

Hi Andrew,

thank you very much for explaining this use case! I can now see the requirement.

The Datagrid supports pluggable selection models. The lib/plugin directory already contains a few which can be used as example. I can take a look if this can be done by a custom selection model without any mods to the Datagrid itself.

As mentioned, I'm working on getting all features to work before working on optimization, so this might take some time.

I guess you will benefit from the work for these examples

Cheers Günter

skalkin commented 10 years ago

Hi Günter,

Thanks for the quick answer and for pointing to those examples! I admit I haven't looked into the grid's code in depth. If the custom selection model allows us to use a bitset for internal representation of the row selection state, this is awesome! I will look closer into that.

I share your sentiment regarding the optimization, of course everything has its priority :) At this point of time, I just want to be sure that we will be able to do such a customization in the future. So far, the grid looks very promising.

Thanks and keep up the great work! -Andrew