edjames / pivot_table

Transform an ActiveRecord-ish data set into pivot table
MIT License
103 stars 26 forks source link

how to handle multiple values in a cell #6

Closed henrydjacob closed 10 years ago

henrydjacob commented 10 years ago

I have data set which has multiple data for a row, column combination. i think currently, it ignores the second value. any solution?

edjames commented 10 years ago

This is similar to this issue: https://github.com/edjames/pivot_table/issues/5

Data aggregation is beyond the scope of this gem, so you will need to aggregate the data when you are building your dataset. Adding data aggregation will add too much complexity to this gem I'm afraid, so I don't have plans to do that.

Best, Ed.

henrydjacob commented 10 years ago

my requirement is slightly different, i don't want data aggregation, but require both the values, so that I can display them together in a cell

edjames commented 10 years ago

Ah right, I understand. I'd be happy to consider a pull-request, but right now I don't have any time to implement this (I also have no need to implement it in the app for which this gem was built).

Thanks.

javidjamae commented 9 years ago

FWIW - I think the reason you're getting people raise issues like #5 and #6 is because the name of your gem is misleading. The entire point of a pivot table is to group and aggregate data with similar attributes. It took me an hour of playing around with it to understand that it does not in fact behave like a pivot table. Maybe you should rename it to "map_to_table" or "object_to_matrix" and make it clear in the documentation that it will only take the first match on the attributes you specify. Either way, I would recommend renaming it to something more indicative of what it actually does if you consider aggregation out of scope.

edjames commented 9 years ago

Hi, thanks for commenting.

I disagree that the entire point of a pivot table is to "group and aggregate data". Yes, it's one thing that you could do but it's more about the display of data in a cross-tabular format to make it easy to understand multi-dimensional data i.e. data that "pivots" around one of the dimensions.

I extracted this gem from a private repo in the interests of sharing. As the README states, it's a simple little gem which tries to do one thing well. MySql doesn't easily allow a cross-tabular format of query results, which is the specific need this gem fulfils for me.

If you feel the docs are unclear, I'd be more than happy to accept a pull request with improved docs. However, I will not be renaming the gem and nor do I want to build a data aggregation engine. I have no need for that in this gem and it's something that any database engine will do infinitely better than any gem.

Thanks, Ed.

javidjamae commented 9 years ago

@edjames - yes, assuming you have a database. In my case, for example, I had a CSV file that I needed to parse and do summarization on. I don't think this is an uncommon case, and I don't feel like I'm "doing something wrong" because I don't even have a database in the picture. There are plenty of reasons one would want to use a pivot table gem with summarization capabilities without hitting a database (streaming inputs / real-time aggregation, flat file processing, etc).

Even with database applications, depending on the number of dimensions, segments, types of summary data, etc, it could be way more performant to pull an entire set of data back and do your summarization in a background process rather than slamming your database with a myriad aggregation queries.

Either way, thanks for updating the docs.

For anybody else who runs across this and want full pivot table functionality, you might look in to the ruport gem, which handles summarization as a strategy (only on master for now, not in a versioned gem yet):

https://github.com/ruport/ruport/blob/master/lib/ruport/data/table.rb#L174-L180

edjames commented 9 years ago

Hi @javidjamae

You make a fair point regarding alternative use cases. However, if you're trying to aggregate a dataset of any significance - say at least 500k rows - then using Ruby to do aggregation is going to present problems many other tools will not. Ruby is slow, and doing this kind of thing will eat up CPU and memory resources fast. You also have to deal with the single-threaded nature of Ruby (depending on your flavour) or the added complexity of running several background processes. It would seem you're then heading into Map/Reduce territory, which again adds its own complexities and challenges.

In the case of real-time aggregation, there are many better tools than Ruby to handle this at scale.

For processing a large, static text/CSV file, in my experience it would be quicker and simpler to bulk load into a database and run a simple query for aggregation. Good indexing can make this lightning quick. Then again, I don't really understand your use case in any detail, but this seems a cheap and robust workflow to me.

I feel your earlier criticism is a bit unfair though. My gem is very simple and lightweight, and it tries to do one specific thing well. In my opinion this is clear. It serves my needs perfectly hence my reluctance to develop it much further. If you feel the docs are incomplete and/or misleading, or you feel like there is an important feature that is missing, I would happily accept a pull request.

Thanks.

javidjamae commented 9 years ago

@edjames - No, my dataset is only 20K row. It's time-series data that is already pre-aggregated down to the hour (along with a few other dimensions), but I need to aggregate in different ways (3h, 6h, 12h, etc) for aggregate reporting purposes.

Sorry if I came across as overly critical, my intention wasn't to insult your efforts. I commend you for writing something useful and putting it out there for others to benefit from. I totally appreciate that it does what you need and that you're keeping it simple. My only criticism was that the name implies aggregation, which isn't there, so it violates the principle of least surprise for somebody that stumbles across the gem and tries to evaluate it as I did. The updated docs help define what the limitations are, which will save people time in the future (which is all I was really criticizing / suggesting).

WRT a pull request: you've taken a strong stance on not supporting summarization, but then you're inviting me to do a pull request to write that functionality, which is a little confusing. I currently have what I need with the ruport gem, so I have little incentive to improve pivot_table to have aggregation, especially if it is clearly out of scope and I would end up with an orphaned fork. That being said, I'd be happy to know if you or someone else decides to add support for summarization in the future.

Best of luck, and thanks again for your efforts!

edjames commented 9 years ago

Ok, understood.

Regarding data aggregation and this gem, you're right, I'm not making my position entirely clear. So to clarify, aggregation is out of scope for this gem in so far as I personally will not be implementing that feature. However, if someone implements this in an elegant way then I will happily consider a pull request to enhance the gem.

Thanks for the conversation and hopefully you've found the solution you're looking for.

Best, Ed.