Support for data aggregation?

artem-mindrov commented 12 years ago

I'm looking for a way to dynamically aggregate calculated column data based on values in other columns (basically, group-by functionality with a way to hide unused columns). In the current implementation, I don't see one. Do you think it's possible? If it's not supported, do you want me to try doing this and send you a patch (in this case, I'd appreciate any clues to help me get started)?

bogdan commented 12 years ago

At first take a look at this report:

http://datagrid.herokuapp.com/time_entry_reports

In short currently you need to do aggregation yourself at SQL level. Datagrid is just take care about output. Take a closer look on Column Value section at https://github.com/bogdan/datagrid/wiki/Columns

If you think that it doesn't do what you need or doesn't do it in a way you want, lets discuss an API you want to have. Give an ruby code example of a gird that should solve your problem.

artem-mindrov commented 12 years ago

That was fast, thanks. Yes, I've seen the example and read the wiki, so I was suspecting I'd need different queries for each aggregation level, so wanted to confirm.

My use case is actually a basic financial report showing incomes/expenses for a given time period. These values are calculated/fetched for

clients
clients and their services as an additional detail level There are a few other detail levels I don't mention for simplicity (e.g. a service might have a 'sub-service' so a [client,service,sub-service] tuple is also a valid aggregation level).

I haven't given much thought to the API at this point, but maybe it could look something like the below sample?

class Report
  include Datagrid

  scope do
     # select all the rows with the max possible detail level, i.e. client_id, service_id, income and expense values
  end

  column(:client)
  column(:service)
  column(:income)
  column(:expense)
end

Then, given an instance of Report, I'd do something like

report.aggregate([:income, :expense], :by => [:client, :service], :with => :sum)

or

report.sum(:income, :by => [:client,:service])

The only problem here is how to hide the columns I don't need to show up in this aggregated report (e.g. if I had a sub-service column in this report, apparently I wouldn't need it in the output). I guess I'll have to tell this instance which particular columns I want to hide/show, or maybe tweak my view not to use the helpers so I might skip those columns.

bogdan commented 12 years ago

A problem of hiding columns is something separated from aggregation and something more clean to me as I met that problem myself.

I want to be able specify which columns I would like to display in particular context. This could be solved with an extra option for datagrid_table like

datagrid_table(@grid, @assets, :columns => [:client, :service])

This should be good first step for columns customization. What do you think?

artem-mindrov commented 12 years ago

I agree that in general it's a different issue, I was only thinking about it in the context of my use case... What you suggest about this seems like a good idea to me.

bogdan commented 12 years ago

You can PR this change if you want it now. Otherwise I'll tackle it later. Maybe on the beginning of next week.

As for aggregation - I have some doubts. The API you proposed doesn't look solid. Why you can not implement aggregation with a filter if it is so general?

artem-mindrov commented 12 years ago

I have my own doubts about this :) I don't see a way to do it with filters other than re-scoping, so I'd have to have separate queries for each aggregation level. My code will bloat. Did you have something different in mind?

bogdan commented 12 years ago

Can you show me those queries?

artem-mindrov commented 12 years ago

The most detailed one looks like this

    PaymentDetail.select(%Q{groups.client_id, groups.subclient_id, groups.id as group_id, groups.user_id,
                          tools.service_id, payment_details.tool_id, sum(payment_details.total) as income,
                          sum(payment_details.loss) as expense, sum(payment_details.profit) as profit,
                          payments.payment_date as p_date}).joins([ { :payment => :group }, :tool]).
        where("payments.status=100").group("client_id,subclient_id,group_id,user_id,service_id,tool_id,p_date")

So the report can be drilled down to 6 levels (with p_date being a mandatory group column).

bogdan commented 12 years ago

I assume that things that changes between diffent aggretion options are list of group columns and list of select columns.

Did you try something like this:

scope {PaymentDetail.joins(...).where(...)

filter(:aggregation, :enum, :select => [...]) do |value, scope, grid|
  scope.select(grid.active_columns(value)
  .group(grid.active_groups(value)

end

def active_columns(value)
  DEFAULT_COLUMNS + case value
  end

def active_groups(value)
  DEFAULT_GROUPS + case value
  end
end

# And in view

datagrid_table(grid, assets, :columns => grid.active_columns)

artem-mindrov commented 12 years ago

Ah, I see you've made the helper accept :columns, thanks for that. I'll try your suggestion and get back.

artem-mindrov commented 12 years ago

Although it didn't quite work as you suggested (because chained group on an AR Relation appends group values instead of replacing them), I've tried recreating the scope in the filter, and it seems to work for me. Thanks for your help.

bogdan commented 12 years ago

Can you show me the result code you use in production(with no simplification)?

I want to check if we can to create a kind of helper or at least update the doc with this use case. You can send me a email if you don't want to share code in public. I ll use it for good not for evil

artem-mindrov commented 12 years ago

I've just mailed you the model code. It really could use some refactoring, but hope it can still be of use.

iwiznia commented 10 years ago

A little bit late to the discussion, but I'm needing the same functionality.... How did you implemented it? I was thinking in adding some dsl to the current implementation to do this, but I;m not sure the best way to tackle it. I was thinking something like:

groups(:some_field) do
   column(:aggregated_column, :sum) # the second param can be :sum, :avg, :count, etc
   column(:hidden_column, nil)
end

So, this would add some a combo or radio button with all the possible grouping values. When one is selected it will:

add a .group(:some_field) to the scope.
add a .select("sum(aggregated_column) aggregated_column") to the scope.
hide the hidden_column from the grid table.
all other columns that are not defined inside the groups block, will be shown as usual.

What do you think about it?

bogdan commented 10 years ago

@iwiznia Can you be more concrete on the use case? Do you group by foreign key?

iwiznia commented 10 years ago

Not necessarily you could group by any field. For example:

class Report 
  include DataGrid
  scope { User }

  column(:name)
  column(:type)
  column(:login_count)
  column(:total_orders)

  groups(:type) do
    column(:name, nil)
    colimn(:login_count, :avg)
    colimn(:total_orders, :sum)
  end
end

Here we could group by user type, when doing so, the name column will be hidden. In the login_count and total_orders columns we would get the average logins for each user type and the total orders of each user type.

bogdan commented 10 years ago

Ok, I am starting to understand your case.

Should the groups statement automatically define a filter? If yes- which type and options? If no - please define it in grid, so that we have a full picture.

Also can you demonstrate the usages with grid console api like assets, header and data.

iwiznia commented 10 years ago

I wouldn't mix the grouping and filtering capabilities. If you want, you can define a filter and a group, but that shouldn't be forced by the gem. The example I put is valid, and if you wanted you could add a filter by type.

A few usages: Report.new(groups: :type).assets would return the assets of the scope:

User.group(:type).select("#{User.quoted_table_name}.*, avg(login_count) login_count, sum(total_orders) total_orders")

Report.new(groups: :type).header would return:

["Type", "Login count", "Total orders"]

Report.new(groups: :type).rows would return:

[
  ['admin', 6, 125],
  ['user', 2, 500]
]

I don't know if that's what you where asking...

artem-mindrov commented 10 years ago

Sorry I'm a bit late to the party. Took me some time to find the code (I haven't been involved in that project for quite a while by now). Turned out my solution was rather crude really, I'll post whatever is relevant below:

class PlanReport
  COLUMNS = [:client_id, :subclient_id, :team_id, :group_id, :service_id, :tool_id]
  DEFAULT_COLUMNS = [:p_date, :income, :expense, :profit]

  include Datagrid

  self.scope do
    self.query_for_scope("incomes.date", self.group_string(:tool_id))
  end

  def initialize(*args, &block)
    super(*args, &block)

    params = args.shift

    self.detail_level, self.drilldown = if params.is_a?(Hash)
      [params[:detail_level] || :monthly, params[:aggregation].to_sym || :tool_id]
    else
      [:monthly, :tool_id]
    end

  end

  def detail_level
    @detail_level || :monthly
  end

  def detail_level=(level)
    raise ArgumentError unless level.is_a?(Symbol) || level.is_a?(String)

    @detail_level = level
    @p_date = case level.to_sym
      when :monthly then "date_format(incomes.date, '%Y-%m-01')"
      when :weekly then "adddate(incomes.date, INTERVAL 2-DAYOFWEEK(incomes.date) DAY)"
      else "incomes.date"
    end

    set_scope
  end

  def drilldown
    @drilldown || :tool_id
  end

  def drilldown=(col_name)
    @drilldown = col_name
    @group_by = group_string(col_name)

    set_scope
  end

  def active_columns
    columns = group_string(drilldown)
    DEFAULT_COLUMNS | columns | columns.map { |col| col.to_s.gsub(/_id\Z/, '_name').to_sym }
  end

  filter(:aggregation, :enum,
         :header => "Детализировать по",
         :select => [ ["клиентам", :client_id], ["субклиентам", :subclient_id], ["группам", :team_id], ["направлениям", :group_id],
                      ["услугам", :service_id], ["инструментам", :tool_id] ],
         :include_blank => false, :default => :tool_id) do |value, scope, grid|
    grid.drilldown = value.to_sym
    grid.scope
  end

<billions of filter and column definitions skipped>
...

  private

  def set_scope
    p_date = @p_date || "incomes.payment_date"
    grouping = @group_by || group_string(:tool_id)

    self.scope do
      query_for_scope(p_date, grouping)
    end
  end

  def self.group_string(col_name)
    COLUMNS[0..Hash[COLUMNS.map.with_index{|*ki| ki}][col_name]] | [:p_date]
  end

  def group_string(col_name)
    self.class.group_string(col_name)
  end

  def self.query_for_scope(p_date, grouping, drilldown = :tool_id)
    fields = case drilldown

      when :client_id
        %Q{<sql select part here>}

      when :subclient_id
        %Q{<a more specific sql select part>}

      when :group_id
        %Q{<an even more specific sql select part>}

      when :team_id
        %Q{<you get the drill...>}

      when :service_id
        %Q{<...>}

      else
        %Q{<the most detailed select ever!>}
      end

    qstring = fields + ", #{p_date} as p_date"
    q = Income.select(qstring).joins('<a join clause for every case>')

    inner_joins = case drilldown
                    when :client_id, :subclient_id, :group_id
                      { :payment => [:payment_details, :group] }
                    when :service_id, :tool_id
                      { :payment => [{:payment_details => { :tool => :service }}, :group] }
                  end

    q.joins(inner_joins).merge(Payment.reportable).group(grouping)
  end

  def query_for_scope(p_date, grouping)
    self.class.query_for_scope(p_date, grouping, drilldown)
  end
end

Basically aggregation here is two dimensional (monthly/weekly/daily + metrics listed in COLUMNS) and is set with attributes detail_level (time specificity) and drilldown. Both attributes are set by the user as dropdown choices. The resulting SQL is then returned by query_for_scope depending on those two values. I understand this might be a mess so ask away, I'll try to answer.

iwiznia commented 10 years ago

mmm, what you are solving there is 2 different things... the aggregation, and the aggregation of dates by month, week, day (or aggregating by an operation in a field). I wouldn't mix the two of them, in my proposal if you wanted to aggregate a column in different ways (by month or day) you would define 2 groups, like so:

groups(:created_at_day, group: -> { group_by_day(:created_at} ) do
   # columns config
end
groups(:created_at_monthly, group: -> { group_by_month(:created_at} ) do
   # columns config
end

I'm using the https://github.com/ankane/groupdate gem to get the group_by_month and group_by_day scopes in active record, but the same could be achieved without that gem.

bogdan commented 10 years ago

@iwiznia comming back to your example:

  groups(:type) do
    column(:name, nil)
    colimn(:login_count, :avg)
    colimn(:total_orders, :sum)
  end

I see some separated features here.

Define column value as SQL

At first ability to count column value using SQL.

In common case it can look like: column(:total_orders, "sum(total_orders)". I am not sure how you gonna use it like column(:total_orders, :sum) because there is no way to specify column. I would prefer to not think about such magic at least in Version 1 of this feature.

These syntax can be used even if you aggregate by joined table, e.g. in my App I have:

Merchant.has_many :purchases

scope {Merchant.join(:purchases).group("merchants.id") }
column(:purchases_total, "sum(purchases.subtotal)")

Hide column using a condition

There is a need to hide column based on some condition. The most flexible way of doing it is:

column(:type, :available => proc {|grid| grid.some_filter == some_value }

OR more elegant:

column(:type, :available => :without_grouping_by_xx?)

def without_grouping_by_xx? # method name should reflect business logic
  some_filter == some_value # or other condition
end

We can support both and let people to choose.

Group with block is doubtful

Features above is something simple that every person can understand. I don't think we really need those group with block API. People can express what they want just with api above.

If you still want some kind of grouping:

with_options(:available => :without_grouping_by_xx?) do |z|
  z.column(:one)
  z.column(:two)
end

http://apidock.com/rails/Object/with_options

bogdan commented 10 years ago

@artem-mindrov yeah, you have really over complected everything. Many of thing you've done doesn't relate to datagrid. They could be done easier just by refactoring.

If you still have stamina to work with us on this feature, read my post above and tell me if it solve your problems.

iwiznia commented 10 years ago

I like what you propose. Much simpler than mine, and less 'magical' as you said. I would name hidden the option to hide a column, instead of available (but, that's a matter of taste).

I like the sql in the column options, since can can lead to speed improvements if you have something like:

scope { Purchase.joins(:merchant) }
column(:merchant_name, 'merchants.name')

before this change it would have to be something like:

scope { Purchase.joins(:merchant).includes(:merchant) }

which would add a lot more unnecessary data to the query :-)

One thing that I'm not sure about is that the column definition can change the query made by the grid. This is not a problem, but something that differs from the current implementation, where the columns definitions are only used for displaying the data and do not modify the retrieval of the rows... Do you think that can be confusing or is it ok?

bogdan commented 10 years ago

available option

In my app there is more than 20 columns in some grids and users are able to select which columns they want to see as a list of checkboxes - one for each column. So for me hidden means that user has chosen to hide a column. And available means that column is available to be selected in this list.

That's why I've picked available. Maybe we can consider unavailable but still not sure. We need to imagine how this option can be used outside of grouping context.

I only have something like this in my apps:

class UsersGrid
  attr_accessor :current_user
  column(:potential_spammer, :available => proc {|g| g.current_user.admin?})
end

unavailable is only good for grouping use case. I still feel that available sounds better. Can you imagine more use cases?

Fivell commented 10 years ago

just few cents from stranger , why not simplify use both :if and :unless options ? (instead of available )

bogdan commented 10 years ago

@Fivell Yes, I think you are right. It will be fine to make both if and unless. It is shorter and more common for rails users.

One thing that I'm not sure about is that the column definition can change the query made by the grid. This is not a problem, but something that differs from the current implementation, where the columns definitions are only used for displaying the data and do not modify the retrieval of the rows... Do you think that can be confusing or is it ok?

I don't see a way to avoid this. In current version:

scope do 
  Merchant.join(:purchases).group("merchants.id").
     select("merchants.*, sum(purchases.subtotal) purchases_total")
end
column(:purchases_total)

I consider this to be worse than what we are trying to build.

iwiznia commented 10 years ago

Yes, I agree. I'll build this and make a pull request as soon as I can. Thanks!

bogdan commented 10 years ago

@iwiznia I would be cool to have two PRs: one for if unless and one for SQL column value.

iwiznia commented 10 years ago

Ok, no problem.

bogdan commented 10 years ago

@iwiznia any progress on this?

We need it in our project and if you don't have enough time - I can do this myself.

iwiznia commented 10 years ago

No progress, sorry. I'm pretty swamped right now and we pushed off that functionality...

bogdan commented 10 years ago

Available in latest release

bogdan commented 10 years ago

AVailable in latest release

bogdan / datagrid