amberframework / granite

ORM Model with Adapters for mysql, pg, sqlite in the Crystal Language.
MIT License
296 stars 87 forks

support for standard aggregate methods (sum/avg/min/max) #393

Open sam0x17 opened 4 years ago

sam0x17 commented 4 years ago

I have a monkey patch I've been using locally for this for a while. The tricky part is handling types correctly, as Executor::Value and Executor::MultiValue are awkward to work with. In particular, this has to work with timestamps as well as with numeric columns.

The different aggregate functions have some special caveats when it comes to types.

Because of all the type constraints, this may have to have type-specific versions, e.g. min for ints, min_f for floats, min_t for time, etc., and I'd be fine with that.

In Rails these are called minimum / maximum / average / sum.

My expectation would be the ability to do things like:

Widget.where(:created_at, :gt, Time.now - 10.days).avg(:units_sold) # => Float64?
Widget.min(:created_at)                        # => Time?
Product.where(cool: true).sum(:coolness)       # => Int64 (defaults to 0)
Comment.where(deleted_at: nil).max(:downvotes) # => Int64?

Possibly we would provide a second argument allowing one to specify a default value, e.g.:

User.minimum(:created_at, Time.utc) # => Time (defaults to Time.utc if no records found)

I'm happy to submit a PR for this, just wanted to collect some feedback first.

sam0x17 commented 4 years ago

This is the current monkey patch I am using. Note I had to change the base Value(Model, Scalar) class to accept nil as a default value.

module Granite::Query::Executor
  class Value(Model, Scalar)
    include Shared

    def initialize(@sql : String, @args = [] of Granite::Columns::Type, @default : Scalar = nil)
    end

    def run : Scalar
      log @sql, @args
      # db.scalar raises when a query returns 0 results, so I'm using query_one?
      # https://github.com/crystal-lang/crystal-db/blob/7d30e9f50e478cb6404d16d2ce91e639b6f9c476/src/db/statement.cr#L18

      # raise "No default provided" if @default.nil?

      Model.adapter.open do |db|
        db.query_one?(@sql, args: @args, as: Scalar) || @default #.not_nil!
      end
    end

    delegate :<, :>, :<=, :>=, to: :run
    delegate :to_i, :to_s, :to_t, to: :run
  end
end

macro define_granite_aggregate(name, aggregate, type_union, inner_type_union, default=nil, resolver=nil)
  abstract class Granite::Query::Assembler::Base(Model)
    def {{name.id}}(column : String | Symbol) : Executor::Value(Model, {{inner_type_union}})
      sql = build_sql do |s|
        s << "SELECT {{aggregate.id}}(#{column})"
        s << "FROM #{table_name}"
        s << where
        s << group_by
        s << order(use_default_order: false)
        s << limit
        s << offset
      end

      #if group_by
      #  Executor::MultiValue(Model, {{type_union}}).new sql, numbered_parameters, default: {{default}}
      #else
        Executor::Value(Model, {{inner_type_union}}).new sql, numbered_parameters, default: {{default}}
      #end
    end
  end

  abstract class Granite::Query::Builder(Model)
    def {{name.id}}(column : Symbol | String, default : Time | Int64 | Int32 | Float64 | Nil = nil) : {{type_union}}
      val = assembler.{{name.id}}(column).run
      return default if val.nil?
      {% if resolver %}
      val = val.{{resolver.id}}
      {% end %}
      val
    end
  end
end

define_granite_aggregate(:min_t, :MIN, Time?, Time?)
define_granite_aggregate(:max_t, :MAX, Time?, Time?)
define_granite_aggregate(:sum, :SUM, Int64, Int64 | Int32, 0_i64, :to_i64)
define_granite_aggregate(:min, :MIN, Int64?, Int64 | Int32 | Nil, nil, :to_i64)
define_granite_aggregate(:max, :MAX, Int64?, Int64 | Int32 | Nil, nil, :to_i64)
define_granite_aggregate(:avg, :AVG, Float64?, Float64 | PG::Numeric | Nil, nil, :to_f64)
define_granite_aggregate(:sum_f, :SUM, Float64, Float64 | PG::Numeric, 0_f64, :to_f64)
define_granite_aggregate(:min_f, :MIN, Float64?, Float64 | PG::Numeric | Nil, nil, :to_f64)
define_granite_aggregate(:max_f, :MAX, Float64?, Float64 | PG::Numeric | Nil, nil, :to_f64)
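
With the patch applied, the generated methods hang off the query builder, so aggregates chain onto existing where scopes. A hypothetical usage sketch (the models and columns here are made up for illustration, not part of the patch):

```crystal
# Hypothetical models/columns; assumes the monkey patch above is loaded.
total   = Product.where(cool: true).sum(:coolness)             # Int64, 0_i64 when no rows match
average = Widget.where(:units_sold, :gt, 0).avg(:units_sold)   # Float64? via to_f64
oldest  = Widget.where(:active, :eq, true).min_t(:created_at)  # Time?, nil when no rows match
```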

sam0x17 commented 4 years ago

Note that this could be extended to work with GROUP BY, and that I'm using PG::Numeric when I should be using something more generic.

robacarp commented 4 years ago

This may be a situation where the different drivers have different offerings. For example, I know the way that Postgres handles GROUP BY statements is different from the way MySQL does, and that affects how the SELECT is built.

I like the idea of adding these more advanced aggregates, but I think that historically Granite has been careful about maintaining the top level DSL to be something uniform across the drivers. If people want the full flexibility of their database they have to drop down to a lower level.

sam0x17 commented 4 years ago

Yeah, on that note I've been thinking of taking my ideas and making an ORM.

damianham commented 4 years ago

Hey Sam, just a thought: what about a higher-level ORM that delegates all but the aggregate methods to one of the supported ORMs? I'm not sure how that would be constructed at the moment, but the thought just struck me that a higher-level ORM providing all kinds of additional functionality, like these aggregate methods, on top of the existing ORMs might be a good idea.

drujensen commented 4 years ago

The way I envisioned people using Granite to handle aggregate queries is to build a new read-only model for each query. Maybe this is too laborious a solution.

class UserAgeByCountry < Granite::Base
  select "SELECT country, count(age) AS cnt_age, min(age) AS min_age, max(age) AS max_age, avg(age) AS avg_age, sum(age) AS sum_age FROM users GROUP BY country"

  column country : String
  column cnt_age : Int64
  column min_age : Int64
  column max_age : Int64
  column avg_age : Int64
  column sum_age : Int64
end
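
Each row of the grouped query then maps to an instance of the read-only model. A hypothetical usage sketch (this assumes the SELECT aliases each aggregate, e.g. count(age) AS cnt_age, so the column names line up):

```crystal
# Hypothetical usage; relies on the UserAgeByCountry model sketched above.
UserAgeByCountry.all.each do |row|
  puts "#{row.country}: #{row.cnt_age} users, ages #{row.min_age}..#{row.max_age}"
end
```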

drujensen commented 4 years ago

What if we create new column types for each aggregate that take a type T?

class UserAgeByCountry < Granite::Base
  table :users

  column country : String, group_by: true
  column cnt_age : Count(:age, Int64)
  column min_age : Min(:age, Int64)
  column max_age : Max(:age, Int64)
  column avg_age : Avg(:age, Int64)
  column sum_age : Sum(:age, Int64)
end

sam0x17 commented 4 years ago

@drujensen I noticed this when I went looking around for how to do this. For my purposes this would be super impractical as there are literally hundreds of places I need to use aggregate methods (e.g. for the stats page in an admin control panel), and usually I am using complex where/scope chains, and then slapping an aggregate onto the end of those. If I had to create custom view models for each of these scoping situations, I would have 10-20 of them for each model.

drujensen commented 4 years ago

Thanks @Blacksmoke16 and @sam0x17 for the feedback. I am torn on whether an ORM should return arbitrary query results, or whether that should be a separate product, since it's no longer mapping results to an object. What this PR is doing is creating a DSL for SQL and expecting an array of named tuples with no type checking, etc. It's not leveraging the mapping at all. However, I realize this is a much-needed feature and see other ORMs doing similar solutions.

Blacksmoke16 commented 4 years ago

If anything you could do something like (untested):

class User < Granite::Base
  ...

  def self.max_age : Int32
    self.scalar("SELECT MAX(age) FROM #{self.table_name}").as(Int32)
  end
end

Or really could generalize it, then you have context which would help with how to set things up.
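
Generalizing it could be a small macro that stamps out one typed class method per aggregate/column pair. A rough sketch, assuming the self.scalar helper from the snippet above exists and returns a castable driver value (untested, like the original):

```crystal
class User < Granite::Base
  # Hypothetical macro: defines e.g. User.max_age and User.avg_age.
  macro scalar_aggregate(name, fn, column, type)
    def self.{{name.id}} : {{type}}?
      scalar("SELECT {{fn.id}}({{column.id}}) FROM #{table_name}").as({{type}}?)
    end
  end

  scalar_aggregate max_age, MAX, age, Int32
  scalar_aggregate avg_age, AVG, age, Float64
end
```

This gives typed results per aggregate, but it still cannot be chained onto a where scope, which is the main limitation discussed below.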

drujensen commented 4 years ago

@Blacksmoke16 you wouldn't be able to append a where, group_by, limit, order_by to that.

drujensen commented 4 years ago

The Lucky Framework separates the table mapping and the query into two separate objects. Maybe that is something to consider architecturally instead of combining the two concerns into the same object. wdyt?

damianham commented 4 years ago

Could this problem be solved with Virtual Models? I am not sure if this would work, but if it did it could be very useful. For instance, User.all gives you an array of User records, and in the view you can reference a property of the model such as user.email. However, if you use #map to modify the data sent to the web client, you get an array of NamedTuples and user.email gives 'undefined method'. So what would be cool would be virtual models that provide a view on the underlying model with a different field set (and thus property methods) and also additional methods such as aggregators. Any method not defined in the virtual model would delegate to the underlying model.

I don't want to have virtual model classes, but be able to create them in a single line of code e.g.

users = User.all.map{ |user| VirtualModel(User.class, properties: [name : String, email : String, created_at : DateTime]).new }

then users.to_json only gives name, email and created_at in the result and this works for both json results and also for ecr/slang templates.

For aggregation how about

stats = VirtualModel.all(User.class,
  properties: [
    country : String,
    cnt_age : Int64,
    min_age : Int64,
    max_age : Int64,
    avg_age : Int64,
    sum_age : Int64
  ],
  query: "SELECT country, count(age) cnt_age, min(age) min_age, max(age) max_age, avg(age) avg_age, sum(age) sum_age FROM users GROUP BY country"
)

drujensen commented 4 years ago

@damianham I think what you're proposing is pretty much the same as what I proposed, correct? Curious why you wouldn't want to declare it as a class instead. You can leverage the query builder if you use the existing model that sits on top of your query instead.

sam0x17 commented 4 years ago

@drujensen side note: is it possible to apply a .scalar like what @Blacksmoke16 used above, but to a where chain? Because if so, my life becomes a bit easier.

damianham commented 4 years ago

> @damianham I think what you're proposing is pretty much the same as what I proposed, correct? Curious why you wouldn't want to declare it as a class instead. You can leverage the query builder if you use the existing model that sits on top of your query instead.

I was just throwing the idea around to see what your thoughts were. I think it is similar to what you proposed except you don't populate your project with lots of classes, instead there is a single virtual model class that is part of the framework and you instantiate virtual instances inside your controllers.

I did something similar to this a long time ago in RoR where I used method_missing to get fields from an attribute hash that was populated with columns from the result set record.

drujensen commented 4 years ago

Thanks @damianham I understand. So your idea is to have more of a smart object that can dynamically generate the properties. Ruby is good at that stuff but this might be tough to pull off using macros. I personally prefer having well defined objects for queries, but I see how you may just want to grab the max(:age) without creating a whole class and cluttering the code base.

@sam0x17 There is an executor that runs for scalar values: https://github.com/amberframework/granite/blob/0db7835eece5edd143c5b549069e19de68789ba5/src/granite/query/executors/value.cr#L8

I don't recall when this executor is invoked vs the list executor.

sam0x17 commented 4 years ago

The biggest difficulty is if you have a complex scope/query chain, and you need to grab a max or an avg off of that.

sam0x17 commented 4 years ago

So here's a real scenario where not having these makes things extremely difficult:

Let's say I have a User model with an Int32 field called upvotes, a country field, a state field, and a city field, and I want to know the average number of upvotes for users in a particular city, state, and country, created in January...

class User < Granite::Base
  connection pg
  table users

  column id : Int64, primary: true
  column email : String
  column password_hash : String
  column upvotes : Int32
  column city : String?
  column state : String?
  column country : String?

  timestamps
end

I would expect to just be able to do something like:

User.where(:created_at, :gt, Time.utc - 4.months)
  .where(:created_at, :lt, Time.utc - 3.months)
  .where(country: "US")
  .where(state: "MD")
  .where(city: "Baltimore")
  .avg(:upvotes)

This is an incredibly common sort of construction for people building web apps, and in particular for people coming from Rails where it is easy to chain aggregates onto complex where chains.

The current proposed solution is to make a custom version of the model (sort of like a materialized "view") that houses a custom select clause, something like UsersBetweenDateInSpecificLocationView. This would be sort of OK if I could just have that custom view inherit from my base Granite model, but inheritance doesn't work in Granite, so I have to completely recreate the model with all the fields involved in the query just to get the answer to a single aggregate.

One solution that would work would be the ability to have multiple custom select clauses per model: allow a name to be passed along with the macro, so you get named custom select scopes. I'd still prefer to just have working chainable aggregates, though.
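
A rough sketch of what named select scopes might look like (hypothetical syntax, not current Granite):

```crystal
class User < Granite::Base
  # ... existing columns as above ...

  # Hypothetical: a named custom select, instead of a whole separate model.
  # Something like User.upvote_stats would then run this query while reusing
  # the model's existing column mapping.
  select :upvote_stats, "SELECT city, AVG(upvotes) AS avg_upvotes FROM users GROUP BY city"
end
```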