datamapper / dm-aggregates

DataMapper plugin providing support for aggregates on collections
http://datamapper.org/
MIT License
16 stars 15 forks source link

Avg returns 0.0 for empty table (dm-1.2.0) #15

Open Ragmaanir opened 13 years ago

Ragmaanir commented 13 years ago

Hi,

when i do

MyModel.avg(:something)

i get 0.0 for an empty table (sqlite3, integer property). I think this is wrong and should return nil instead (SQL AVG returns null). I guess this is caused by the to_f in:

def avg(property, value)
    property.primitive == ::Integer ? value.to_f : property.load(value)
end

in the dm-aggregates/adapters/dm-do-adapter.rb.

A quick workaround i use is to call MyModel.aggregate(:all.count, :something.sum) and compute the average myself.

dkubb commented 13 years ago

In Veritas I also have the average of an empty set return nil, so I agree with this change.

However, I don't use the SQL behaviour as the reference, but rather mathematics. So for example, SQL returns NULL for the SUM of an empty set, where in math it's 0. Also MIN/MAX of an empty set is also NULL in SQL, but in math it's equal to the highest and lowest numbers, respectively, in the domain (the set of allowed numbers for the aggregate function).

Ragmaanir commented 13 years ago

Hm, regarding MAX/MIN, whats the use in returning the maximum of the domain instead of nil? Lets say i have a users-table and users have an age and i want to compute the youngest age. On an empty table the MIN then would be 256 (depending on the datatype or so). In my opinion nil would be a much more intuitive and usable result than the maximum.

Also, that definition of MAX would not make sense for e.g. natural numbers, because that set does not have a top element (but ok, there are no true natural numbers in DBs). What mathematic definition are your refering to? Im just curious because i could not really find one on wikipedia yet that matches your description. E.g. for partially ordered sets it says on wikipedia: "Maximal elements need not exist." Similar for 'greatest element'. It is always an element of the subset, so when the subset is empty there is no max/min element.

dkubb commented 13 years ago

I can probably comment a bit more later, but see this: http://en.wikipedia.org/wiki/Empty_set#Extended_real_numbers

With an empty set the normal approach is to return the identity of the specific function. Average is an exception because it is undefined over an empty set, given that the denominator (the cardinality of the set) cannot be 0.

Here's also an interesting example: http://stat.ethz.ch/R-manual/R-patched/library/base/html/Extremes.html

"The minimum and maximum of a numeric empty set are +Inf and -Inf (in this order!) which ensures transitivity, e.g., min(x1, min(x2)) == min(x1, x2)"

There are also a whole suite of problems that occur with SQL's handling of NULL. It breaks the closure property in many cases, most of the time unknowingly to the end user.

Again I can comment in more detail later, I just wanted to give you something to look at so you could do the research (if you wanted) in the meantime.