Closed paneq closed 10 years ago
Here is my current workaround:
Veritas::Relation::Operation::Order::Ascending.class_eval do
def self.call(left, right)
left <=> right || (-1 if right.nil?) || (1 if left.nil?)
end
end
Veritas::Relation::Operation::Order::Descending.class_eval do
def self.call(left, right)
left, right = right, left
left <=> right || (-1 if right.nil?) || (1 if left.nil?)
end
end
Oh, And I checked that the problem occurs also in master
branch
Sorry for not responding to this earlier. I'm still trying to sort out precisely how to handle this.
Officially relational algebra doesn't really provide many ways to handle NULL (nil) values. In fact it's pretty much regarded that NULL values are the result of a mistake at the data modelling phase. I think in reality there are times where it does happen, and we have to make sure we handle it sanely.
The problem is that some operations have undefined behaviour when a NULL value occurs. You can fall back to ruby-like semantics, like that nil won't equal anything other than another nil. But there are other cases where it's not so clear-cut. The worst possible outcome is that you have to rely on 3 valued logic, as opposed to boolean logic, which makes everything significantly more complex.
What I need to do is look at every operation and see if there's a nice way to handle it, or if I have to raise assertions when it occurs.
I totally understand. IMHO we should handle things the same way SQL-databases does. That is what I expected at least. Also, this ticket is not about trying to fix it everywhere. Step by step is better approach. I only ask you to consider fixing it for sort_by
. Thank for writing this gem and for deeply considering this issue.
FWIW I don't think SQL (and SQL database products) specifies a standard semantic for ordering nulls either http://en.wikipedia.org/wiki/Order_by
@ahamid yeah, some databases sort nulls first, some sort them last. Some will default to one, but provide a non-standard way to specify another.
Just handling nil on sorting isn't a proper solution because it pushes the problem elsewhere.
I'm still trying to decide if I'll provide a way to specify the preference, or if I'll just provide a way to specify default values for nil columns. Maybe it could be something like:
relation_with_nil_city.defaults(:city => '').sort_by { |r| [r.city.asc, r.sum.asc] }
Or maybe I should just say to restrict the tuples with a nil
city, and that if you have nil values the behaviour is undefined at the moment.
It doesn't sound like a big deal, but handling of NULL values is probably the largest unsolved issue in relational algebra. The SQL approach is completely out, since makes things impossibly more complex (to the point where I would say no database gets it perfectly right, despite 30+ years of research and billions of dollars in investment). I don't know what the solution is, but I'll need to do more research to see if anyone has a good solution for this.
Everything is better than current exception NoMethodError: undefined method
nonzero?' for nil:NilClass`.
That would be also cool:
relation.sort_by { |r| [r.city.asc.default(""), r.sum.asc] }
Sorry for leaving this open for so long.
I think I'm going to use the default behaviour of PostgreSQL as a guide since it is our reference database on the RDBMS side of things. This means defaulting to sorting nil last.
@paneq thanks for submitting this. Your original use case passes:
relation = Axiom::Relation.new(
[ [ :id, String ], [ :name, String ], [ :color, String ], [ :weight, Float ], [ :city, String ] ],
[
[ 'P1', 'Nut', 'Red', 12.0, 'London' ],
[ 'P2', 'Bolt', 'Green', 17.0, 'Paris' ],
[ 'P3', 'Screw', 'Blue', 17.0, 'Oslo' ],
[ 'P4', 'Screw', 'Red', 14.0, 'London' ],
[ 'P5', 'Cam', 'Blue', 12.0, 'Paris' ],
[ 'P6', 'Cog', 'Red', 19.0, nil ],
]
)
ordered = relation.summarize( relation.project([:city]) ){ |r| r.add(:sum, r.weight.sum) }.sort_by do |r|
[r.city.asc, r.sum.asc]
end.to_a
=> [{:city=>"London", :sum=>26.0},
{:city=>"Oslo", :sum=>17.0},
{:city=>"Paris", :sum=>29.0},
{:city=>nil, :sum=>19.0}]
@dkubb Nice to hear that. Thank you for your work on that.
This leads to an exception:
I think nil values should always be sorted as "greater" than any non-nil value. What do you think ?