bloom-lang / bud

Prototype Bud runtime (Bloom Under Development)
http://bloom-lang.net

Wrong output for 3-way join #313

Closed neilconway closed 11 years ago

neilconway commented 11 years ago

Test case:

require "rubygems"
require "bud"

class CombosTestCase
  include Bud

  state do
    scratch :explicit_tc, [:from, :to]
    scratch :use_tiebreak, [:from, :to]
    scratch :sem_hist, [:from, :to]
    scratch :result, [:from, :to]
  end

  bloom do
    result <= (sem_hist * use_tiebreak * explicit_tc).combos(sem_hist.from => use_tiebreak.from, sem_hist.to => explicit_tc.from, sem_hist.from => explicit_tc.to) do |s,t,e|
      puts "JOIN RESULT: sem_hist: #{s}, use_tiebreak: #{t}, explicit_tc: #{e}"
      puts "sem_hist.from: #{s.from}; use_tiebreak.from: #{t.from}"
      [s.to, t.to]
    end
  end
end

c = CombosTestCase.new
c.explicit_tc <+ [[1, -Float::INFINITY]]
c.sem_hist <+ [[-Float::INFINITY, 1]]
c.use_tiebreak <+ [[1, 2]]
c.tick

puts c.result.to_a.sort.inspect

Expected results: empty array

Observed results:

JOIN RESULT: sem_hist: [-Infinity, 1], use_tiebreak: [1, 2], explicit_tc: [1, -Infinity]
sem_hist.from: -Infinity; use_tiebreak.from: 1
[[1, 2]]

neilconway commented 11 years ago

Ah, interesting. The problem is the following:

  1. The join predicates are given using the Ruby hash literal notation.
  2. The predicate names are method calls that return an array [tbl_name, column_offset, column_name].
  3. Two join predicates have the same table name + column name on the LHS (e.g., sem_hist.from => x, sem_hist.from => y). This results in constructing a hash literal with two values for the same key, so Ruby just preserves the last value for that key (sem_hist.from => y). In the test case, the sem_hist.from => use_tiebreak.from predicate is silently dropped, which is why the join emits a tuple even though -Infinity != 1.
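
The collapse described in the list above can be reproduced in plain Ruby, without Bud. This is a minimal sketch: `col` and `build_predicates` are hypothetical stand-ins for the schema accessors and the `combos` argument, assuming only that each accessor returns a `[tbl_name, column_offset, column_name]` array as described.

```ruby
# Hypothetical stand-in for a schema accessor such as sem_hist.from.
# Assumption: it returns [tbl_name, column_offset, column_name].
def col(tbl_name, column_offset, column_name)
  [tbl_name, column_offset, column_name]
end

# Same shape as the predicates passed to combos in the test case.
def build_predicates
  {
    col(:sem_hist, 0, :from) => col(:use_tiebreak, 0, :from),
    col(:sem_hist, 1, :to)   => col(:explicit_tc, 0, :from),
    col(:sem_hist, 0, :from) => col(:explicit_tc, 1, :to)
  }
end

preds = build_predicates
puts preds.size # prints 2: arrays with equal contents are the same hash key
preds.each { |lhs, rhs| puts "#{lhs.inspect} => #{rhs.inspect}" }
```

Only two predicates survive, and the surviving entry for `sem_hist.from` points at `explicit_tc.to`. Because the keys are built by method calls rather than written as literals, Ruby does not emit its duplicated-key warning here.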

One kludgey way to fix the problem would be to have the schema accessors return an additional array element containing a unique ID (e.g., an incremented counter). If anyone has suggestions for a less gross fix, let me know...
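
A rough sketch of the counter idea, in plain Ruby rather than Bud's actual accessor code. `PredicateIds`, `col_with_id`, and `build_predicates_with_ids` are hypothetical names; the only assumption is that the accessor may append a fourth element to the array it returns.

```ruby
# Hypothetical global counter used to make every accessor result unique.
module PredicateIds
  @counter = 0

  def self.next_id
    @counter += 1
  end
end

# Hypothetical accessor: same triple as before, plus a unique ID.
def col_with_id(tbl_name, column_offset, column_name)
  [tbl_name, column_offset, column_name, PredicateIds.next_id]
end

def build_predicates_with_ids
  {
    col_with_id(:sem_hist, 0, :from) => col_with_id(:use_tiebreak, 0, :from),
    col_with_id(:sem_hist, 1, :to)   => col_with_id(:explicit_tc, 0, :from),
    col_with_id(:sem_hist, 0, :from) => col_with_id(:explicit_tc, 1, :to)
  }
end

preds = build_predicates_with_ids
puts preds.size # prints 3: the IDs keep the two sem_hist.from keys distinct
```

The cost is that any code consuming these arrays would have to ignore the trailing ID when comparing columns, which is part of what makes this approach kludgey.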

neilconway commented 11 years ago

Another possible fix would be to rewrite the hash literal into an array literal in the rewriter. Not sure if that is more or less ugly, though...
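
For comparison, here is a sketch of what the predicates would look like as an array of pairs, which is roughly the form such a rewrite might produce. This is illustrative only and says nothing about how Bud's rewriter is implemented; `col` and `build_predicate_pairs` are hypothetical names.

```ruby
# Hypothetical stand-in for a schema accessor such as sem_hist.from.
def col(tbl_name, column_offset, column_name)
  [tbl_name, column_offset, column_name]
end

# The same three predicates, written as [lhs, rhs] pairs instead of a hash.
def build_predicate_pairs
  [
    [col(:sem_hist, 0, :from), col(:use_tiebreak, 0, :from)],
    [col(:sem_hist, 1, :to),   col(:explicit_tc, 0, :from)],
    [col(:sem_hist, 0, :from), col(:explicit_tc, 1, :to)]
  ]
end

pairs = build_predicate_pairs
puts pairs.size # prints 3: an array keeps pairs with a repeated LHS
pairs.each { |lhs, rhs| puts "#{lhs.inspect} => #{rhs.inspect}" }
```

Since an array has no notion of key uniqueness, both `sem_hist.from` predicates are retained and the accessor return values stay unchanged.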