activewarehouse / activewarehouse-etl

Extract-Transform-Load library from ActiveWarehouse
MIT License

mysqlstream error: `columns': undefined method `any?' #71

Closed sgrgic closed 12 years ago

sgrgic commented 12 years ago

To reproduce this problem, use :mysqlstream => true together with :store_locally => true (which is the default value).

Error: /Users/sgrgic/.rvm/gems/ree-1.8.7-2011.03@DWR3/bundler/gems/activewarehouse-etl-04ffaa4bd073/bin/../lib/etl/control/source/database_source.rb:98:in `columns': undefined method `any?' for #<MySqlStreamer:0x12d9e1050> (NoMethodError)

Environment in this test case: Ruby ree-1.8.7-2011.03, Rails 3.1.0, activewarehouse-etl 1.0.0.rc1, adapter_extensions 1.0.0.rc1

This works when :store_locally is set to false, because the write_local and columns methods are then not called.
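A minimal sketch of the failure mode, assuming (as the stack trace suggests) that `columns` calls `any?` on whatever the row source returns. `EachOnlyStreamer` is a hypothetical stand-in, not the library's class: an object that only defines `each` does not respond to `any?`.

```ruby
# Hypothetical stand-in for a streamer that only implements #each.
class EachOnlyStreamer
  def each
    yield({ "id" => "1" })
  end
end

streamer = EachOnlyStreamer.new

begin
  # Mirrors the call the stack trace points at: any? on the row source.
  streamer.any?
rescue NoMethodError => e
  puts "Failed as in the report: #{e.message}"
end
```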

Thanks, Sinisa.

pdodds commented 12 years ago

Will look once I get in...

Thinking something like

Some changes to the MySqlStreamer to allow for the metadata

require 'open3'

class MySqlStreamer

def initialize(query, target, connection)
    @query = query
    @name = target
    # Fetch a single row up front so any?/first can answer without streaming.
    @first_row = connection.select_all("#{query} limit 1")
end

def any?
    @first_row.any?
end 

def first
    @first_row.first
end

def each
    puts "Using the Streaming MySQL from the command line"
    keys = nil
    connection_configuration = ETL::Base.configurations[@name.to_s]
    mysql_command = """mysql --quick -h #{connection_configuration["host"]} -u #{connection_configuration["username"]} -e \"#{@query.gsub("\n","")}\" -D #{connection_configuration["database"]} --password=#{connection_configuration["password"]} -B"""
    Open3.popen3(mysql_command) do |stdin, out, err, external|
        until (line = out.gets).nil? do
            line = line.gsub("\n","")
            if keys.nil?
                keys = line.split("\t")
            else
                hash = Hash[keys.zip(line.split("\t"))]
                yield hash
            end
        end
        error = err.gets
        if (!error.nil? && error.strip.length > 0)
            raise error
        end
    end
end

end
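The row-building step inside `each` can be tried in isolation. This is only a sketch: `parse_batch_output` is a hypothetical helper name, and the sample data is made up to resemble tab-separated batch output (a header line of column names, then one line per row).

```ruby
require 'stringio'

# Hypothetical helper: turns tab-separated batch output into row hashes,
# using the first line as the column names.
def parse_batch_output(io)
  keys = nil
  rows = []
  until (line = io.gets).nil?
    line = line.chomp
    if keys.nil?
      keys = line.split("\t")          # header line
    else
      rows << Hash[keys.zip(line.split("\t"))]
    end
  end
  rows
end

# Made-up sample data standing in for the command's stdout.
sample = StringIO.new("id\tname\n1\talice\n2\tbob\n")
parse_batch_output(sample).each { |row| puts row.inspect }
```

Every value arrives as a string, so any type conversion has to happen later in the pipeline.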

Then a change in database_source.rb to handle passing the connection

def query_rows
    return @query_rows if @query_rows
    if (configuration[:mysqlstream] == true)
        MySqlStreamer.new(query,@target,connection)
    else
        connection.select_all(query)
    end
end

ghost commented 12 years ago

Hi Philip,

Please take a look at the solution I made today: https://github.com/activewarehouse/activewarehouse-etl/pull/72

Thanks, Sinisa.

pdodds commented 12 years ago

The only problem is that I originally put the MySqlStreamer in place to handle very large datasets (where the query would return around 70 million rows, so I couldn't hold all the data in memory in a Ruby hash). If I understand the change you made, it is no longer streaming the data; it looks like it is loading it all into a hash?

It might just be I need more coffee :)

Cheers

P
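To make the streaming concern concrete, here is a small sketch (not the change in the pull request, and not the library's code). `CountingStreamer` is a hypothetical class that counts how many rows it produces. With Ruby's `Enumerable` mixed in, `first` and `any?` stop after the first yielded row, while `to_a` materializes every row in memory.

```ruby
# Hypothetical row source that yields rows one at a time and counts them.
class CountingStreamer
  include Enumerable

  attr_reader :rows_produced

  def initialize(total_rows)
    @total_rows = total_rows
    @rows_produced = 0
  end

  def each
    @total_rows.times do |i|
      @rows_produced += 1
      yield({ "id" => i.to_s })
    end
  end
end

streamer = CountingStreamer.new(1_000_000)
streamer.first                       # iteration stops after one row
puts "rows produced by first: #{streamer.rows_produced}"

loaded = CountingStreamer.new(1_000)
all_rows = loaded.to_a               # every row is held in memory at once
puts "rows produced by to_a: #{loaded.rows_produced}"
```

Each call to `first` or `any?` starts a fresh iteration, which for a real query would mean running it again; that trade-off is why the sketch above only illustrates the memory behaviour.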

On Mar 6, 2012, at 8:01 AM, sgrgic-lumos wrote:

Hi Philip,

Take a look please for solution I made today: https://github.com/activewarehouse/activewarehouse-etl/pull/72

Thanks, Sinisa.


thbar commented 12 years ago

See #74 for the definitive fix (please review!). Closing this one!