colinmarc / impala-ruby

an impala client for ruby
MIT License

value of string column that contains tabs may cause errors #26

Open lingzhuzi opened 5 years ago

lingzhuzi commented 5 years ago

Hi all, I ran into an error recently. My SQL is: select * from my_table where ymd >= 20181128 and ymd <= 20181129, and the error is:

ArgumentError: no time information in "xxxxxx"
  from /Users/yy/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/time.rb:327:in `parse'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:135:in `convert_raw_value'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:109:in `block in parse_row'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:108:in `each'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:108:in `parse_row'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:95:in `block in fetch_batch'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:95:in `map'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:95:in `fetch_batch'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:82:in `fetch_more'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:40:in `fetch_row'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:26:in `each'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:51:in `to_a'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/cursor.rb:51:in `fetch_all'
  from /Users/yy/.rvm/gems/ruby-2.1.5@my_project/gems/impala-0.5.1/lib/impala/connection.rb:69:in `query'

Here is my table info:

name         type
ymd          int
content      string
create_time  timestamp

Finally I found out that this error was caused by the value of the content column: the value includes a TAB! As a result, the content value was split into two parts, and the convert_raw_value method tried to parse the second part as the type of the next column (timestamp), which caused the error.
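To illustrate the failure mode, here is a minimal sketch. It assumes each row arrives as a single tab-delimited string and that fields are matched to columns by position. SCHEMA, convert_field and naive_parse_row are hypothetical names made up for this example. They are not the gem's actual code in cursor.rb, which I have not reproduced here.

```ruby
require 'time'

# Hypothetical schema mirroring the table above: [column name, type].
SCHEMA = [
  ['ymd', 'int'],
  ['content', 'string'],
  ['create_time', 'timestamp']
].freeze

# Hypothetical converter: turns one raw text field into a Ruby value
# according to the column type.
def convert_field(raw, type)
  case type
  when 'int'       then Integer(raw)
  when 'string'    then raw
  when 'timestamp' then Time.parse(raw)
  else raise ArgumentError, "unknown type #{type}"
  end
end

# Naive parser (an assumption about the mechanism): split the row on TAB
# and pair the resulting fields with the columns by position.
def naive_parse_row(line)
  fields = line.split("\t")
  SCHEMA.each_with_index.map do |(name, type), i|
    [name, convert_field(fields[i], type)]
  end.to_h
end

# A row whose content has no TAB splits into three fields and parses fine.
good = "20181128\thello world\t2018-11-28 10:00:00"
puts naive_parse_row(good)['content']

# A row whose content contains a TAB splits into four fields, so the
# timestamp column receives the second half of the content value.
bad = "20181128\thello\tworld\t2018-11-28 10:00:00"
begin
  naive_parse_row(bad)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

With the second row, Time.parse is handed the text "world" instead of the timestamp and raises an ArgumentError, the same kind of error as in the trace above. A positional split cannot tell a delimiter TAB from a TAB inside a string value, so any fix would need the values escaped or the row transferred in a format that is not tab-delimited.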

Thanks!