Cascading / lingual

Stand-alone ANSI SQL for Cascading on Apache Hadoop
http://www.cascading.org/lingual/
48 stars 17 forks source link

Column aliasing in INSERT INTO ... SELECT ... doesn't work in Lingual 1.1 with HBase #22

Open alexanderdean opened 10 years ago

alexanderdean commented 10 years ago

Test script:

#!/bin/bash

# Config
hdfs_path=/local/lingual-alias-test/
hbase_table=out
hbase_col_family=fields
export LINGUAL_PLATFORM=hadoop
export HADOOP_USER_NAME=hadoop

# Test data
printf "\n1\ta\n2\tb\n3\tc\n" > /tmp/alias.tsv
hadoop fs -copyFromLocal /tmp/alias.tsv "${hdfs_path}alias.tsv"

# Lingual 1.1
lingual catalog --init
lingual catalog --provider --add cascading:cascading-hbase:2.2.0:provider
lingual catalog --schema IN --add
lingual catalog --schema IN --stereotype TSVFILE -add \
--columns A,B \
--types   string,string
lingual catalog --schema IN --table IN --stereotype TSVFILE -add "${hdfs_path}" --format tsv
lingual catalog --schema OUT --add
lingual catalog --schema OUT --protocol hbase --add --provider hbase
lingual catalog --schema OUT --stereotype HTABLE -add \
--columns A,B \
--types   string,string
lingual catalog --schema OUT --format hbase --add --properties="family=${hbase_col_family}" --provider hbase
lingual catalog --schema OUT --table OUT --stereotype HTABLE -add "${hbase_table}" --protocol hbase --format hbase --provider hbase

# Works
lingual shell --sql - <<- EOQ
    INSERT INTO "OUT"."OUT"
        select *
        from "IN"."IN";
    SELECT * from "OUT"."OUT";
EOQ

# Doesn't work
lingual shell --sql - <<- EOQ
    INSERT INTO "OUT"."OUT"
        select
            B as A,
            A as B
        from "IN"."IN";
EOQ

Error is:

Warning: exception while executing query: could not build flow from assembly: [[HBaseScheme[['A', 'B' ...][cascading.hbase.HBaseFactory.createScheme(HBaseFactory.java:120)] unable to resolve scheme sink selector: [{1}:'A'], with incoming: [{2}:'$0', '$1' | String, String]]: [HBaseScheme[['A', 'B' ...][cascading.hbase.HBaseFactory.createScheme(HBaseFactory.java:120)] unable to resolve scheme sink selector: [{1}:'A'], with incoming: [{2}:'$0', '$1' | String, String]: could not select fields: [{1}:'A'], from: [{2}:'$0', '$1' | String, String] (state=,code=0)
Error: exception while executing query (state=,code=0)
joeposner commented 10 years ago

The issue isn't related to aliasing of the columns but the combination of swapping the column orders and having a file that doesn't match the definitions (the first line is blank but the table is defined as strictly delimited)

The following code shows this:

#!/bin/bash

# Config
in_table="/local/in/alias.dat" 
out_table="/local/out/results.dat" 
export LINGUAL_PLATFORM=hadoop
export HADOOP_USER_NAME=hadoop

# Test data
printf "A:string,B:string\n1,a\n2,b\n3,c\n" > /tmp/alias.dat
hadoop fs -copyFromLocal /tmp/alias.dat "${in_table}"

# Lingual 1.1
lingual catalog --init
lingual catalog --schema IN --add
lingual catalog --schema IN --stereotype MYTABLE -add \
--columns A,B \
--types   string,string
lingual catalog --schema IN --table IN --stereotype MYTABLE -add "${in_table}" 
lingual catalog --schema OUT --add
lingual catalog --schema OUT --stereotype MYTABLE -add \
--columns A,B \
--types   string,string
lingual catalog --schema OUT --table OUT --stereotype MYTABLE -add "${out_table}"

#this works
lingual shell --sql - <<- EOQ
    INSERT INTO "OUT"."OUT" 
        select
           A as B,
           B as A
        from "IN"."IN";
EOQ

#this fails
lingual shell --sql - <<- EOQ
    INSERT INTO "OUT"."OUT" 
        select
           B as B,
           A as A
        from "IN"."IN";
EOQ

#this fails
lingual shell --sql - <<- EOQ
    INSERT INTO "OUT"."OUT" 
        select
           B as A,
           A as B
        from "IN"."IN";
EOQ
alexanderdean commented 10 years ago

Okay, thanks, let's close.

joeposner commented 10 years ago

Re-opening since my comment was just to clarify the source of the issue and the test case to reproduce.

alexanderdean commented 10 years ago

Ah okay, thanks

joeposner commented 10 years ago

There's an issue with the execution plan that gets created when the order is changed in this INSERT INTO ... SELECT example. This is a bug to resolve but we do not have an ETA for the fix.

alexanderdean commented 10 years ago

np, thanks for letting me know