BradRuderman / pyhs2

MIT License

can't execute sql containing ' or " sign #9

Closed superChing closed 10 years ago

superChing commented 10 years ago

Like the following:

sql = r"select 'qq' from test"
cur.execute(sql)

I always get the error: pyhs2.error.Pyhs2Exception: 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask'

But I can execute it directly in hive CLI.
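Since pyhs2's execute() takes a raw SQL string and offers no parameter binding, one way to rule out quoting problems while debugging is to escape values before embedding them. The helper below is a hypothetical sketch (not part of pyhs2), assuming Hive's backslash-escape syntax for single-quoted string literals:

```python
# Hypothetical helper (not part of pyhs2): render a Python string as a
# HiveQL single-quoted string literal.
def hql_literal(value):
    # Escape backslashes first, then single quotes, so existing
    # backslashes are not double-processed.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

sql = "select %s from test" % hql_literal("it's qq")
# sql is now: select 'it\'s qq' from test
```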

BradRuderman commented 10 years ago

Ok, I will work on this next. Thanks for reporting.

BradRuderman commented 10 years ago

Not sure if I can repro this error. I am using Hive 0.10 and Python 2.7.6. What are you using?

hive (bruderman)> desc test;
OK
col_name    data_type   comment
a   string
Time taken: 0.182 seconds
hive (bruderman)> show create table test;
OK
createtab_stmt
CREATE  TABLE test(
  a string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost/user/hive/warehouse/bruderman.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1391451991')
Time taken: 0.247 seconds
hive (bruderman)> desc extended test;
OK
col_name    data_type   comment
a   string

Detailed Table Information  Table(tableName:test, dbName:bruderman, owner:bruderman, createTime:1391451940, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:a, type:string, comment:null)], location:hdfs://localhost/user/hive/warehouse/bruderman.db/test, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1391451991}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.088 seconds
hive (bruderman)> select * from test;
OK
a
a
a
a
a
a
a
a
a
a
a
Time taken: 0.146 seconds

And the python code:

Python 2.7.6 (default, Nov 12 2013, 18:36:16)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhs2
>>> with pyhs2.connect(host='localhost',
...                    port=10000,
...                    authMechanism="PLAIN",
...                    user='bruderman',
...                    password='abc',
...                    database='bruderman') as conn:
...     with conn.cursor() as cur:
...             sql=r"select 'qq' from test"
...             print(cur.execute(sql))
...             print cur.fetch()
...
None
[['qq'], ['qq'], ['qq'], ['qq'], ['qq'], ['qq'], ['qq'], ['qq'], ['qq'], ['qq']]
superChing commented 10 years ago

I use Hive 0.11.0 and Python 2.7.6. I don't have Hive 0.10 to test.

I create the table as follows: CREATE TABLE test(a string);

and execute the python as follows:

Python 2.7.6 (default, Feb  5 2014, 09:45:17) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhs2
>>> with pyhs2.connect(host='192.168.6.10',
...                    port=10000,
...                    authMechanism="PLAIN",
...                    user='hive',
...                    password='hive',
...                    database='default') as conn:
...     with conn.cursor() as cur:
...         sql=r"""
...         select 'qq' from test
...         """
...         cur.execute(sql)
... 
Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyhs2/cursor.py", line 53, in execute
    raise Pyhs2Exception(res.status.errorCode, res.status.errorMessage)
pyhs2.error.Pyhs2Exception: 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask'

If I execute it in the Hive CLI, it succeeds.

By the way, for pyhs2.error.Pyhs2Exception, could you report the full error traceback as the Hive CLI does, rather than a single error message? The Hive error message on its own is not very clear.
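In the meantime, whatever detail the exception does carry can be surfaced by catching it explicitly. A minimal sketch of that pattern; the Pyhs2Exception class below is a self-contained stand-in mirroring how cursor.py raises it with (errorCode, errorMessage), and with pyhs2 installed you would catch pyhs2.error.Pyhs2Exception instead:

```python
# Stand-in for pyhs2.error.Pyhs2Exception so this sketch is runnable on
# its own; with pyhs2 installed, import and catch the real class.
class Pyhs2Exception(Exception):
    pass

def execute_verbose(cur, sql):
    """Run a statement; on failure, dump everything the exception carries."""
    try:
        return cur.execute(sql)
    except Pyhs2Exception as e:
        # .args holds whatever the exception was constructed with,
        # e.g. (errorCode, errorMessage)
        print("Pyhs2Exception args: %r" % (e.args,))
        raise
```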

BradRuderman commented 10 years ago

Can you please provide the HiveServer2 .out and .log files? I am not able to repro:

hive (default)> CREATE TABLE test(a string);
OK
Time taken: 0.063 seconds
hive (default)>
>>> import pyhs2
>>> with pyhs2.connect(host='localhost',
...                    port=10000,
...                    authMechanism="PLAIN",
...                    user='hive',
...                    password='hive',
...                    database='default') as conn:
...     with conn.cursor() as cur:
...         sql=r"""
...         select 'qq' from test
...         """
...         cur.execute(sql)
...

>>>

I will look into error messages but I might be bound by what hive server 2 actually reports.

superChing commented 10 years ago

I installed Hive 0.11 via Ambari. There is no additional hive-server2.log output nor any hive.log message when I execute the code in Python. When I execute it in the Hive CLI and it succeeds, it does log to hive.log. Have I misconfigured something for logging?

BradRuderman commented 10 years ago

There is a configuration parameter:

hive.log.dir=/var/log/hive

This should yield 2 files:

hive-server2.out
hive-server2.log

I can see that parameter set as an argument of the running service. Perhaps check your initialization scripts.

hive      3735     1  0 Feb06 ?        00:09:47 /usr/java/jdk1.6.0_31/bin/java -Xmx2048m -Djava.net.preferIPv4Stack=true -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-server2.log -Dhive.log.threshold=INFO -Dmapreduce.job.counters.limit=2400 -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx268435456 -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar org.apache.hive.service.server.HiveServer2
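For quickly checking the tail end of those files from a Python session, a small stdlib-only helper is enough. The log directory below is the assumed default from hive.log.dir above, not guaranteed to match every install:

```python
import os

def tail(path, n=50):
    """Return the last n lines of a file (fine for modest log sizes)."""
    with open(path) as f:
        return f.readlines()[-n:]

log_dir = "/var/log/hive"  # assumed from hive.log.dir above
for name in ("hive-server2.out", "hive-server2.log"):
    full = os.path.join(log_dir, name)
    if os.path.exists(full):
        print("== %s ==" % full)
        print("".join(tail(full, 20)))
```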
superChing commented 10 years ago

Thanks very much. I checked the log and found the error is raised because the hbase-handler jar is not found in the distributed cache. I think I misconfigured something myself, and the error is not related to pyhs2.

BradRuderman commented 10 years ago

Thanks for following back up.