cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

Allow skipping utf8 conversion in Python3 #543

Closed: csringhofer closed this issue 1 month ago

csringhofer commented 4 months ago

I ran some benchmarks on a local Impala dev environment. Query: "select * from tpch_parquet.lineitem", which returns 6 million rows (numbers and strings).

ClientFetchWaitTimer values:

- Impyla (Python 2, fetchcolumnar()): 7s967ms
- Impyla (Python 3, fetchcolumnar()): 14s163ms
- Impyla (Python 2, fetchmany(1024)): 17s791ms
- Impyla (Python 3, fetchmany(1024)): 23s816ms
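In relative terms the two Python 3 slowdowns look quite different, but the absolute extra time is about 6 s in both cases. The quick arithmetic below uses only the four timings reported above. A roughly constant overhead would fit a fixed per-value cost such as string conversion, though that is an inference from the numbers, not something this benchmark isolates.

```python
def slowdown(py2_seconds, py3_seconds):
    """Return (absolute extra seconds, relative slowdown) of Python 3 vs Python 2."""
    extra = py3_seconds - py2_seconds
    ratio = py3_seconds / py2_seconds
    return extra, ratio


# ClientFetchWaitTimer values from the benchmark above, in seconds.
timings = {
    "fetchcolumnar()": (7.967, 14.163),
    "fetchmany(1024)": (17.791, 23.816),
}

if __name__ == "__main__":
    for method, (py2, py3) in timings.items():
        extra, ratio = slowdown(py2, py3)
        print(f"{method}: +{extra:.3f}s, {ratio:.2f}x")
```

This prints an overhead of +6.196s (1.78x) for fetchcolumnar() and +6.025s (1.34x) for fetchmany(1024).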

Based on cProfile, the difference between Python 2 and Python 3 comes from _convert_strings_to_unicode().
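To get a feel for why a per-cell bytes-to-str step can dominate, here is a standalone micro-benchmark. It does not use impyla or reproduce _convert_strings_to_unicode(); it assumes a string column is a plain Python list of short UTF-8 byte strings (the helper names and the synthetic data are hypothetical) and compares decoding every value against just copying the references.

```python
import timeit


def decode_column(values):
    """Decode every bytes value to str, one call per cell."""
    return [v.decode("utf-8") for v in values]


def keep_bytes(values):
    """Baseline: touch each value without converting it."""
    return [v for v in values]


if __name__ == "__main__":
    # Hypothetical stand-in for one string column (not real lineitem data).
    column = [("comment %d" % i).encode("utf-8") for i in range(200_000)]

    t_decode = timeit.timeit(lambda: decode_column(column), number=3)
    t_bytes = timeit.timeit(lambda: keep_bytes(column), number=3)
    print(f"decode: {t_decode:.3f}s  keep bytes: {t_bytes:.3f}s")
```

The absolute numbers depend on the machine and interpreter; the point is only that the decode path does extra work proportional to the number of string cells.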

Adding a cursor option such as "convert_strings_to_unicode", similar to "convert_types", would be useful.
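A minimal sketch of what such an opt-out could look like at the column level. The function convert_string_column() is hypothetical and is not impyla's API; the convert_strings_to_unicode parameter is just the option name suggested in this issue. It assumes a string column arrives as a list of bytes values, with None for NULL.

```python
from typing import List, Optional, Union


def convert_string_column(
    values: List[Optional[bytes]],
    convert_strings_to_unicode: bool = True,  # hypothetical option from this issue
) -> List[Optional[Union[str, bytes]]]:
    """Decode a string column to str by default, or hand back raw bytes."""
    if not convert_strings_to_unicode:
        # Skip the per-value decode entirely; the caller receives bytes.
        return values
    # Default behaviour: decode each non-NULL cell as UTF-8.
    return [v.decode("utf-8") if v is not None else None for v in values]
```

Keeping the default at True would preserve current Python 3 behaviour, while callers that only pass strings through (or decode lazily) could turn the conversion off.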