cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

Support Cursor.rowcount and close finished queries #528

Closed csringhofer closed 10 months ago

csringhofer commented 10 months ago

With the current Impala server, rowcount support requires DMLs to be closed with CloseImpalaOperation(), as there is no simpler way to get the number of modified rows. See https://issues.apache.org/jira/browse/IMPALA-12647 for alternatives.

This change adds the cursor option close_finished_queries, which defaults to True. Setting it to False restores the old behavior.
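A minimal sketch of how a caller might switch between the two behaviors. It assumes the option is passed as a keyword argument to `cursor()`, which this description does not spell out. `StubConnection`, `StubCursor`, and `open_cursor` are hypothetical stand-ins so the snippet runs without an Impala server.

```python
class StubCursor:
    """Hypothetical stand-in for an impyla cursor; it only records the option."""

    def __init__(self, close_finished_queries=True):
        self.close_finished_queries = close_finished_queries


class StubConnection:
    """Hypothetical stand-in for a DB API connection object."""

    def cursor(self, **kwargs):
        return StubCursor(**kwargs)


def open_cursor(conn, legacy_behavior=False):
    # Assumption: the option is a keyword argument of conn.cursor().
    if legacy_behavior:
        # Old behavior: finished queries stay open until closed explicitly.
        return conn.cursor(close_finished_queries=False)
    # New default: finished queries are closed automatically.
    return conn.cursor()


conn = StubConnection()
default_cursor = open_cursor(conn)
legacy_cursor = open_cursor(conn, legacy_behavior=True)
print(default_cursor.close_finished_queries, legacy_cursor.close_finished_queries)
```

With a real connection, only the `open_cursor` helper would be relevant; the stubs exist purely to make the sketch self-contained.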

Once a finished query is closed, the get_log RPC can no longer be called. If close_finished_queries is True, the logs are therefore fetched and stored before the query is closed, so that get_log can return the saved result. In general, get_log should not be too expensive an RPC.
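The ordering described above (fetch the log, then close, then serve the saved log) can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `FakeOperation`, `SketchCursor`, and `on_query_finished` are hypothetical names, and the fake `close()` simply returns a modified-row count to mimic what closing a DML is said to provide.

```python
class FakeOperation:
    """Hypothetical stand-in for a server-side query handle."""

    def __init__(self, log_text, modified_rows):
        self._log_text = log_text
        self._modified_rows = modified_rows
        self.closed = False

    def get_log(self):
        if self.closed:
            raise RuntimeError("operation is closed; log is no longer available")
        return self._log_text

    def close(self):
        # Assumption: closing a DML reports the number of modified rows.
        self.closed = True
        return self._modified_rows


class SketchCursor:
    """Hypothetical cursor showing the fetch-log-then-close ordering."""

    def __init__(self, close_finished_queries=True):
        self.close_finished_queries = close_finished_queries
        self.rowcount = -1  # DB API 2.0 value for "not known"
        self._operation = None
        self._saved_log = None

    def on_query_finished(self, operation):
        self._operation = operation
        self._saved_log = None
        self.rowcount = -1
        if self.close_finished_queries:
            # Fetch the log first: it cannot be fetched after the close.
            self._saved_log = operation.get_log()
            self.rowcount = operation.close()

    def get_log(self):
        # Serve the saved copy if the query was already closed.
        if self._saved_log is not None:
            return self._saved_log
        return self._operation.get_log()


closing_op = FakeOperation("1 row inserted", 1)
closing_cursor = SketchCursor()
closing_cursor.on_query_finished(closing_op)

legacy_op = FakeOperation("1 row inserted", 1)
legacy_cursor = SketchCursor(close_finished_queries=False)
legacy_cursor.on_query_finished(legacy_op)
```

In the sketch, `closing_cursor` ends up with a known rowcount and a cached log, while `legacy_cursor` leaves the operation open, keeps rowcount at -1, and reads the log from the live operation.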

Another potential side-effect is that get_profile may fail as Impala can discard the runtime profile after the query is closed (see Impala flag query_log_size).

Despite the above side effects, closing queries seems the better default: it helps avoid queries hanging in the "waiting to be closed" state and provides a reliable rowcount. This is also consistent with the way impala-shell works.

Testing:

joemcdonnell commented 10 months ago

I'm not sure what our process is, should we file an issue on github for this?

csringhofer commented 10 months ago

> I'm not sure what our process is, should we file an issue on github for this?

There is already an issue about rowcount: https://github.com/cloudera/impyla/issues/302. I am also unsure about the process.