Open alextk87 opened 1 year ago
Which version of PyHive and SQLAlchemy are you using?
Sorry for the late reply, I was on vacation. The version of PyHive is 0.6.2 and the version of SQLAlchemy is 2.0.19. The issue is reproducible with result sets larger than 150,000 records.
Can you try the latest version of PyHive, i.e. 0.7.0, to check whether the issue still exists?
I'm using a Presto cluster to process large amounts of data.
To visualize the data I use the connector suggested by the official Superset documentation, which is PyHive (used through SQLAlchemy), with the default connection settings.
When I run a very simple query, "SELECT * FROM test_table", through the PyHive Presto connector, the result set contains a different number of rows than the same query returns in presto-cli, the official client described in the Presto documentation.
I created two simple Python scripts to test the Presto connection, one using PyHive and one using the official JDBC driver (jdbc.jar).
The PyHive script returned the wrong number of rows, about 817,000, which is exactly the number of rows returned by the Superset chart. The script using the official JDBC driver returned the correct number, 875,000 rows.
It looks like the issue is caused by the PyHive connector. Is it possible to change the connection method from PyHive to the official JDBC driver?
I'm attaching the two Python scripts that I used to reproduce the issue.
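For anyone who wants to try this without the attachments, here is a minimal sketch of how the PyHive side of such a row-count check could look. It is not the original script: the helper names `count_rows` and `count_rows_pyhive`, the host, and the port are hypothetical placeholders. It assumes only the standard DB-API cursor methods (`execute`, `fetchmany`) and PyHive's `presto.connect`. The small sqlite3 demo at the bottom is there only so the counting helper can be run without a Presto cluster.

```python
import sqlite3


def count_rows(cursor, query, batch_size=10_000):
    """Run `query` on a DB-API cursor and count the rows actually fetched."""
    cursor.execute(query)
    total = 0
    while True:
        # Drain the result set in batches instead of loading it all at once.
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)
    return total


def count_rows_pyhive(host, port=8080, query="SELECT * FROM test_table"):
    """Count rows through PyHive's Presto connector (host/port are placeholders)."""
    # Imported here so count_rows can be exercised without PyHive installed.
    from pyhive import presto

    conn = presto.connect(host=host, port=port)
    try:
        return count_rows(conn.cursor(), query)
    finally:
        conn.close()


if __name__ == "__main__":
    # Local stand-in for a real cluster: 25 rows in an in-memory SQLite table.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE test_table (x INTEGER)")
    demo.executemany("INSERT INTO test_table VALUES (?)", [(i,) for i in range(25)])
    print(count_rows(demo.cursor(), "SELECT * FROM test_table", batch_size=10))
```

Comparing the value returned by `count_rows_pyhive` with the row count reported by presto-cli or the JDBC script for the same query should show whether the rows are lost on the PyHive side.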