databricks / databricks-sql-python

Databricks SQL Connector for Python

Connector reads 0 rows although Cluster returned results #383

Open pimonteiro opened 5 months ago

pimonteiro commented 5 months ago

As the title says, I'm having an issue where the SQL connector, upon executing .fetchall(), returns 0 rows, while the cluster that ran the query returned approx. 151,653 rows, retrieved from cache in about 500 ms.

    import pandas as pd
    from databricks import sql as adb_sql  # imports assumed from context

    with adb_sql.connect(
        server_hostname=db_creds_dict["db_name"],
        http_path=db_creds_dict["db_path"],
        access_token=db_creds_dict["db_token"],
    ) as conn:
        with conn.cursor() as cursor:
            if parameters is None:
                cursor.execute(query)
            else:
                cursor.execute(query.format(*parameters))
            # fetchall() returns 0 rows here, although the warehouse
            # reports ~151k rows for the same query
            data = pd.DataFrame(
                cursor.fetchall(),
                columns=[desc[0] for desc in cursor.description],
            )

In other situations where the same query returns fewer rows (e.g. about 10 s to return 28k rows), the connector reads them correctly.

Python: 3.8.12
Package version: 3.1.1
Databricks SQL Warehouse configuration:

benc-db commented 5 months ago

@andrefurlan-db thoughts?

kravets-levko commented 5 months ago

@pimonteiro

pimonteiro commented 5 months ago
kravets-levko commented 5 months ago

Thank you @pimonteiro! Yes, CloudFetch should improve the handling of very large results. I asked you to disable it to narrow down the scope of the issue. If you're able to get your data with use_cloud_fetch = false (plus, considering that everything is fine with smaller results), then there's probably a bug in the CloudFetch-related code. I need to poke around, and when I have more questions I'll get back to you.

P.S. I see you provided the library version and warehouse details, which is very nice, thank you! Can you please also tell us whether you're using an AWS or an Azure workspace? Thanks!

pimonteiro commented 5 months ago

Sounds good. In the meantime we'll try to reduce the query result size and optimize it so we can continue development. I assume there's nothing wrong with disabling use_cloud_fetch for now?

And lastly, I'm using an Azure workspace. That should have been in the opening line of the ticket; I do apologize :)

kravets-levko commented 5 months ago

Yes, you can just disable CloudFetch and go on. You'll still be able to get large results, just perhaps less efficiently than with CloudFetch enabled, that's it.
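
For reference, a minimal sketch of connecting with CloudFetch disabled (connection parameters assumed from the snippet above):

    from databricks import sql as adb_sql

    with adb_sql.connect(
        server_hostname=db_creds_dict["db_name"],
        http_path=db_creds_dict["db_path"],
        access_token=db_creds_dict["db_token"],
        use_cloud_fetch=False,  # fetch results inline instead of via cloud links
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            rows = cursor.fetchall()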

pimonteiro commented 4 months ago

Any update on this?

akshay-s-ciq commented 4 months ago

Hi, we're facing the same issue. Any update on this?

kravets-levko commented 3 months ago

@pimonteiro @akshay-s-ciq Sorry, not many updates for now. I'm still trying to figure this out. Notably, we recently got a very similar bug report, but for the Go driver. We're still not sure what's going on.

Also, considering all of the above, may I ask you to run the same query using the Node.js connector? If either of you volunteers to help, I can prepare a test project for you.

kravets-levko commented 2 months ago

@pimonteiro @akshay-s-ciq We have a few hypotheses about what may be causing your issues. We're currently checking them, but you can help us. We just released v3.2.0, which, among other changes, adds more debug logging. You can help by upgrading to v3.2.0, enabling debug logging (add import logging + logging.basicConfig(level=logging.DEBUG)), and, when you see the issue again, collecting and sending us the log output. Thank you so much!

P.S. Don't forget to re-enable CloudFetch - use_cloud_fetch=True
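
A minimal sketch of that setup (connection parameters assumed from the earlier snippet):

    import logging

    from databricks import sql as adb_sql

    # Emit the connector's DEBUG-level messages so CloudFetch activity is visible
    logging.basicConfig(level=logging.DEBUG)

    with adb_sql.connect(
        server_hostname=db_creds_dict["db_name"],
        http_path=db_creds_dict["db_path"],
        access_token=db_creds_dict["db_token"],
        use_cloud_fetch=True,  # re-enabled so the CloudFetch code path is exercised
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            rows = cursor.fetchall()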

rnakshay commented 2 months ago

I was also facing the same issue and use_cloud_fetch=False helped.

If it helps, here is what I get when I enable debug logging:

DEBUG - 2024-06-27 15:08:46,637- databricks.sql.thrift_backend - attempt_request:396 - Sending request: ExecuteStatement(<REDACTED>)
DEBUG - 2024-06-27 15:08:52,019- databricks.sql.thrift_backend - attempt_request:408 - Received response: TExecuteStatementResp(<REDACTED>)
DEBUG - 2024-06-27 15:08:52,020- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:08:57,122- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:08:57,123- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:09:02,258- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:02,260- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,450- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,451- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetResultSetMetadata(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,584- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetResultSetMetadataResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,585- databricks.sql.thrift_backend - attempt_request:396 - Sending request: FetchResults(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,735- databricks.sql.thrift_backend - attempt_request:408 - Received response: TFetchResultsResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:159 - Initialize CloudFetch loader, row set start offset: 0, file list:
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:166 - - start row offset: 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:166 - - start row offset: 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - add_file_links:52 - ResultFileDownloadManager.add_file_links: start offset 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - add_file_links:52 - ResultFileDownloadManager.add_file_links: start offset 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.utils - _create_next_table:234 - CloudFetchQueue: Trying to get downloaded file for row 0
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:123 - ResultFileDownloadManager: removing past handlers, current offset: 0
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:132 - - checking result link: start 0, row count: 53424, current offset: 0
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:132 - - checking result link: start 53424, row count: 8865, current offset: 0
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _schedule_downloads:144 - ResultFileDownloadManager: schedule downloads
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _schedule_downloads:149 - - start: 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _schedule_downloads:149 - - start: 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _find_next_file_index:161 - ResultFileDownloadManager: trying to find file for row 0
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _find_next_file_index:177 - - found file: start 0, row count 53424
...
/python3.11/site-packages/databricks/sql/thrift_backend.py", line 408, in attempt_request
    logger.debug(
Message: 'Received response: TCloseSessionResp(<REDACTED>)'
Arguments: ()
DEBUG - 2024-06-27 15:10:05,748 - databricks.sql.cloudfetch.downloader - is_file_download_successful:62 - Cloud fetch download timed out after 60 seconds for link representing rows 0 to 53424
DEBUG - 2024-06-27 15:10:05,749 - databricks.sql.cloudfetch.download_manager - get_next_downloaded_file:112 - ResultFileDownloadManager: cannot find file for row index 0
DEBUG - 2024-06-27 15:10:05,751 - databricks.sql.utils - _create_next_table:244 - CloudFetchQueue: Cannot find downloaded file for row 0
DEBUG - 2024-06-27 15:10:05,771 - databricks.sql.thrift_backend - attempt_request:396 - Sending request: CloseOperation(<REDACTED>)
DEBUG - 2024-06-27 15:10:05,967 - databricks.sql.thrift_backend - attempt_request:408 - Received response: TCloseOperationResp(<REDACTED>)

Python 3.11.9, databricks-sql-connector-3.2.0

kthejoker commented 2 months ago

Do you have a VPN or Proxy on the machine running the connector? It most likely cannot complete the SSL handshake between your machine and the DBFS URIs provided by CloudFetch. Do you have SSL / Proxy traffic logs?
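
One way to check is a plain TLS handshake against the host that the CloudFetch result links point at (a sketch using only the standard library; the host below is a placeholder, take the real one from a result link in your debug log):

    import socket
    import ssl

    # Placeholder: replace with the storage host from a CloudFetch result link
    HOST = "example.blob.core.windows.net"

    ctx = ssl.create_default_context()
    with socket.create_connection((HOST, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            # If a proxy intercepts TLS, the certificate issuer will be the proxy's CA
            print("TLS OK:", tls.version(), "issuer:", tls.getpeercert().get("issuer"))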

rnakshay commented 2 months ago

Hmm, we do use a VPN, but I'm not sure about the details. In my case, the above issue happens only when I'm in Python debug mode; when run end to end, it doesn't occur.

kthejoker commented 2 months ago

Interesting, is there any kind of extensive lag in debug mode between execution and fetch? The log above shows the CloudFetch download timing out after 60 seconds, which a long pause at a breakpoint could plausibly exceed. Or maybe, if the connection is returned to the pool, you can no longer retrieve session details like row counts even though you can still retrieve the data.

rnakshay commented 2 months ago

Hmm, I don't think so. We use pandas' read_sql function with databricks_sql.connect for the connection. For now this issue is noticed only in local debug runs in my case; I will use the use_cloud_fetch = false option and keep an eye out for any updates/responses from others. Thank you for your help.
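
For context, a minimal sketch of that setup (connection parameters assumed):

    import pandas as pd

    from databricks import sql as databricks_sql

    with databricks_sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
        use_cloud_fetch=False,  # workaround until the CloudFetch fix lands
    ) as conn:
        # pandas accepts a DBAPI connection here and runs the query via a cursor
        df = pd.read_sql(query, conn)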

kravets-levko commented 2 months ago

Hi guys 👋 First, thank you all for your reports, logs, and all the other information you provided on this issue. We also collected other reports from Databricks customers, and this helped us identify several problems in the CloudFetch code. We've now made a first attempt to fix what we've found so far: databricks/databricks-sql-python#405. Hopefully this fixes all the major issues we know about. I will let you know when we release the fix. Thank you for your patience!

rth commented 1 month ago

We've now made a first attempt to fix what we've found so far: https://github.com/databricks/databricks-sql-python/pull/405

~I can confirm that this replaces the 0-row response with a MaxRetryError for me (see https://github.com/databricks/databricks-sql-python/issues/413), which is better but maybe not optimal.~

Edit: never mind, this fixes it. The linked issue was for a specific case of combining large queries with SSL verification that had to be disabled.

kravets-levko commented 1 month ago

Hello here 👋 We just released v3.3.0, which includes a refactoring of the CloudFetch-related code. We think it should fix your issues, so please give it a try and let me know whether it helped (remember to enable CloudFetch via use_cloud_fetch=True). If you still see any issues, please enable debug logging (see https://github.com/databricks/databricks-sql-python/issues/383#issuecomment-2167627778) and share the log output. Thank you!