databricks / databricks-sql-go

Golang database/sql driver for Databricks SQL.
Apache License 2.0

Critical: "context deadline exceeded" after upgrade from 1.4.0 to 1.5.3 #200

Open gilsegment opened 3 months ago

gilsegment commented 3 months ago

We started getting "context deadline exceeded" errors after upgrading from 1.4.0 to 1.5.3. This happens during the initial connection to the warehouse. We see the error on multiple Databricks warehouses across different accounts.

I suspect the timeout is hit here: https://github.com/databricks/databricks-sql-go/blob/164893503c207fa6fc26e99666d54a6ebcb67d29/connection.go#L82

That code uses a hard-coded timeout of 60s, with no option to modify it: https://github.com/databricks/databricks-sql-go/blob/beea4c4d35ce778a9e916ac03d463c59d422a5fb/internal/config/config.go#L191

Maybe the "timeout" provided in the DSN should also be used for the ping: https://github.com/databricks/databricks-sql-go/blob/beea4c4d35ce778a9e916ac03d463c59d422a5fb/internal/config/config.go#L242
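To make the suggestion concrete, here is a minimal sketch of a ping that honors a caller-supplied timeout and falls back to 60s only when none is set. It is not the driver's code: `pingTimeout`, `pingWithTimeout`, `slowPinger`, and `timedOut` are hypothetical names, and the `pinger` interface only assumes the standard `PingContext` method that `*sql.DB` and `*sql.Conn` expose.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// defaultPingTimeout mirrors the 60s value mentioned in this report.
const defaultPingTimeout = 60 * time.Second

// pinger is the minimal surface needed; *sql.DB and *sql.Conn satisfy it.
type pinger interface {
	PingContext(ctx context.Context) error
}

// pingTimeout (hypothetical) prefers the configured value over the default.
func pingTimeout(configured time.Duration) time.Duration {
	if configured > 0 {
		return configured
	}
	return defaultPingTimeout
}

// pingWithTimeout (hypothetical) bounds the ping by the configured timeout.
func pingWithTimeout(ctx context.Context, p pinger, configured time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, pingTimeout(configured))
	defer cancel()
	return p.PingContext(ctx)
}

// slowPinger simulates a warehouse that needs `delay` to answer a ping.
type slowPinger struct{ delay time.Duration }

func (s slowPinger) PingContext(ctx context.Context) error {
	select {
	case <-time.After(s.delay):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut reports whether a simulated ping exceeded its deadline.
func timedOut(delayMs, timeoutMs int) bool {
	err := pingWithTimeout(
		context.Background(),
		slowPinger{delay: time.Duration(delayMs) * time.Millisecond},
		time.Duration(timeoutMs)*time.Millisecond,
	)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow ping, short timeout, timed out:", timedOut(200, 20))
	fmt.Println("fast ping, long timeout, timed out:", timedOut(10, 500))
}
```

With something along these lines, a warehouse that is slow to start could be given a longer ping budget without changing the default for everyone else.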

kravets-levko commented 2 months ago

@gilsegment Can you please help us narrow down the scope of the issue? 1.5.3 doesn't introduce many changes, so could you upgrade gradually from 1.4.0 and check which version first shows the issue? That would help us a lot. Thank you!

gilsegment commented 2 months ago

Unfortunately, I cannot do that. A few things I can suggest:

  1. Review the release notes from 1.4.0 to 1.5.3 to see which changes are relevant.
  2. Respect the timeout input parameter, as I recommended in my initial comment, or introduce a new parameter just for the ping.
  3. Check whether there was a Databricks backend issue around the time I reported this that might have caused it. I think this is less likely, because we saw the errors right after the client library was upgraded, but it is still possible. (The first query after connecting takes a long time; it is "select 1" in our case.)
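One way to tell these cases apart is to time each connection phase separately under a single deadline and see which one fails. The sketch below is only an illustration with simulated steps: `step`, `result`, `timeSteps`, `sleepStep`, and `failedStep` are hypothetical names, and in a real check the `run` functions would wrap calls such as `db.PingContext` and a "select 1" query.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// step is one phase of connection setup, e.g. a ping or a "select 1" query.
type step struct {
	name string
	run  func(ctx context.Context) error
}

// result records how long a step took and whether it failed.
type result struct {
	name    string
	elapsed time.Duration
	err     error
}

// timeSteps runs the steps in order under one overall deadline and
// stops at the first failure.
func timeSteps(ctx context.Context, budget time.Duration, steps []step) []result {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()
	var out []result
	for _, s := range steps {
		start := time.Now()
		err := s.run(ctx)
		out = append(out, result{name: s.name, elapsed: time.Since(start), err: err})
		if err != nil {
			break
		}
	}
	return out
}

// sleepStep simulates a phase that takes d unless the context ends first.
func sleepStep(name string, d time.Duration) step {
	return step{name: name, run: func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}}
}

// failedStep returns the name of the step that failed, or "" if none did.
func failedStep(pingMs, queryMs, budgetMs int) string {
	ms := func(n int) time.Duration { return time.Duration(n) * time.Millisecond }
	steps := []step{sleepStep("ping", ms(pingMs)), sleepStep("select 1", ms(queryMs))}
	for _, r := range timeSteps(context.Background(), ms(budgetMs), steps) {
		if r.err != nil {
			return r.name
		}
	}
	return ""
}

func main() {
	fmt.Printf("failed step: %q\n", failedStep(10, 300, 100))
}
```

Logging the per-step durations from both 1.4.0 and 1.5.3 against the same warehouse would show whether the extra time is spent in the ping or in the first query.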