databricks / databricks-sql-nodejs

Databricks SQL Connector for Node.js
Apache License 2.0

detecting databricks cluster is up #265

Open mx2323 opened 1 month ago

mx2323 commented 1 month ago

when connecting to a databricks warehouse, if it is not serverless, it can take minutes to start up.

we would like a nonblocking way of detecting whether a databricks warehouse is up and ready to go. is there a best practice on doing this in a nonblocking manner with this sdk?

the API currently blocks statements and waits for the cluster to be ready... we could probably do something like a SELECT 1 and wait 1 second for it to complete (and cancel the operation if it doesn't succeed in 1 second), but was just curious if there is a better way to check whether the cluster is live and ready to go.

kravets-levko commented 1 month ago

Hi @mx2323! Unfortunately, there's no way to check cluster state using this library. Please also note that if you run a query and then cancel it, the cluster will continue to start up. I need to check if there's anything that can help with your case. Will get back to you soon

mx2323 commented 1 month ago

thanks @kravets-levko for responding. we are OK if the cluster continues to start up, since we are waiting for it to start up...

agree though, some kind of a definitive check or guidelines on how to check for readiness would be great!

kravets-levko commented 1 month ago

@mx2323 there is a REST API endpoint you can use for this purpose: https://docs.databricks.com/api/workspace/clusters/get I have no prior experience with it, so you'd have to figure things out yourself. But if you struggle with it, feel free to ask and I'll do my best to help you

kravets-levko commented 1 month ago

Actually, it turned out super simple:

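// Placeholders: workspace host, cluster ID, and an access token.
const host = '....';
const clusterId = '....';
const token = 'dapi....';

const params = new URLSearchParams({
  cluster_id: clusterId,
});

// Look up the cluster via the REST API.
const response = await fetch(`https://${host}/api/2.1/clusters/get?${params}`, {
  method: 'GET',
  headers: {
    Authorization: `Bearer ${token}`,
  },
});

// Surface HTTP errors (for example, a bad token) before parsing the body.
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}

const data = await response.json();

// Print the cluster's current state.
console.dir(data.state);

kravets-levko commented 1 month ago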

For a SQL warehouse everything is the same, just use a different API endpoint: https://docs.databricks.com/api/workspace/warehouses/get
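
A minimal sketch of what the warehouse variant might look like. It assumes the endpoint path `/api/2.0/sql/warehouses/{id}` and a `state` field in the response whose value is `RUNNING` once the warehouse accepts queries; please verify both against the docs linked above. The helper names (`getWarehouseState`, `isWarehouseReady`) and the injectable `fetchImpl` parameter are made up for illustration and are not part of this library.

```javascript
// Sketch only: helper names and the injectable fetch are illustrative.
// Assumes GET /api/2.0/sql/warehouses/{id} returns JSON with a `state` field.
async function getWarehouseState({ host, warehouseId, token, fetchImpl = fetch }) {
  const url = `https://${host}/api/2.0/sql/warehouses/${encodeURIComponent(warehouseId)}`;
  const response = await fetchImpl(url, {
    method: 'GET',
    headers: { Authorization: `Bearer ${token}` },
  });
  // Fail loudly on HTTP errors instead of returning an undefined state.
  if (!response.ok) {
    throw new Error(`Warehouse lookup failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.state;
}

// Assumption: 'RUNNING' is the state that means the warehouse is ready.
async function isWarehouseReady(options) {
  return (await getWarehouseState(options)) === 'RUNNING';
}
```

Each call is a single short HTTP request, so it can be repeated on a timer without holding a SQL session open while the warehouse starts.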

kravets-levko commented 1 month ago

TBH - I don't know what people usually do. As for SQL Warehouses, the startup is usually quite fast (rarely more than a minute on all instances I use for testing, often even 20-30s). Compute clusters indeed take a significant amount of time to start, but that's expected. And, of course, for both warehouses and clusters you can disable auto-stop, and they will remain running until manually stopped

mx2323 commented 1 month ago

are there any issues with submitting a query of SELECT 1, setting a timer for it to complete in 5 seconds, and cancelling if it doesn't complete in that time? that would be easier for us since it wouldn't require us to redo auth through the REST api.... in the positive case, it'd just return quickly, and in the negative case it'd wait 5 seconds, which would be OK since we are just waiting.
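
A rough, library-agnostic sketch of that timer-and-cancel idea. `probeWithTimeout` and the `{ done, cancel }` probe shape are hypothetical names used only for illustration: `done` stands for a promise that settles when the SELECT 1 finishes, and `cancel` for whatever cancels the in-flight operation. How those map onto the connector's session and operation objects is left open here.

```javascript
// Sketch only: `probeWithTimeout` and the { done, cancel } shape are hypothetical.
// Resolves to true if the probe finishes in time, false if it timed out.
async function probeWithTimeout(probe, timeoutMs) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  try {
    const outcome = await Promise.race([
      probe.done.then(() => 'ready'),
      timedOut,
    ]);
    if (outcome === 'timeout') {
      // Best-effort cancel; a failed cancel should not mask the timeout result.
      try {
        await probe.cancel();
      } catch (err) {
        // Ignored on purpose in this sketch.
      }
      return false;
    }
    return true;
  } finally {
    // Avoid leaving a pending timer behind when the probe wins the race.
    clearTimeout(timer);
  }
}
```

If the probe itself fails (for example, an auth error), the rejection propagates to the caller rather than being reported as "not ready", so the two cases stay distinguishable.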

kravets-levko commented 1 month ago

@mx2323 you can do it if it works for you. You can also use the queryTimeout option instead of a timer. Just keep in mind that this option doesn't work with SQL Warehouses, only with clusters. See https://github.com/databricks/databricks-sql-nodejs/issues/167#issuecomment-2067162824 and https://github.com/databricks/databricks-sql-nodejs/issues/167#issuecomment-2069493883