databricks / databricks-sql-nodejs

Databricks SQL Connector for Node.js
Apache License 2.0
24 stars 32 forks source link

Batch INSERT (using array of arrays) results in the application hanging #261

Open jpmnteiro opened 2 months ago

jpmnteiro commented 2 months ago

Hey!

I'm trying to parameterise an INSERT statement and feeding executeStatement an array of arrays so that I can batch INSERTs into a table.

However, it seems like instead of working (or failing), the driver never responds and the application just hangs.

Example:

const query = `INSERT INTO test(id) values(?)`.
const params = [[1], [2], [3]];
const session = /* call some code to get a DBX session /* 
const op = session.executeStatement(query, { ordinalParameters: params })
...

If I fully create the INSERT query, INSERT INTO test(id) values (1), (2), (3), it works just fine.

I had a look at how executeStatement handles parameters and it doesn't seem to support arrays of arrays, so I am at loss of how to batch insert with parameterisation.

Env:

kravets-levko commented 2 months ago

Hi @jpmnteiro!

I tried your snippet, and indeed it fails when you pass an array of arrays. The error is a bit obscure, though (Error: writeString called without a string/Buffer argument: 1 - Thrift library cannot serialize the value), that's what we definitely have to improve.

I think your app seems to "hang" because you don't handle the promise returned from session.executeStatement() - you should either await it or add .then/.catch handlers.

For arrays as parameters: currently, only primitive values are supported as parameters. It's a limitation of the server, the library just follows it. But even when one day server will support complex type values - you still cannot do batch inserts like you wanted initially. Every placeholder matches a single value, even if the value is an array. Arrays will not get expanded as a multiple parameters, the whole array will be used as a value. For batch inserts you have to do exactly what you did as a workaround - add multiple placeholders and bind multiple values to them.

Lastly, I would highly recommend you to use a proper IDE. For example, IntelliJ IDEA can warn you about mishandling the result of session.executeStatement, as well as incorrect value passed ordinalParameters - even in JS file. Switching to TypeScript can improve you experience even more - as it can catch most of the accidental typos and error even before running your script.

jpmnteiro commented 2 months ago

I tried your snippet, and indeed it fails when you pass an array of arrays. The error is a bit obscure, though (Error: writeString called without a string/Buffer argument: 1 - Thrift library cannot serialize the value), that's what we definitely have to improve.

Cool, definitely worth having less obscure errors.

I think your app seems to "hang" because you don't handle the promise returned from session.executeStatement() - you should either await it or add .then/.catch handlers.

The code I posted is merely illustrative (I missed the await there), the actual app code awaits the call.

Every placeholder matches a single value, even if the value is an array. Arrays will not get expanded as a multiple parameters, the whole array will be used as a value. For batch inserts you have to do exactly what you did as a workaround - add multiple placeholders and bind multiple values to them.

That's quite unfortunate, considering other drivers out there do a sensible approach and turn the array of arrays into a multi-value insert. Maybe worth updating the documentation to state that this is not a supported case (and offer a workaround?).

Lastly, I would highly recommend you to use a proper IDE. For example, IntelliJ IDEA can warn you about mishandling the result of session.executeStatement,

I appreciate the suggestion, even if it does come across a little bit condescending. VsCode and eslint also catch the issue (which is not an issue as I explained above).

as well as incorrect value passed ordinalParameters - even in JS file. Switching to TypeScript can improve you experience even more - as it can catch most of the accidental typos and error even before running your script.

That's not quite right actually, considering that the TS type checker infers that your DBSQLParameterValue is an any, despite the type being defined as export type DBSQLParameterValue = undefined | null | boolean | number | bigint | Int64 | Date | string;

image


Thank you for taking the time to reply!

kravets-levko commented 2 months ago

Cool, definitely worth having less obscure errors.

I'll reopen this issue until we fix this error message, if you don't mind

The code I posted is merely illustrative (I missed the await there), the actual app code awaits the call

That's interesting. So you actually do await the promise in your code, but don't see neither error nor result? This doesn't seem right. I think I need your help to reproduce this issue - maybe more details on your code, or (better) - a minimal reproducible example I can run on my machine

That's not quite right actually, considering that the TS type checker infers that your DBSQLParameterValue is an any, despite the type being defined as export type DBSQLParameterValue = undefined | null | boolean | number | bigint | Int64 | Date | string;

That's actually interesting as well. I would like to understand why your editor cannot infer these types, and if we can improve anything on our side. Because IDEA does really a decent job for me:

image

And with your snippet I see something like this (in JS file):

image