feat: support named parameters and bulk inserts

cocoa-xu commented 4 months ago

Databases like Google BigQuery also support named parameters in SQL queries:

To specify a named parameter, use the @ character followed by an identifier, such as @param_name. Alternatively, use the placeholder value ? to specify a positional parameter. Note that a query can use positional or named parameters but not both.
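As a rough illustration of the rule quoted above, the sketch below classifies a query by the placeholder style it uses. `PlaceholderSketch` and `style/1` are hypothetical names, not part of adbc or BigQuery, and the scan is deliberately naive: it does not skip string literals or comments.

```elixir
defmodule PlaceholderSketch do
  # Hypothetical helper: reports which placeholder style a query uses.
  # Naive scan; an "@" or "?" inside a string literal is also counted.
  def style(sql) when is_binary(sql) do
    named? = Regex.match?(~r/@[A-Za-z_][A-Za-z0-9_]*/, sql)
    positional? = String.contains?(sql, "?")

    case {named?, positional?} do
      {true, true} -> {:error, :mixed}
      {true, false} -> :named
      {false, true} -> :positional
      {false, false} -> :none
    end
  end
end
```

For example, `PlaceholderSketch.style("SELECT * FROM t WHERE a = @a")` classifies the query as named, while a query containing both `@a` and `?` is rejected.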

To support named parameters, we need to change elixir_to_arrow_type_struct in adbc_nif.cpp and allow users to pass a map/keyword list:

Positional parameters: This is the first case and basically what we support now: a list where each element's type is one of number(), binary(), boolean() and nil.

However, users currently cannot specify the types of their parameters.

Therefore, one option is to use a 2-tuple, {type, value}, where the first element specifies the type and the second is the corresponding value. When no type is specified, we infer it. For now, we can prioritise support for the most basic types.

The possible values for type are:

:i8
:i16
:i32
:i64
:u8
:u16
:u32
:u64
:f16
:f32
:f64
:string
:binary
:boolean
nil

We can add support for complex types like list and struct later.
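The inference rule described above can be sketched roughly as follows. `ParamSketch` and `normalize/1` are hypothetical names, not part of adbc, and the defaults (integers as :i64, floats as :f64, binaries as :binary) are assumptions taken from the inferred types in this proposal.

```elixir
defmodule ParamSketch do
  # Hypothetical helper, not part of adbc: turns a user-supplied
  # parameter into an explicit {type, value} tuple.
  @types [:i8, :i16, :i32, :i64, :u8, :u16, :u32, :u64,
          :f16, :f32, :f64, :string, :binary, :boolean]

  # An explicit {type, value} tuple is kept as-is when the type is known.
  def normalize({type, value}) when type in @types, do: {type, value}
  # Otherwise the type is inferred from the Elixir term.
  def normalize(value) when is_integer(value), do: {:i64, value}
  def normalize(value) when is_float(value), do: {:f64, value}
  def normalize(value) when is_boolean(value), do: {:boolean, value}
  def normalize(nil), do: {nil, nil}
  def normalize(value) when is_binary(value), do: {:binary, value}
end
```

With this sketch, `ParamSketch.normalize(123)` yields `{:i64, 123}`, while `ParamSketch.normalize({:u64, 789})` passes through unchanged.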

```
[
  123,                    # int64,  inferred
  456,                    # int64,  inferred
  {:u64, 789},            # uint64
  "string",               # bytes,  inferred
  {:string, "string"},    # string
  4.56,                   # double, inferred
  {:f32, 7.89},           # float
  true,                   # bool,   inferred
  false,                  # bool,   inferred
  nil,                    # na,     inferred
  <<1, 2, 3>>             # bytes,  inferred
]
```

The above example corresponds to an Arrow record with 1 row and 11 columns. For databases that support bulk inserts, we can pass a list for each column:

```
[
  [123, 456, nil],                     # int64,  inferred
  [456, 789, 123],                     # int64,  inferred
  {:u64, [789, nil, 123]},             # uint64
  ["string", "as", "bytes"],           # bytes,  inferred
  {:string, ["str1", "str2", "str3"]}, # string
  [4.56, 5.0, 6.0],                    # double, inferred
  {:f32, [7.89, 8.0, 9.0]},            # float
  [true, true, true],                  # bool,   inferred
  [false, false, false],               # bool,   inferred
  [nil, nil, nil],                     # na,     inferred
  [<<1, 2, 3>>, <<4>>, <<5>>]          # bytes,  inferred
]
```

The above example corresponds to an Arrow record with 3 rows and 11 columns.

Of course, every column should have the same length, and all values in a column should have the same type (except for nil, if the column is nullable).
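A minimal sketch of that check, assuming columns are given either as plain lists or as {type, list} tuples. `BulkSketch` and `validate/1` are hypothetical names, not part of adbc, and the error atoms are made up for illustration.

```elixir
defmodule BulkSketch do
  # Hypothetical validation helper, not part of adbc.
  # Each column is either a plain list or a {type, list} tuple.
  def validate(columns) when is_list(columns) do
    lengths =
      columns
      |> Enum.map(&values/1)
      |> Enum.map(&length/1)
      |> Enum.uniq()

    cond do
      length(lengths) > 1 -> {:error, :length_mismatch}
      not Enum.all?(columns, &homogeneous?/1) -> {:error, :mixed_types}
      true -> :ok
    end
  end

  defp values({_type, list}) when is_list(list), do: list
  defp values(list) when is_list(list), do: list

  # nil is ignored so that nullable columns pass the check.
  defp homogeneous?(column) do
    kinds =
      column
      |> values()
      |> Enum.reject(&is_nil/1)
      |> Enum.map(&kind/1)
      |> Enum.uniq()

    length(kinds) <= 1
  end

  defp kind(v) when is_integer(v), do: :integer
  defp kind(v) when is_float(v), do: :float
  defp kind(v) when is_boolean(v), do: :boolean
  defp kind(v) when is_binary(v), do: :binary
  defp kind(_), do: :unknown
end
```

This only compares the Elixir-level kind of each value; checking that values fit an explicit type such as :u8 would be a separate step.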

Named parameters: This is the second case, where the argument should be a map or keyword list. It is similar to the first case, but the keys are column names, e.g.,
```
%{
a: 123,
b: 456,
c: "string",
d: {:f32, 4.56}
}
```
The above example corresponds to an Arrow record with 1 row and 4 columns.

As for bulk inserts, we'd have
```
%{
a: [123, 456, nil],
b: [456, 789, 42],
c: ["string", "as", "bytes"],
d: {:f32, [4.56, 5.0, nil]}
}
```
The above example corresponds to an Arrow record with 3 rows and 4 columns.
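Accepting both maps and keyword lists could look roughly like the sketch below. `NamedSketch` and `normalize/1` are hypothetical names, not part of adbc, and the sketch assumes atom keys, as in the examples above.

```elixir
defmodule NamedSketch do
  # Hypothetical helper, not part of adbc: accepts a map or a keyword
  # list and returns {name, column} pairs with string names.
  def normalize(params) when is_map(params) do
    # Map order is not guaranteed; named parameters are bound by name.
    params |> Map.to_list() |> normalize()
  end

  def normalize(params) when is_list(params) do
    if Keyword.keyword?(params) do
      {:ok, Enum.map(params, fn {name, column} -> {Atom.to_string(name), column} end)}
    else
      # A plain list is the positional case, not the named one.
      {:error, :not_named}
    end
  end
end
```

Each column value is passed through untouched, so it can still be a bare value, a list, or a {type, value} tuple as described for positional parameters.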

josevalim commented 4 months ago

I wonder if we should introduce a proper buffer API instead. IIRC, for bulk inserts, each column is an arrow buffer. So maybe we should have: ADBC.Buffer.u8([0, 2, 3]) and so forth.

So the supported arguments would be:

```
[
  123,                    # int64,  inferred
  "string",               # string, inferred
  4.56,                   # double, inferred
  true,                   # bool,   inferred
  false,                  # bool,   inferred
  nil,                    # na,     inferred
  %ADBC.Buffer{}
]
```

Then, for named arguments, we support either maps or keyword lists. We can also provide query APIs that return buffers, so that the result of one query can easily be passed to another. WDYT?
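To make the idea concrete, here is a minimal sketch of what such a buffer module might look like. `BufferSketch`, its `:type` and `:data` fields, and the set of constructors are all assumptions for illustration; the actual ADBC.Buffer API is not defined in this thread.

```elixir
defmodule BufferSketch do
  # Hypothetical stand-in for the proposed ADBC.Buffer; the field names
  # and constructors here are assumptions, not the library's API.
  defstruct [:type, :data]

  # Generate one constructor per primitive type, e.g. BufferSketch.u8/1.
  for type <- [:i8, :i16, :i32, :i64, :u8, :u16, :u32, :u64,
               :f32, :f64, :string, :boolean] do
    def unquote(type)(data) when is_list(data) do
      %__MODULE__{type: unquote(type), data: data}
    end
  end
end
```

A parameter list could then mix inferred scalars with typed columns, for instance `[123, "string", BufferSketch.u8([0, 2, 3])]`.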

cocoa-xu commented 4 months ago

Ahh, ADBC.Buffer sounds definitely better! I'll try to implement it and send a PR :)

elixir-explorer / adbc

feat: support named parameters and bulk inserts #66