duckdb / duckdb-wasm

WebAssembly version of DuckDB
https://shell.duckdb.org
MIT License

Connection.insertArrowTable fails for esbuild typescript #1708

Open: rob-blackbourn opened this issue 2 months ago

rob-blackbourn commented 2 months ago

What happens?

The method Connection.insertArrowTable silently fails when called from a TypeScript program built with esbuild.

To Reproduce

bare-node works

Change the query in examples/bare-node/index.cjs from:

        const conn = await db.connect();

        await conn.query(`SELECT count(*)::INTEGER as v FROM generate_series(0, 100) t(v)`);

        await conn.close();

to:

        const conn = await db.connect();

        const json = [
            { a: 1, b: 11 },
            { a: 2, b: 22 },
        ];

        const table = arrow.tableFromJSON(json);
        await conn.insertArrowTable(table, {
            name: 'local_table',
            create: true,
        });
        try {
            const result = await conn.query(`SELECT * FROM local_table`);
            console.log(result.toString());
        } catch (error) {
            console.log(error);
        }

        await conn.close();

The table is created and the query succeeds.

esbuild-node fails

Change the query in examples/esbuild-node/index.ts from:

        const conn = await db.connect();

        await conn.query<{ v: arrow.Int }>(`SELECT count(*)::INTEGER as v FROM generate_series(0, 100) t(v)`);

        await conn.close();

to:

        const conn = await db.connect();

        const json = [
            { a: 1, b: 11 },
            { a: 2, b: 22 },
        ];

        const table = arrow.tableFromJSON(json);
        await conn.insertArrowTable(table, {
            name: 'local_table',
            create: true,
        });
        try {
            const result = await conn.query<{ a: arrow.Int; b: arrow.Int }>(`SELECT * FROM local_table`);
            console.log(result.toString());
        } catch (error) {
            console.log(error);
        }

        await conn.close();

With this change, the insert fails.

Investigation

Stepping through the code, I found that the failure is in apache-arrow/ipc/writer.js, in the function writeAll, at the statement if (input instanceof table_js_1.Table) {. In the bare-node example this check is true, while in esbuild-node it is false. Likewise, input.__proto__ === table_js_1.Table.prototype is true for bare-node but false for esbuild-node.

It appears that the Table class used internally by duckdb-wasm is a different class object from the Table class the example imports from apache-arrow.
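To illustrate the mechanism, here is a minimal TypeScript sketch that assumes nothing about apache-arrow itself. The hypothetical loadCopy() function stands in for evaluating the same library source twice, so each call produces a distinct Table class. An instance built from one copy fails both the instanceof check and the prototype comparison against the other copy, mirroring what the debugger showed in writeAll.

```typescript
// Hypothetical stand-in for one loaded copy of a library. Each call creates a
// brand-new class object, just as evaluating the same package source twice
// yields two distinct Table classes.
function loadCopy() {
  class Table {
    readonly numRows: number;
    constructor(numRows: number) {
      this.numRows = numRows;
    }
  }
  return { Table };
}

const internalCopy = loadCopy(); // the copy the library uses internally
const localCopy = loadCopy(); // the copy the application imports

// The application builds its table with its own copy of the class.
const table = new localCopy.Table(2);

// The library then checks against its own copy, as writeAll does.
const sameClass = table instanceof internalCopy.Table;
const sameProto = Object.getPrototypeOf(table) === internalCopy.Table.prototype;
const ownClass = table instanceof localCopy.Table;

console.log({ sameClass, sameProto, ownClass });
// { sameClass: false, sameProto: false, ownClass: true }
```

The two classes share identical source, so every method works the same on either instance; only identity-based checks tell them apart.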

I note that bundle.mjs in duckdb-wasm patches apache-arrow/package.json because of the export strategy of the apache-arrow package. Could this be involved?

Browser/Environment:

Chrome 123.0.6312.123

Device:

MacBook Pro (ARM)

DuckDB-Wasm Version:

Latest clone (also 1.28.1-dev179.0)

DuckDB-Wasm Deployment:

The duckdb-wasm repo

Full Name:

Rob Blackbourn

Affiliation:

None

domoritz commented 2 months ago

Looks like you may have multiple versions of arrow present?

jonathanswenson commented 2 months ago

Potentially similar to what we were seeing over here: https://github.com/duckdb/duckdb-wasm/discussions/1545#discussioncomment-8550373

from https://github.com/duckdb/duckdb-wasm/issues/1640#issuecomment-1958485695

I was playing with different versions of apache-arrow in my own package.json and found that matching the version exactly (14.0.1 for dev106 and 15.0.0 for dev132) allowed me to insert data successfully WITHOUT having to add the EOS buffer.

Do you have an apache-arrow version set? I'd expect that it still needs to be set to 15.0.0 to match the pinned version in duckdb-wasm: https://github.com/duckdb/duckdb-wasm/blob/main/packages/duckdb-wasm/package.json#L26C6-L26C34
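A minimal sketch of the exact-match check suggested above. arrowPinMismatch() is a hypothetical helper, not part of duckdb-wasm or apache-arrow, and it assumes you have already read the dependency maps from the two package.json files. It only compares declared specs, so a lockfile can still resolve a second copy of apache-arrow through some other dependency.

```typescript
// Hypothetical helper (not part of duckdb-wasm): given the dependency maps from
// the application's package.json and from duckdb-wasm's package.json, report
// whether apache-arrow is pinned to exactly the same version.
type Deps = Record<string, string | undefined>;

function arrowPinMismatch(appDeps: Deps, libDeps: Deps): string | null {
  const libSpec = libDeps["apache-arrow"];
  const appSpec = appDeps["apache-arrow"];
  if (libSpec === undefined) return null; // library does not use arrow
  if (appSpec === undefined) return "apache-arrow is not a direct dependency";
  // Treat a leading ^, ~ or = in the library's spec as the base version.
  const wanted = libSpec.replace(/^[\^~=]/, "");
  // A range such as "^15.0.0" in the app is not an exact pin.
  if (appSpec !== wanted) {
    return `app declares apache-arrow "${appSpec}", expected exactly "${wanted}"`;
  }
  return null;
}

console.log(arrowPinMismatch({ "apache-arrow": "15.0.0" }, { "apache-arrow": "15.0.0" }));
// null
console.log(arrowPinMismatch({ "apache-arrow": "^15.0.0" }, { "apache-arrow": "15.0.0" }));
// app declares apache-arrow "^15.0.0", expected exactly "15.0.0"
```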

domoritz commented 2 months ago

I think we should try to make arrow more forgiving when multiple versions are installed, but it's never good to have multiple versions of the same library.
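One way a library could be more forgiving is to brand its classes with a symbol from the global symbol registry and accept that brand as well as instanceof. This is a hypothetical illustration of the pattern, not how apache-arrow implements its checks; TABLE_BRAND, loadBrandedCopy() and isTable() are invented names.

```typescript
// Hypothetical brand key (not an apache-arrow API). Symbol.for() consults the
// runtime-wide symbol registry, so every copy of a library that asks for the
// same key receives the identical symbol.
const TABLE_BRAND = Symbol.for("example.Table");

// Hypothetical stand-in for one loaded copy of a library.
function loadBrandedCopy() {
  class Table {
    readonly numRows: number;
    constructor(numRows: number) {
      this.numRows = numRows;
    }
  }
  // Mark the prototype so every instance, from any copy, carries the brand.
  Object.defineProperty(Table.prototype, TABLE_BRAND, { value: true });

  // Forgiving check: accept instances of this copy's class, or of any other
  // copy whose prototype carries the same registry symbol.
  function isTable(x: unknown): boolean {
    if (x instanceof Table) return true;
    return (
      typeof x === "object" &&
      x !== null &&
      (x as Record<symbol, unknown>)[TABLE_BRAND] === true
    );
  }
  return { Table, isTable };
}

const libCopy = loadBrandedCopy();
const appCopy = loadBrandedCopy();
const appTable = new appCopy.Table(2);

const strictCheck = appTable instanceof libCopy.Table; // false: different classes
const forgivingCheck = libCopy.isTable(appTable); // true: shared brand
console.log({ strictCheck, forgivingCheck });
```

The trade-off is that a brand only proves the object claims to be a Table; if two copies were genuinely different versions with incompatible internals, the forgiving check would let a mismatched object through.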

rob-blackbourn commented 2 months ago

@domoritz Yes

I should emphasise that I am using a clone of the duckdb-wasm repo. I have built this repo, and I'm using the repo's example programs. There should be no confusion about what the environment is.

It seems that the version of arrow captured by duckdb-wasm at build time is not available to downstream code in the esbuild-node example program.

Most of the time this doesn't matter, since both copies have identical functionality. It only seems to matter when identity checks such as instanceof are involved.

rob-blackbourn commented 2 months ago

@domoritz In this case it is the same version. I think the problem arises because arrow exports different flavours of JavaScript modules.

domoritz commented 2 months ago

ESM and CJS? Hmm, yeah, I can see that that's an issue.
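A rough model of why the ESM and CJS builds of a single package version can yield two Table classes: loaders generally cache a module by its resolved file, so an import that resolves to one entry file and a require that resolves to another evaluate the package twice. Everything here (loadModule(), the cache, the pkg/index.mjs and pkg/index.cjs paths) is hypothetical and greatly simplified compared with Node or esbuild.

```typescript
// Hypothetical model of a module loader. Loaders typically cache module
// instances by resolved file path, not by package name or version.
type ModuleInstance = { Table: new () => object };

const moduleCache = new Map<string, ModuleInstance>();

function loadModule(resolvedPath: string): ModuleInstance {
  let mod = moduleCache.get(resolvedPath);
  if (mod === undefined) {
    // Evaluating the module body creates a fresh class object.
    mod = { Table: class Table {} };
    moduleCache.set(resolvedPath, mod);
  }
  return mod;
}

// Hypothetical conditional exports: one package version, two entry files.
const entryPoints = { import: "pkg/index.mjs", require: "pkg/index.cjs" };

const esmCopy = loadModule(entryPoints.import);
const cjsCopy = loadModule(entryPoints.require);
const esmAgain = loadModule(entryPoints.import);

const cjsTable = new cjsCopy.Table();
const crossCheck = cjsTable instanceof esmCopy.Table; // false: two files, two classes
const cached = esmCopy === esmAgain; // true: same path reuses the instance
console.log({ crossCheck, cached });
```

In this model, the version number never enters the cache key, which is why pinning the same version alone does not prevent the duplicate when one consumer resolves the ESM entry and another the CJS entry.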