Status: Open. maryhmhhu opened this issue 6 months ago.
It seems the order of columns in the array of objects passed to tableFromJSON (https://arrow.apache.org/docs/js/functions/Arrow_dom.tableFromJSON.html) matters: the values are added based on their position in the object, not based on the column names.
If I add the column values in the same order as the table definition, the output is as expected. For example, if I change the first few lines of the reproduction to:
// Add keys in the table's column order: key first, then id.
const rowNumCol = 'key';
const idCol = 'id';
const someData = { [rowNumCol]: 0 };
const someData2 = { [rowNumCol]: 1 };
someData[idCol] = '3';
someData2[idCol] = '4';
then the logged input is:
input 1 {key: 0, id: '3'}
input 2 {key: 1, id: '4'}
arrowTable 1 [
{"key": 0, "id": "3"}
]
arrowTable 2 [
{"key": 1, "id": "4"}
]
and the output is:
row 1 {key: 0, id: 3}
row 2 {key: 1, id: 4}
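Since the output is correct once the keys are added in the table's column order, one caller-side workaround is to rebuild every row so its keys follow the DuckDB table's column order before calling tableFromJSON. A minimal sketch in plain JavaScript, assuming the table's columns are key, then id, as in the snippet above; reorderRows is a hypothetical helper, not part of Arrow or DuckDB-Wasm:

```javascript
// Hypothetical helper: return new row objects whose keys follow `columns`.
// Ordinary string keys are enumerated in insertion order, so each new object
// lists its columns exactly in the given order. (Integer-like keys such as
// "1" would always be enumerated first, but names like these are unaffected.)
// Keys not listed in `columns` are dropped; missing ones become null.
function reorderRows(rows, columns) {
  return rows.map((row) => {
    const ordered = {};
    for (const col of columns) {
      ordered[col] = Object.prototype.hasOwnProperty.call(row, col) ? row[col] : null;
    }
    return ordered;
  });
}

// Rows built in the "wrong" order: id first, then key.
const rows = [
  { id: '3', key: 0 },
  { id: '4', key: 1 },
];

// Assumed DuckDB column order, taken from the snippet above: key, then id.
const ordered = reorderRows(rows, ['key', 'id']);
console.log(ordered.map((r) => Object.keys(r).join(','))); // [ 'key,id', 'key,id' ]
```

The reordered rows can then be passed to tableFromJSON in place of the originals. Building fresh objects instead of mutating the inputs leaves the caller's rows untouched.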
Quoting: "the values are added based on the position in the object, and not based on the names of the columns."
I don't follow. If this is an issue with Arrow, I can look into it.
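import { tableFromJSON } from 'apache-arrow';

// The second row lists its keys in the opposite order.
const table = tableFromJSON([
  { a: 1, b: 2 },
  { b: 2, a: 1 },
]);
console.log(table.toArray());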
outputs
[ {"a": 1, "b": 2}, {"a": 1, "b": 2} ]
Hi @domoritz, the issue only reproduces when the key order in the first entry differs from the column order of the DuckDB table. It doesn't show up in the Arrow table that's created, but rather when reading the DuckDB table back.
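The observation that only the first entry's key order matters is consistent with a column order that is fixed by the keys as they are first seen. The sketch below models that idea in plain JavaScript; inferColumnOrder is hypothetical, and it is an assumption (not confirmed in this thread) that tableFromJSON derives its schema in a similar way.

```javascript
// Hypothetical model: collect column names in the order they are first seen.
// Only a key's first appearance counts, so later rows cannot change the order.
function inferColumnOrder(rows) {
  const seen = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      seen.add(key); // a Set keeps insertion order and ignores duplicates
    }
  }
  return [...seen];
}

// First row is a, b; second row is b, a.
console.log(inferColumnOrder([{ a: 1, b: 2 }, { b: 2, a: 1 }])); // [ 'a', 'b' ]
// Same data, but the first row lists b before a.
console.log(inferColumnOrder([{ b: 2, a: 1 }, { a: 1, b: 2 }])); // [ 'b', 'a' ]
```

Under this model the data in both calls is identical and only the column order changes, so any consumer that matches columns by position rather than by name would see different results.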
import * as arrow from 'apache-arrow';

const table = arrow.tableFromJSON([
  {
    a: 1,
    b: 2,
  },
]);
console.log('table', table.toString());
outputs a correct Arrow table:
table [
{"a": 1, "b": 2}
]
but when reading from DuckDB after inserting the table (note: the columns in the created table are ordered b, then a):
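// c is an AsyncDuckDBConnection; note the column order b, a.
await c.query(`CREATE TABLE test(b INTEGER PRIMARY KEY, a INTEGER)`);
// Insert the Arrow table (schema order a, b) into the existing table.
await c.insertArrowTable(table, {
  create: false,
  name: 'test',
});
// Stream the result back and print each row as a plain object.
for await (const batch of await c.send(`SELECT * FROM 'test'`)) {
  for (const row of batch) {
    const r = {};
    for (const [field, val] of row) {
      r[field] = val;
    }
    console.log(`row`, r);
  }
}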
the output is incorrect:
{b: 1, a: 2}
whereas {b: 2, a: 1} is expected.
I suspect that somewhere within insertArrowTable the logic relies on the order (position) of the columns instead of matching them by name, which would result in this swap.
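To make the suspected failure mode concrete, the sketch below contrasts copying Arrow columns into the target table by position with matching them by name. It is a toy model in plain JavaScript with hypothetical helpers (insertByPosition, insertByName), not DuckDB-Wasm's actual code; the data mirrors the example above, an Arrow schema ordered a, b and a table declared as test(b, a).

```javascript
// Toy model of the suspected bug. Not DuckDB-Wasm's real implementation.

// Arrow table from tableFromJSON([{ a: 1, b: 2 }]): schema order is a, b.
const arrowColumns = [
  { name: 'a', values: [1] },
  { name: 'b', values: [2] },
];
// DuckDB table created as test(b INTEGER PRIMARY KEY, a INTEGER).
const tableColumns = ['b', 'a'];

// Positional mapping: the i-th Arrow column fills the i-th table column.
function insertByPosition(tableCols, arrowCols) {
  return arrowCols[0].values.map((_, r) =>
    Object.fromEntries(tableCols.map((name, i) => [name, arrowCols[i].values[r]])));
}

// Name-based mapping: each table column looks up the Arrow column with its name.
// Assumes every table column exists in the Arrow schema.
function insertByName(tableCols, arrowCols) {
  const byName = new Map(arrowCols.map((c) => [c.name, c.values]));
  return arrowCols[0].values.map((_, r) =>
    Object.fromEntries(tableCols.map((name) => [name, byName.get(name)[r]])));
}

console.log(insertByPosition(tableColumns, arrowColumns)); // [ { b: 1, a: 2 } ]
console.log(insertByName(tableColumns, arrowColumns)); // [ { b: 2, a: 1 } ]
```

Only the positional version reproduces the reported {b: 1, a: 2}, which supports the suspicion that columns are matched by position rather than by name.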
Thanks for the explanation. So it's probably a bug in DuckDB-Wasm, not Arrow.
What happens?
When using insertArrowTable, the values for the columns id and key are swapped. For the code example below:
[Attachments not preserved: logs of input to insertArrowTable, expected output, actual output]
To Reproduce
With c being an AsyncDuckDBConnection:
Browser/Environment: Chrome Version 120.0.6099.129 (Official Build) (arm64)
Device: MacBook
DuckDB-Wasm Version: 1.28.0
DuckDB-Wasm Deployment: local
Full Name: Mary Hu
Affiliation: PopSQL