duckdb / duckdb-wasm

WebAssembly version of DuckDB
https://shell.duckdb.org
MIT License
1.03k stars 113 forks source link

duckdb HUGEINT is represented using Decimal datatype and Uint32[] values in the resultset #1579

Closed rpbouman closed 5 months ago

rpbouman commented 6 months ago

What happens?

Querying a SUM on a Bigint value results in duckdb wasm returning the data using a Decimal field type and Uint32[] values, while the DESCRIBE query on the statement indicates the SUM (bigint) expression would result in a DuckDB HUGEINT. It was expected that the duckdb HUGEINT would be returned as javascript BigInteger

It was expected that

To Reproduce

https://static.data.gouv.fr/resources/populations-legales-communales-2017-2021/20231228-135143/poplegales2017-2021.parquet

  var sql = 'SELECT MIN( PMUN ),  SUM( PMUN ) FROM "poplegales2017-2021.parquet"';
  var row, describeResults = await connection.query(`DESCRIBE ${sql}`);

  row = describeResults.get(0);  
  // column type is BIGINT as expected
  console.log(row.column_type);
  row = describeResults.get(1);  
  // sum type is HUGEINT (which makes sense)
  console.log(row.column_type);
  var field, dataResults = await connection.query(sql);
  row = dataResults.get(0);

  field = dataResults.schema.fields[0];
  // field type is BigInteger, as expected
  console.log( field.type );
  // value type is bigint as expected
  console.log( row[field.name] );

  field = dataResults.schema.fields[1];
  // field type is Decimal. Expected: BigInteger
  console.log( field.type );
  // value type is Uint32Array. Expectd: javascript BigInteger
  console.log( row[field.name] );

Browser/Environment:

Version 120.0.6099.130 (Official Build) (64-bit)

Device:

Windows 11 Desktop

DuckDB-Wasm Version:

1.28.0

DuckDB-Wasm Deployment:

https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.28.0/+esm

Full Name:

Roland Bouman

Affiliation:

EPAM Systems BV Netherlands

rpbouman commented 6 months ago

Screenhot showing script output in the browser console. image

rpbouman commented 5 months ago

After reading up a little I believe I understand now what is happening.

Duckdb wasm uses arrow data types, and there does not seem to be a 128 bit wide Arrow integer flavor. So, it uses the next best thing which is a decimal with zero decimals, and unfortunately that means the actual value is represented as a Uint32 array because that's just what arrow does.

My initial idea that a javascript bigint was expected because that could easily handle the duckdb HUGEINT type is irrelevant as the arrow interface is leading here, not the fact that the value could have been conveniently represented by bigint.

As for processing the value, String(value) or value.toString() will return a string representation of the intended numerical value and this should then further be handled by the client.

My conclusion now is that this is not a duckdb wasm issue. I'm sorry for bothering.