Closed: antonycourtney closed this issue 2 months ago.
Thanks for the report.
There are two separate problems here:
select to_days(5) as days;
select interval '1' day as days;
This is the main one, somehow related to Arrow types; I am trying to find where it goes wrong and fix this bug.
Note to self: SELECT 1::INTERVAL also fails in a funny way, without showing any result.
SELECT (CURRENT_TIMESTAMP + INTERVAL '1' DAY) - CURRENT_TIMESTAMP as result;
Funnily enough, this one "works as intended", as ICU is not included in the duckdb-wasm bundle.
Building the duckdb CLI with just make, without explicitly enabling any extension, and running the query shows the same behaviour.
This runs deeper; the proper approach is either bundling ICU or moving to experimental extension loading.
I will update when I have a fix for the bug.
Thanks for looking into this and the update @carlopi! Will monitor this issue for updates.
Are there any updates on this?
I also ran into issues working with INTERVAL data in DuckDB-WASM.
Could it be that the mapping from the DuckDB INTERVAL type to the Arrow INTERVAL type does not work properly?
When I load an .arrow file (created from a Parquet file) into an Arrow table in JS, I get a Uint8Array(12) for each INTERVAL data entry: a representation of months, days, and milliseconds.
import { tableFromIPC } from 'apache-arrow';
const table = await tableFromIPC(fetch('interval.arrow'));
const values = table.getChild('interval').get(0);
I can convert this array into a human-readable format via getIntervalData(values) with:
function getIntervalData(values: Uint8Array) {
  // The 12 bytes hold three little-endian 32-bit values: months, days, milliseconds.
  const intervalMonths = base256ToDecimal(values.slice(0, 4));
  const intervalDays = base256ToDecimal(values.slice(4, 8));
  const intervalMilliSeconds = base256ToDecimal(values.slice(8, 12));

  // Break the millisecond component down into days/hours/minutes/seconds.
  const seconds = Math.floor(intervalMilliSeconds / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);
  const days = Math.floor(hours / 24);
  const remainingMilliseconds = intervalMilliSeconds % 1000;
  const remainingSeconds = seconds % 60;
  const remainingMinutes = minutes % 60;
  const remainingHours = hours % 24;

  return {
    years: Math.floor(intervalMonths / 12),
    months: intervalMonths % 12,
    days: intervalDays + days,
    hours: remainingHours,
    minutes: remainingMinutes,
    seconds: remainingSeconds,
    milliseconds: remainingMilliseconds
  };
}

function base256ToDecimal(values: Uint8Array) {
  // Interpret the bytes as a little-endian base-256 (unsigned) integer.
  return values.reduce((agg: number, value: number, idx: number) => {
    return agg + value * Math.pow(256, idx);
  }, 0);
}
When I load the corresponding Parquet file with the same data into DuckDB-WASM, the data seems to be messed up. For each data entry I receive a Uint32Array(2) (via getChild('interval').get), which holds only a quarter of the information (months; the rest is in the three subsequent entries). In other words, the vector length is four times the number of records. It is also unclear how to get milliseconds from the third and fourth parts. I would have expected a structure similar to the Arrow table above. The same Parquet file works flawlessly in the DuckDB CLI.
Following up on this issue: it seems that icu is not being auto-loaded (as other extensions such as json are).
Here with the latest shell:
DuckDB Web Shell
Database: v0.10.0
Package: @duckdb/duckdb-wasm@1.28.1-dev153.0
Connected to a local transient in-memory database.
Enter .help for usage hints.
duckdb> select cast(to_timestamp(100000) at time zone 'PST' as date) as event_date
...> ;
Binder Error: No function matches the given name and argument types 'timezone(STRING_LITERAL, TIMESTAMP WITH TIME ZONE)'. You might need to add explicit type casts.
Candidate functions:
timezone(DATE) -> BIGINT
timezone(TIMESTAMP) -> BIGINT
timezone(INTERVAL) -> BIGINT
timezone(INTERVAL, TIME WITH TIME ZONE) -> TIME WITH TIME ZONE
LINE 1: select cast(to_timestamp(100000) at time zone 'PST' as date) as event_da...
^
duckdb> load icu;
┌┐
└┘
duckdb> select cast(to_timestamp(100000) at time zone 'PST' as date) as event_date;
┌────────────┐
│ event_date │
╞════════════╡
│ 1970-01-01 │
└────────────┘
Is that expected?
Hi @Y--, as you already noticed in the meantime, ICU is not tagged as autoloadable by duckdb (yet). I will look into sending a PR for that; preparatory work was already done in https://github.com/duckdb/duckdb/pull/9065, see in particular @Mytherin's comment on what was missing back then.
Given that ICU comes loaded by default in most cases, I can see a reason to backport that PR to duckdb-wasm version v0.10.0 as well.
@carlopi Is there an update on this? I think I'm running into an error that is related to this issue. I'm using duckdb-wasm@1.28.1-dev221.0.
await connection.query(`
BEGIN TRANSACTION;
ALTER TABLE testData ADD COLUMN TripCreatedTz TIMESTAMPTZ;
UPDATE testData SET TripCreatedTz = CAST(TripCreated AS TIMESTAMP) AT TIME ZONE 'UTC';
COMMIT TRANSACTION;
`);
In chrome dev tools console:
Error: Binder Error: No function matches the given name and argument types 'timezone(STRING_LITERAL, TIMESTAMP)'. You might need to add explicit type casts.
Candidate functions:
timezone(DATE) -> BIGINT
timezone(TIMESTAMP) -> BIGINT
timezone(INTERVAL) -> BIGINT
timezone(INTERVAL, TIME WITH TIME ZONE) -> TIME WITH TIME ZONE
LINE 5: ... TABLE testData ADD COLUMN TripCreatedTz TIMESTAMPTZ;
UPDATE testData SET TripCreatedTz = CAST(TripCreated AS TIMESTAMP) AT TIME ZONE 'UTC';
^
at fa.runQuery (bindings_base.ts:168:19)
at Ao.onMessage (worker_dispatcher.ts:202:51)
at Hc.globalThis.onmessage (duckdb-browser-eh.worker.ts:29:19)
I'm new to DuckDB, so either I'm doing something wrong or there's some issue with the time zone extension that DuckDB-WASM uses internally?
There are a few independent problems here in the thread, trying to separate them:
@antonycourtney: interval computations should now make more sense:
duckdb> SELECT '2 minutes'::INTERVAL;
┌───────────────────────────────────────────────────────┐
│             CAST('2 minutes' AS INTERVAL)             │
╞═══════════════════════════════════════════════════════╡
│ 0 years 0 mons 0 days 0 hours 2 mins 0.000000000 secs │
└───────────────────────────────────────────────────────┘
The output is not 1-to-1 with the DuckDB CLI, because the console is handled on the Rust side with different logic; the important part is that the result makes sense. The problem was a quirk in interpreting the Arrow protocol at the Arrow-Rust level. It has recently been solved in arrow-rust version 52, released last week, and duckdb-wasm now reaps the benefit of having this properly sorted out.
@Y--: There is yet another autoloading problem connected with ICU; I will need to open a bug on duckdb/duckdb for this. The problem is that even though ICU is autoloadable, in this particular case the autoloading mechanism is not working properly. The workaround, a bit heavy-handed, is loading ICU explicitly. The problem looks like this: https://shell.duckdb.org/#queries=v0,select-cast(to_timestamp(100000)-at-time-zone-'PST'-as-date)-as-event_date~,LOAD-icu~,select-cast(to_timestamp(100000)-at-time-zone-'PST'-as-date)-as-event_date~
@matys1: I believe your problem is similar to the previous one, but it is hard to be sure since there is no complete reproduction. Could you please open another bug with a complete repro (including table creation and so on)? Thanks.
I will close this for now, as the main issue has been solved. The ICU-related problems can likely go to a separate issue.
Thanks everyone.
@carlopi actually I just figured out that I can fix the issue by explicitly loading the icu extension, adding load icu; to the query transaction like below, and it works.
await connection.query(`
BEGIN TRANSACTION;
load icu;
ALTER TABLE testData ADD COLUMN TripCreatedTz TIMESTAMPTZ;
UPDATE testData SET TripCreatedTz = CAST(TripCreated AS TIMESTAMP) AT TIME ZONE 'UTC';
COMMIT TRANSACTION;
`);
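For reference, the same workaround can be packaged so the explicit load happens once before the transaction. A minimal sketch, where `QueryRunner` is a hypothetical stand-in for the duckdb-wasm connection's `query` method used above:

```typescript
// Sketch: load the icu extension explicitly before any timezone-dependent
// statements, since autoloading does not kick in here (see discussion above).
// `QueryRunner` is a hypothetical minimal interface for illustration only.
type QueryRunner = { query: (sql: string) => Promise<unknown> };

async function addTripCreatedTz(connection: QueryRunner): Promise<void> {
  // Explicit load works around the missing ICU autoload.
  await connection.query(`LOAD icu;`);
  await connection.query(`
    BEGIN TRANSACTION;
    ALTER TABLE testData ADD COLUMN TripCreatedTz TIMESTAMPTZ;
    UPDATE testData SET TripCreatedTz = CAST(TripCreated AS TIMESTAMP) AT TIME ZONE 'UTC';
    COMMIT TRANSACTION;
  `);
}
```

Loading once per connection (rather than inside every transaction) keeps the workaround in one place until the autoloading fix lands upstream.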
I'm not sure if it's worth opening a new bug report, since it seems you're already aware of the autoloading issue from other comments. Let me know, though; I have an example of this in Vite in case you need it.
@matys1: I moved the issue here: https://github.com/duckdb/duckdb-wasm/issues/1770. This needs solving at the DuckDB level, but given that the workaround is relevant and useful to others as well, I think it's more useful here.
And indeed your problem is the same as the one above, but I added it to the repro just in case, to be able to double-check.
What happens?
In a number of simple queries involving INTERVAL values, DuckDB-WASM produces what appears to be a wrong result; it is at least inconsistent with the DuckDB CLI. Several basic examples:
Here are a few queries I tried on shell.duckdb.org and their results:
and here are the same queries executed with the DuckDB 0.8.0 CLI:
To Reproduce
Try any of the simple queries from above:
Browser/Environment:
Chrome Version 113.0.5672.92 (Official Build) (arm64)
Device:
M2 MacBook Pro
DuckDB-Wasm Version:
1.25.1-dev13.0
DuckDB-Wasm Deployment:
shell.duckdb.org
Full Name:
Antony Courtney
Affiliation:
MotherDuck