evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
https://evidence.dev
MIT License
4.39k stars 210 forks source link

timezone information is dropped and everything is displayed in UTC for timezoned data #1652

Open fboerman opened 8 months ago

fboerman commented 8 months ago

Steps To Reproduce

  1. create a query which returns data with at least one column with timestamp with timezone data, make sure the data is in a non UTC timezone
  2. load it in a page
  3. put it in a datatable and see that data is in UTC format

in the same category if you use a timestamp in a where clause with timezone info no data is returned.

Environment

Expected Behavior

data is in the timezone that the source was in

Actual Behaviour

data is shown in UTC

Workarounds

to get both the visualization as well as the filtering working it is needed to load the icu extension and then do a double timezone conversion. steps to do this:

  1. add "@duckdb/duckdb-wasm": "1.28.1-dev148.0" to overrides in packages.json and install
  2. add at the top of your page the following snippit: <script> import {query} from "@evidence-dev/universal-sql/client-duckdb"; query(" LOAD icu; ") </script>
  3. adjust your filter to use for example time >= TIMESTAMPTZ '2024-01-01 00:00:00 Europe/Amsterdam'
  4. double convert your time column like this: "time" at time zone 'UTC' at time zone 'Europe/Amsterdam' as time

now the data should be correctly filtered and shown WARNING: this newer version of duckdb-wasm has not been fully tested with evidence! it seems to work but behold for edge cases.

Drawbacks to workaround:

when clicking download data from for example datatable or linechart the data in the csv is shown as being UTC (with the Z) time but that is wrong, its actually in the local timezone specified. this is actually wrong instead of different dispaly then you want so be aware

possible ideas for feature to fix this

as a feedback point it would be great if the timezone shown on a page would be configurable, both as a global default as well as a possible override per page or query

jaroet commented 7 months ago

I just wanted to add that I too am running into this issue. My case is specifically with a dateRange component to filter out data. That goes wrong. This makes it for me impossible to use Evidence as almost all our reports have or will have a date based filter.

I do not have any specific timestamptz type of fields in my table but regular timestamp. But as the timezone of the postgresql server I am using is Europe/Amsterdam I guess all timestamp fields are in fact timestamptz.

hilkeheremans commented 6 months ago

Just to add some info here, here's some more info about the workaround above:

The handy part about using a macro here is that it should be easier to 'remove' the fix in various places as soon as Evidence has a more permanent solution, simply by changing the function to a no-op

hilkeheremans commented 6 months ago

Just to add again, I found that Evidence refuses to make a production build (weird 500 error) if you attempt this hack. The root cause of this is build behavior vs dev behavior, where the timing of the query calls seems to be different (or that the dev server executes this once at load, but not again -- not sufficiently familiar with SvelteKit's internals at the moment).

The way to work around this for now is to force rendering of the component tree to wait until the extensions are actually loaded, by adjusting your +layout.svelte something like this.

Note that the workaround below can be applied to basically anything async that you want to wait for before any page load.

<script>
    import "@evidence-dev/tailwind/fonts.css";
    import "../app.css";
    import { EvidenceDefaultLayout } from "@evidence-dev/core-components";
    import { query } from "@evidence-dev/universal-sql/client-duckdb";
    async function installICU() {
        await query(" LOAD icu; ");
        await query(
            "CREATE FUNCTION tz_fix(x, b := 'Europe/Amsterdam') AS (x::TIMESTAMP at time zone 'UTC' at time zone b);",
        );
    }
    export let data;
</script>

{#await installICU()}
    <p>Loading</p>
{:then result}
    <EvidenceDefaultLayout {data}>
        <slot slot="content" />
    </EvidenceDefaultLayout>
{:catch error}
    <p style="color: red">{error.message}</p>
{/await}
hilkeheremans commented 6 months ago

Another addition: use

CREATE OR REPLACE FUNCTION

instead of just

CREATE FUNCTION

to maintain stability with HMR

hilkeheremans commented 6 months ago

Another addition:

Unfortunately, this fix does not work well. It breaks (at least) the following functionality on build (not in dev mode):

Use at your own risk!

archiewood commented 5 months ago

I have taken some time to chase this down.

Evidence has several stages of it's data pipeline, so I wanted to ensure we know what we are doing at each stage.

Stages

npm run sources

  1. Data in the native DB (eg @evidence-dev/postgres, @evidence-dev/duckdb), using native DB types
  2. eg @evidence-dev/postgres: Data gets read by the native NodeJS connector , and converted into JS types
  3. @evidence-dev/postgres: Data is mapped from JS types into a set of 4 evidence types (string, number, bool, datetime)
  4. @evidence-dev/universal-sql: Data is converted from an JS Array into an arrow vector, columnwise, and then into an arrow table
  5. @evidence-dev/universal-sql: Data is written from arrow table into parquet file on disk

npm run dev

  1. @evidence-dev/universal-sql: DuckDB wasm reads data from parquet file based on markdown query
  2. @evidence-dev/universal-sql: Data is saved in the in memory DuckDB WASM database
  3. User queries read from the WASM database into the query store
  4. Components read data from query store (back to JS array)
  5. Components format data and display it

Complications

Evidence Today

I have run an end to end test using DuckDB to confirm current behaviour.

  1. Evidence takes values as they come out of queries, and doesn't change them.
  2. However, some(!) components choose to render these values in UTC. Specifically the DataTable, and the QueryViewer
  3. If they don't have a timezone specified, it assumes they are UTC

Fixes

fboerman commented 5 months ago

Hi Archie,

Thanks for the write up. Perhaps good to mention that in order to get timezone support in duckdb you need to load ICU extention. If you do that you are already 90% there

-------- Original Message -------- On 08/06/2024 10:38, Archie Sarre Wood wrote:

I have taken some time to chase this down.

Evidence has several stages of it's data pipeline, so I wanted to ensure we know what we are doing at each stage.

Stages

npm run sources

  • Data in the native DB (eg Postgres, DuckDB), using native DB types
  • Data gets read by the native nodejs connector, and converted into JS types
  • Data is mapped from JS types into a set of 4 evidence types (string, number, bool, datetime)
  • Data is converted from an JS Array into an arrow vector, columnwise, and then into an arrow table
  • Data is written from arrow table into parquet file on disk

npm run dev

  1. DuckDB wasm reads data from parquet file based on markdown query
  2. Data is saved in Query Store
  3. Components read data from query store (back to JS array)
  4. Components format data and display it

Complications

  • JS does not have a concept of timezone aware timestamps. All timestamps are stored as milliseconds since 1970, in UTC.
  • Some "databases" do not have a native concept of time (eg SQLite, CSV) and use local time strings to represent time
  • Some developers choose (or use systems that choose) to store data in timestamps without timezones, but with an implicit timezone choice of their local TZ. If you change the system timezone it can changes the implied timestamp

Evidence Today

I have run an end to end test using DuckDB to confirm current behaviour.

  • Evidence takes values as they come out of queries, and doesn't change them.
  • However, some(!) components choose to render these values in UTC. Specifically the DataTable, and the QueryViewer
  • If they don't have a timezone specified, it assumes they are UTC

Fixes

  • For databases/sources that do not have timezone support, we should give people the option to configure a timezone.
  • The DataTable and QueryViewer should not silently convert dates to UTC
  • (Eventually) We should support allowing users to change the display timezone to one other than their browsers'

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>