duckdb / duckdb

DuckDB is an analytical in-process SQL database management system
http://www.duckdb.org
MIT License
22.12k stars 1.77k forks source link

Correlated sub-query for linear interpolation fails with fatal error #9864

Closed coroa closed 1 day ago

coroa commented 9 months ago

What happens?

When trying to fill in null values in a small table with linear interpolation, then a fatal error

INTERNAL Error: Failed to bind column reference "limit_rownum" [30.0] (bindings: [8.0 8.1 8.2 28.0])

is raised and invalidates the in-memory database.

The same SQL query runs fine on Postgres: https://dbfiddle.uk/3anzk06a

I am relatively new to writing more complex SQL queries, apologies in advance, if i am making obvious mistakes.

To Reproduce

CREATE TABLE df (x NUMERIC, y NUMERIC);
INSERT INTO df VALUES (0, 2), (1, NULL), (2, 4), (3, 5), (4, NULL);

so that:

duckdb> FROM df;
┌───────┬───────┐
│ x     ┆ y     │
╞═══════╪═══════╡
│ 0.000 ┆ 2.000 │
│ 1.000 ┆       │
│ 2.000 ┆ 4.000 │
│ 3.000 ┆ 5.000 │
│ 4.000 ┆       │
└───────┴───────┘
SELECT
    x,
    COALESCE(
        y,
        (
            SELECT 
                prev.y + ( (next.y - prev.y) * (parent.x - prev.x) / (next.x - prev.x) )
            FROM
                ( SELECT x, y FROM df WHERE x <= parent.x and y is not null ORDER BY x DESC LIMIT 1 ) AS prev
            CROSS JOIN
                ( SELECT x, y FROM df WHERE x >= parent.x and y is not null ORDER BY x ASC LIMIT 1 ) AS next
        )
    ) AS y
FROM
    df parent

then:

INTERNAL Error: Failed to bind column reference "limit_rownum" [30.0] (bindings: [8.0 8.1 8.2 28.0])

and

duckdb> FROM df;
FATAL Error: Failed: database has been invalidated because of a previous fatal error. The database must be restarted prior to being used again.
Original error: "INTERNAL Error: Failed to bind column reference "limit_rownum" [30.0] (bindings: [8.0 8.1 8.2 28.0])"

OS:

DuckDB Web Shell

DuckDB Version:

0.9.2

DuckDB Client:

WASM

Full Name:

Jonas Hörsch

Affiliation:

Climate Analytics

Have you tried this on the latest main branch?

I have tested only with a release build on website and with the python client.

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

coroa commented 9 months ago

This seems to be related to https://github.com/duckdb/duckdb/issues/9720 .

At least it can be worked around in a similar manner by including the row_number windowing function into the cross-joined sub-queries:

SELECT
    x,
    COALESCE(
        y,
        (
            SELECT 
                prev.y + ( (next.y - prev.y) * (parent.x - prev.x) / (next.x - prev.x) )
            FROM
                ( SELECT x, y, ROW_NUMBER() OVER (ORDER BY x DESC) as rn FROM df WHERE x <= parent.x and y is not null ) AS prev
            CROSS JOIN
                ( SELECT x, y, ROW_NUMBER() OVER (ORDER BY x ASC) as rn FROM df WHERE x >= parent.x and y is not null ) AS next
            WHERE
                prev.rn = 1 AND next.rn = 1
        )
    ) AS y
FROM
    df parent

Thanks to https://stackoverflow.com/a/77588520/2873952 for realizing the link and workaround.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

coroa commented 1 month ago

The error does seem to have been fixed for 1.0.0. At least i cannot reproduce it anymore on the latest version.