cwida / duckpgq-extension

DuckDB extension that adds support for SQL/PGQ
https://duckpgq.notion.site/b8ac652667964f958bfada1c3e53f1bb?v=3b47a8d44bdf4e0c8b503bf23f1b76f2
MIT License
60 stars 6 forks source link

Non unique vertices with path-finding cause internal error #139

Closed bburky closed 1 week ago

bburky commented 2 weeks ago

Non unique vertex values (pointed to by DESTINATION KEY (...) REFERENCES ) in graph cause ANY SHORTEST v-[e]-> +(v) queries to fail with:

INTERNAL Error: Attempted to access index 1 within vector of size 1 This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic. For more information, see https://duckdb.org/docs/dev/internal_errors

Tested on macOS with duckdb v1.0.0 and the current community version of duckpgq (6c06589). Also reproduced with a build from duckpgq-extension source at commit 6d483c583588e353f64aae2c6bdd093c482b9776.

I didn't see any other currently open "INTERNAL Error" issues, so I'm opened this issue. This may be a dupe of the "Primary keys are not unique" issue in #26?

I'm guessing I did something wrong (is the REFERENCES target required to be unique? Edges must point to a single specific vertex? I was able to resolve this by fixing my data and using unique ids for edges) I didn't expect this error to cause be a fatal internal error though.

This project looks really promising, I was just hoping to try it out and play with SQL/PGQ a bit. The integration with duckdb is very nice.

Minimal reproduction:

CREATE TABLE v (x VARCHAR);
INSERT INTO v VALUES ('a');
INSERT INTO v VALUES ('b');
INSERT INTO v VALUES ('b');

CREATE TABLE e (x1 VARCHAR, x2 VARCHAR);
INSERT INTO e VALUES ('a', 'b');

LOAD duckpgq;

CREATE PROPERTY GRAPH g
VERTEX TABLES (
    v
)
EDGE TABLES (
    e
        SOURCE KEY (x1) REFERENCES v (x)
        DESTINATION KEY (x2) REFERENCES v (x)
);

-- v-[e]->(v) has no error:
-- Output has duplicate `x` records with the value `b` returned as expected. They can be distinguished by rowid in vertices()
FROM GRAPH_TABLE(g
  MATCH p =(v1:v)-[e:e]->(v2:v)
  COLUMNS (vertices(p), v2.x)
);

-- ANY SHORTEST v-[e]->(v) has no error:
-- Output again has duplicate `x` records are returned as expected
FROM GRAPH_TABLE(g
  MATCH p = ANY SHORTEST (v1:v)-[e:e]->(v2:v)
  COLUMNS (path_length(p), vertices(p), v2.x)
);

-- ANY SHORTEST v-[e]-> +(v) fails with "INTERNAL Error: Attempted to access index 1 within vector of size 1"
FROM GRAPH_TABLE(g
  MATCH p = ANY SHORTEST (v1:v)-[e:e]-> +(v2:v)
  COLUMNS (path_length(p), vertices(p), v2.x)
);

-- ANY SHORTEST v-[e]->{1,2}(v) also fails with "INTERNAL Error: Attempted to access index 1 within vector of size 1"
FROM GRAPH_TABLE(g
  MATCH p = ANY SHORTEST (v1:v)-[e:e]->{1,2}(v2:v)
  COLUMNS (path_length(p), vertices(p), v2.x)
);
Dtenwolde commented 2 weeks ago

Hi, thank you for submitting the issue :) I was able to reproduce it using your example. The duplicate IDs in the vertex table seem to lead to an incorrect creation for the CSR index we use for path-finding. I always assumed unique IDs in the vertex table in my tests. If this is not the case it should at least give a nicer error or remove duplicate vertices. I'll look into it :)

bburky commented 2 weeks ago

Ok, that's what I figured after playing with it a bit more.

Hopefully you'll find a good solution. One of the nice things about the duckdb integration is I could go directly from loading JSON to querying that data in a graph. Sadly duckdb doesn't seem to provide an easy way to put a unique constraint during JSON loading (or when I turn it into a table with CREATE TABLE AS)? I guess I could add a UNIQUE INDEX after creating the table, that may not too bad.