Mayil-AI-Sandbox / kuzudb_jan15

MIT License

COPY REL fails on "Unable to find primary key value" under the case of large num of long string primary keys (hashtag2916) #3

Open vikramsubramanian opened 4 months ago

vikramsubramanian commented 4 months ago

I randomly generated 1M strings of length 15 to 100 for the node table, and also generated a CSV file for the rel table on top of those 1M strings. The CSV files can be found [here](

Run the following statements with multiple threads. COPY E triggers an exception, even though the value can actually be found in the node table through MATCH (n:N) WHERE n.id='xxx' RETURN n.

CREATE NODE TABLE N(id STRING, PRIMARY KEY(id));
CREATE REL TABLE E(FROM N TO N);
COPY N FROM 'string_nodes.csv';
COPY E FROM 'string_edges.csv';

I did a bit of investigation: on my local machine, if I change the number of threads to 1 (CALL THREADS=1;) before COPY E, it loads successfully. )
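
For anyone who cannot fetch the CSV files, a rough sketch of how comparable data could be regenerated is below. It is not the script used for this report. The alphanumeric alphabet, the header-less layout (one `id` column for nodes, two columns for edges), the seeds, and all function names are assumptions made for illustration. The `dangling_endpoints` helper checks that every edge endpoint exists in the node file, which helps rule out bad input data as the cause of the "Unable to find primary key value" error.

```python
import csv
import random
import string

# Assumption: keys use only letters and digits, so no CSV quoting is needed.
ALPHABET = string.ascii_letters + string.digits


def make_keys(n, min_len=15, max_len=100, seed=0):
    """Return n distinct random strings with lengths in [min_len, max_len]."""
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        length = rng.randint(min_len, max_len)
        keys.add("".join(rng.choices(ALPHABET, k=length)))
    # Sort so the output does not depend on set iteration order.
    return sorted(keys)


def make_edges(keys, n_edges, seed=1):
    """Return n_edges (from, to) pairs drawn uniformly from the given keys."""
    rng = random.Random(seed)
    return [(rng.choice(keys), rng.choice(keys)) for _ in range(n_edges)]


def write_csv(path, rows):
    """Write rows to path as a header-less CSV file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def dangling_endpoints(node_path, edge_path):
    """Return every edge endpoint that does not appear in the node file."""
    with open(node_path, newline="") as f:
        ids = {row[0] for row in csv.reader(f)}
    with open(edge_path, newline="") as f:
        return [value for row in csv.reader(f) for value in row if value not in ids]
```

To approximate the scale described above, call `make_keys(1_000_000)`, write the keys to `string_nodes.csv` with `write_csv(path, [[k] for k in keys])`, and write the result of `make_edges` to `string_edges.csv`. The number of edges is not stated in this report, so any value chosen there is a guess. An empty list from `dangling_endpoints` means the input is consistent, and a COPY E failure would then point at the loader rather than the data.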

mayil-ai[bot] commented 4 months ago

Summary: "COPY REL" fails with "Unable to find primary key value" error when loading a large number of long string primary keys.

Possible Solution

Based on the provided information, the issue seems to be related to a race condition or improper synchronization when importing relationships (COPY E) in a multi-threaded environment. The primary keys are long strings, and the problem does not occur when using a single thread.

To address the issue, consider the following solution:

Remember to thoroughly test the changes in a controlled environment before deploying them to production to ensure that the issue is resolved without introducing new problems.

Code snippets to check