Open vikramsubramanian opened 4 months ago
Summary: Unable to copy .nt file for storage, but works when renamed to .ttl.
Based on the provided information, the issue is that the Kùzu database system does not recognize .nt (N-Triples) files as a supported RDF format when using the copy command, even though it should. The workaround of renaming .nt files to .ttl (Turtle) suggests that the system supports Turtle files but not N-Triples, despite both being valid RDF formats.
To resolve the issue:
- Locate the implementation of the copy command.
- In the copy command implementation, find where the file extension is being validated.
- Add .nt as a supported file type for RDF data.
- Check the parser code (third_party/serd/src/n3.c and third_party/serd/src/writer.c) to confirm it supports .nt files.
- Ensure the copy command correctly invokes the parser for .nt files.
- Add tests for .nt files to prevent regressions in the future.

Note: The provided code snippets do not contain the exact location where the copy command is implemented or where the file extension validation occurs. You will need to search the codebase for the relevant sections to apply the above solution.
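The extension check described in the steps above might look roughly like the following. This is only an illustrative sketch in Python, not Kùzu's actual code: the table `RDF_EXTENSIONS`, the function `detect_rdf_format`, and the format names are all hypothetical, chosen to show how accepting .nt next to .ttl could be a one-line change in a lookup table.

```python
from pathlib import Path

# Hypothetical extension table; Kùzu's real validation code may be organized differently.
RDF_EXTENSIONS = {
    ".ttl": "turtle",
    ".nt": "ntriples",  # the entry this issue asks for
}


def detect_rdf_format(path: str) -> str:
    """Return the RDF format name for a file path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return RDF_EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported RDF file extension: {suffix!r}") from None
```

With a table like this, a regression test only needs to assert that a path ending in .nt is accepted and that an unrelated extension is still rejected.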
This file contains code for writing N-Triples, which is relevant to enabling '.nt' file storage.
This file contains code for parsing RDF syntaxes, which may need to be reviewed to ensure '.nt' files are supported.
src/include/common/keyword/rdf_keyword.h
This file contains RDF-related keywords and may need to be updated to include '.nt' file support.
N-Triples files are valid Turtle files. N-Triples is the simplest of the RDF formats: each triple is written on one line, with no prefix or base directives or other shortening (so every IRI is written out in full) and no grouping of triples by subject as in Turtle. Currently if I do this, I get an error:
However, if I rename the file to latest-lexemes-nt.ttl, the loading works. I also tested that the Turtle and N-Triples versions of the latest Wikidata lexemes dataset give similar counts (not exact, but very close).
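Because N-Triples puts exactly one triple on each line, a rough count for the N-Triples side of such a comparison can be taken without a full RDF parser. The helper below is a minimal sketch under that assumption. The name `count_ntriples` and the example.org IRIs are made up for illustration, and it counts every statement line, duplicates included, so it may differ slightly from what a database reports after loading.

```python
import io
from typing import Iterable


def count_ntriples(lines: Iterable[str]) -> int:
    """Count statements, assuming the one-triple-per-line layout of N-Triples."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and full-line comments.
        if not stripped or stripped.startswith("#"):
            continue
        count += 1
    return count


# Tiny in-memory sample standing in for a real .nt file.
sample = io.StringIO(
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
    "# a comment line\n"
    "\n"
    '<http://example.org/s> <http://example.org/p> "lexeme"@en .\n'
)
print(count_ntriples(sample))
```

For a real file, the same function can be given an open file handle, for example `count_ntriples(open("latest-lexemes.nt", encoding="utf-8"))`.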