GlareDB / glaredb

GlareDB: An analytics DBMS for distributed data
https://glaredb.com
GNU Affero General Public License v3.0

Should I be able to copy a delta table to lance locally? #2713

Closed: greyscaled closed this issue 2 months ago

greyscaled commented 3 months ago

The following doesn't work for FORMAT lance:

> create table hello (a text);
Table created
> copy (SELECT * from hello) to './output.csv';
Copy success
> copy (SELECT * from hello) to './output.lance';
Error: External error: External error: Failed to canonicalize path "./output.lance": No such file or directory (os error 2)
> copy (SELECT * from hello) to './output.lance' FORMAT lance;
Error: External error: External error: Failed to canonicalize path "./output.lance": No such file or directory (os error 2)
> 
tychoish commented 3 months ago

OK, so what's happening here is that it treats the path you provide as the prefix (directory) where the lance files/tables should live, and when that directory doesn't exist, it errors (opaquely). If I created the directory before running the command, it worked. (Also, lance tables consist of more than one file, so output.lance may be a misleading name.)

Also observed: when the source table is empty, nothing is written (which seems correct).
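A minimal Python sketch of the failure mode described above, assuming the writer strictly resolves the destination directory before writing anything. This is an illustration, not GlareDB's implementation: `resolve_lance_destination` is a hypothetical helper. `Path.resolve(strict=True)` fails on a missing directory much like the reported "Failed to canonicalize path" error, creating the directory first makes it succeed, and the re-raised message shows one way the error could be made less opaque.

```python
import tempfile
from pathlib import Path


def resolve_lance_destination(path: str) -> Path:
    """Hypothetical helper: resolve a lance destination directory.

    The path is treated as a directory prefix, and resolving it fails
    if that directory does not exist. Instead of an opaque error, this
    re-raises with a hint about what the path is expected to be.
    """
    try:
        return Path(path).resolve(strict=True)
    except FileNotFoundError as err:
        raise FileNotFoundError(
            f"lance output {path!r} is a directory prefix and it does not exist; "
            "create the directory first"
        ) from err


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "output.lance"

    # Reproduces the failure: the directory has not been created yet.
    try:
        resolve_lance_destination(str(dest))
    except FileNotFoundError as err:
        print("error:", err)

    # The workaround from the thread: create the directory, then retry.
    dest.mkdir()
    print("resolved:", resolve_lance_destination(str(dest)).name)
```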

greyscaled commented 3 months ago

Sweet, OK, got it to work for what I wanted. Thanks!

> create table hello (a text);
Table created
> insert into hello values ('hi'), ('how are you');
Inserted 2 rows
> copy (SELECT * from hello) to './lance' FORMAT lance;
Copy success

greyscaled commented 3 months ago

Also observed: when the source table is empty, nothing is written (which seems correct).

Yep, this is expected behavior, IMO.

universalmind303 commented 3 months ago

I think using the lance destination as a file in the first example is a bit misleading, and maybe we need some documentation around this. We should draw a clear boundary between file formats (bson, json, csv, parquet, xlsx) and table formats (delta, lance).
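One way to make the suggested boundary explicit is a lookup from format name to destination kind, sketched below in Python. The format groupings come from the comment above; `DestinationKind`, `FORMAT_KINDS`, and `destination_kind` are hypothetical names, and GlareDB's actual COPY TO handling may work differently.

```python
from enum import Enum


class DestinationKind(Enum):
    FILE = "file"            # a single output file
    DIRECTORY = "directory"  # a table made of several files under one prefix


# Hypothetical mapping, using the groupings proposed in the comment.
FORMAT_KINDS = {
    "bson": DestinationKind.FILE,
    "json": DestinationKind.FILE,
    "csv": DestinationKind.FILE,
    "parquet": DestinationKind.FILE,
    "xlsx": DestinationKind.FILE,
    "delta": DestinationKind.DIRECTORY,
    "lance": DestinationKind.DIRECTORY,
}


def destination_kind(fmt: str) -> DestinationKind:
    """Return whether a COPY TO destination for `fmt` names a file or a directory."""
    try:
        return FORMAT_KINDS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None


print(destination_kind("csv").value)    # file
print(destination_kind("LANCE").value)  # directory
```

A fixed two-way split like this is exactly what the next comment questions: with hive partitioning and globbing, a format listed as FILE can still read from or write to a directory of files.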

greyscaled commented 3 months ago

Yep, it stems from not realizing I needed to provide a directory (and not entirely seeing why, given other uses of the command and the error message). Distinguishing between table and file formats is a good point.

tychoish commented 3 months ago

I think the "table" vs "file format" thing breaks down with hive partitioning and globbing, and doesn't feel like a super useful conceptual framework.

We could create directories for lance output if they don't exist; that wouldn't be very absurd.
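A minimal Python sketch of this proposal, assuming directory-style formats are identified by name: create the destination directory (and any missing parents) before resolving it, so a lance COPY to a new path no longer fails on a missing directory. `prepare_destination` and `DIRECTORY_FORMATS` are hypothetical names, not GlareDB APIs.

```python
import tempfile
from pathlib import Path

# Hypothetical: formats whose output is a directory of files rather than one file.
DIRECTORY_FORMATS = {"delta", "lance"}


def prepare_destination(path: str, fmt: str) -> Path:
    """Create a missing directory for table-style formats, then resolve the path."""
    dest = Path(path)
    if fmt.lower() in DIRECTORY_FORMATS:
        # parents=True creates intermediate directories;
        # exist_ok=True leaves an already existing directory untouched.
        dest.mkdir(parents=True, exist_ok=True)
        return dest.resolve(strict=True)
    # Single-file formats: only the parent directory has to exist already.
    return dest.parent.resolve(strict=True) / dest.name


with tempfile.TemporaryDirectory() as tmp:
    out = prepare_destination(f"{tmp}/nested/output.lance", "lance")
    print(out.is_dir(), out.name)  # True output.lance
```

Creating only for directory-style formats keeps a typo in a single-file destination's parent path an error rather than silently creating new directories.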

greyscaled commented 3 months ago

I think the "table" vs "file format" thing breaks down with hive partitioning and globbing

Also good point!

tychoish commented 2 months ago

@greyscaled I agree that this is confusing, but I'm not sure what we should do here.

greyscaled commented 2 months ago

@tychoish your comment here:

We could create directories for lance output if they don't exist; that wouldn't be very absurd.

seems sufficient to close this. Additionally, we can open an issue to improve the documentation for COPY TO.