cube2222 / octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Mozilla Public License 2.0
4.76k stars 200 forks source link

Zero-config CSV queries #135

Closed nfultz closed 2 years ago

nfultz commented 5 years ago

I'm not sure how feasible this may be, but the old logparser program from Microsoft supported queries like

Select * from test.csv

And it would read from test.csv, without needing any config files.

cube2222 commented 5 years ago

Hey, I think some thought must be put into if this wouldn't cause any trouble with ambivalent meanings further down the road.

However, what we want to support for sure is allowing to input csv / json data via standard input for queries like:

Select * from stdin

Maybe a table valued function would be a good idea? That would be very easy to add after the recent parser customizations.

Select * from loadfile(type => 'csv', path => 'test.csv')

Not sure though if it's a good idea to provide so many ways to solve a single problem (open a csv file)

nfultz commented 5 years ago
Select * from stdin

This would be fantastic! One problem would be, how to tell what format stdin is providing? I can imagine people piping in json at least as often CSV. logparser dealt with that by having a gazillion -i:format flags, but I don't find that very elegant.

Another option would be to pipe in a config from stdin, something like this may not be too bad:

octosql -c - "Select * from tbl"

Maybe a table valued function would be a good idea? That would be very easy to add after the recent parser customizations.

Select * from loadfile(type => 'csv', path => 'test.csv')

Not sure though if it's a good idea to provide so many ways to solve a single problem (open a csv file)

This would be a very general, but doesn't seem very standard. I'll take a look at how Hive, Presto try to deal with this, maybe there's some good ideas there.

cube2222 commented 5 years ago

I think that discerning between csv and json wouldn't be too hard :) and I don't think there are much more formats people would like to pipe in through stdin (Excel?)

tooolbox commented 3 years ago

I very much like this idea. Having the ability to arbitrarily query and join a few files without needing a config file would be very convenient.

Maybe a table valued function would be a good idea? That would be very easy to add after the recent parser customizations.

I think this is a decent idea, but I don't see why the shorthand select * from test.csv couldn't work. If need be, you could require it to obviously be a path such as select * from ./test.csv Also something like

select * from `./test.xlsx`.Sheet1

for Excel (using the MySQL quoting there).

If you do go the function route, I would imagine the type param could be inferred from the file extension, so you shouldn't have to write it most of the time. And for Excel, the sheet name and any other data could be specified as function params.

cube2222 commented 3 years ago

I'm actually working on a rewrite of OctoSQL on the redesign branch and this syntax is supported there. Examples:

octosql "SELECT * FROM test.json"
octosql "SELECT * FROM ./test/test.json"

I'll probably release an initial version in the following months.

tooolbox commented 3 years ago

That's very exciting, thanks for sharing!

tooolbox commented 2 years ago

Just curious, it seems like you released this new zero-config, but I don't see Excel support (even though I see previous commits for it). Was this lost in the shuffle?

cube2222 commented 2 years ago

@tooolbox Sorry, lost track of this issue.

Yeah, zero-config is now here and excel support is gone for now cause I haven't yet had the time to port it.

That said, I'm planning to bring it back soon. Should be very little work to port it over as a plugin.

Closing this as zero-config csv queries are here, but I'll open another issue for tracking Excel support.