BurntSushi / xsv

A fast CSV command line toolkit written in Rust.
The Unlicense

Idea: Literally embed SQLite into xsv #295

Closed: infogulch closed this issue 2 years ago

infogulch commented 2 years ago

It sounds crazy, but hear me out.

More database-like functionality is often requested: arbitrary expressions, type awareness, advanced filtering and joining, aggregations, pivot/unpivot, and so on. Sometimes a subset of these features is reasonable to consider for inclusion in xsv; other times the conclusion to such requests is to use a more fully featured database like SQLite directly. While SQLite does support csv, it does so only via an extension that must be compiled and loaded separately, and there's a lot of boilerplate to get it up and running. Enter the proposal:

What if xsv embedded SQLite and exposed a new subcommand that supports executing SQLite queries over csv files directly? I don't think such a tool exists today.

xsv query 'select a, b, a+b as c from ab.csv where a > 100' ab.csv > abc.csv
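One detail a `query` subcommand like this would have to settle is typing. A CSV parser hands back every field as a string, and SQLite's column affinity then decides how a filter such as `a > 100` compares those strings. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for an embedded SQLite (xsv itself would need Rust bindings instead); the helper `filter_rows` and the sample values are hypothetical.

```python
import sqlite3


def filter_rows(column_type: str) -> list:
    """Hypothetical demo: run `WHERE a > 100` over string-valued CSV fields.

    `column_type` is the declared type of column `a` (e.g. "TEXT" or
    "NUMERIC"); it is interpolated into the SQL, so it must be trusted input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE ab (a {column_type})")
    # A CSV parser yields every field as a string, so insert strings.
    conn.executemany("INSERT INTO ab VALUES (?)", [("50",), ("150",), ("9",)])
    rows = conn.execute("SELECT a FROM ab WHERE a > 100 ORDER BY rowid").fetchall()
    conn.close()
    return rows


# TEXT affinity: 100 is compared as the string '100', so all three rows match.
print(filter_rows("TEXT"))     # [('50',), ('150',), ('9',)]
# NUMERIC affinity: the strings were stored as integers, so only 150 matches.
print(filter_rows("NUMERIC"))  # [(150,)]
```

With `TEXT` columns, SQLite applies text affinity to the literal `100` and compares byte-wise, so `'50'` and `'9'` both sort after `'100'`. `NUMERIC` affinity converts numeric-looking strings on insert; the trade-off is that text which merely looks numeric is stored as a number too.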

This would be a built-in escape hatch that instantly removes the need to implement advanced database-like functionality, upgrading xsv from "powerful slicing tool for broad-stroke csv manipulation" to "complete csv management and analysis package". All told, it may be easier to implement this than all of the advanced features currently open in the issue tracker.

Some reasonable objections:

  1. xsv would no longer be pure Rust, and could significantly increase the size of the xsv binary. This may be mitigated by wrapping it in a default-off cargo feature.
  2. It would convert these features from "implementation work" into "build system work", which is not always the best tradeoff, especially for cross-platform support.
  3. Maybe this would be better situated as a complementary third party tool, instead of being integrated into xsv directly.
  4. Memory usage of the query subcommand would be outside xsv's control, and SQLite may try to load the whole csv file into memory. That said, the best tool to split a large csv into many smaller ones for more advanced processing is quite nearby...

Thoughts?

BurntSushi commented 2 years ago

I think this should be some other tool. I'm quite certain they already exist as well, although I don't have a link handy.

But yeah, I'm not one for this kind of scope increase. Another alternative to the feature requests you reference is to only implement a subset of them. Or implement simpler versions of them.

It was never my intent for xsv to support arbitrary relational algebra, like that found in relational databases. In my view, if you need that, you should just use the relational database and cut out the middle man. If doing that directly is too inconvenient today, write a wrapper script if you must.
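For the wrapper-script route, Python's standard library already bundles SQLite and a CSV parser, so a minimal sketch of the proposed `xsv query` behavior fits in one file. Everything here is an assumption for illustration: the script name `csv_query.py`, the helpers `quote_ident`, `load_csv` and `run_query`, and the convention that each input file is loaded into an in-memory table named after its stem (`ab.csv` becomes `ab`). The stem convention is there because SQLite would read an unquoted `ab.csv` in a `FROM` clause as table `csv` in schema `ab`, so the query says `from ab` instead.

```python
# csv_query.py: a hypothetical wrapper script, not part of xsv or SQLite.
import csv
import sqlite3
import sys
from pathlib import Path
from typing import TextIO


def quote_ident(name: str) -> str:
    # Quote an SQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn: sqlite3.Connection, table: str, f: TextIO) -> None:
    # Create `table` from the CSV header row, then bulk-insert the remaining rows.
    # NUMERIC affinity stores numeric-looking fields as numbers and leaves the
    # rest as text, so `a > 100` compares numbers rather than strings.
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f"{quote_ident(h)} NUMERIC" for h in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    marks = ", ".join(["?"] * len(header))
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({marks})", reader)


def run_query(conn: sqlite3.Connection, sql: str, out: TextIO) -> None:
    # Execute `sql` and write its result set to `out` as CSV, header row first.
    cur = conn.execute(sql)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


def main(argv: list[str]) -> None:
    if not argv:
        print("usage: csv_query.py SQL [FILE.csv ...]", file=sys.stderr)
        return
    sql, *paths = argv
    conn = sqlite3.connect(":memory:")
    for path in paths:
        # Each file becomes a table named after its stem: ab.csv -> ab.
        with open(path, newline="") as f:
            load_csv(conn, Path(path).stem, f)
    run_query(conn, sql, sys.stdout)
    conn.close()


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python3 csv_query.py 'select a, b, a+b as c from ab where a > 100' ab.csv > abc.csv`. Loading into `:memory:` keeps the sketch short but holds every input file in RAM, the same concern raised in point 4 of the original issue; passing a file path to `sqlite3.connect` instead would let SQLite keep the tables on disk.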