benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support
https://benhoyt.com/writings/goawk/
MIT License

Add helper functions for CSV processing #125

Open benhoyt opened 2 years ago

benhoyt commented 2 years ago

It'd be good to add a library of functions to help with processing CSV files (or other tabular data; it wouldn't be limited to CSV). For example:

We could start by making this a simple AWK library that you include, e.g., goawk -f lib.awk -f prog.awk (or, when using the Go API, by prepending/appending the library to the program source).
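As a rough sketch of the library approach, the snippet below writes a tiny lib.awk and a prog.awk that uses it, then loads both with two -f options. The helper name delfield and its behavior are hypothetical, chosen only for illustration (they are not necessarily what GoAWK ships), and the plain comma FS here does no real CSV quote handling. It uses awk so it runs anywhere; goawk should accept the same -f arguments.

```shell
# lib.awk: a hypothetical helper library, loaded before the main program.
cat > lib.awk <<'EOF'
# delfield(n): remove field n from the current record (illustrative name).
function delfield(n,    i) {
    if (n < 1 || n > NF) return
    for (i = n; i < NF; i++) $i = $(i + 1)   # shift later fields left
    NF = NF - 1                              # drop the now-duplicated last field
}
EOF

# prog.awk: the user program, which simply calls the helper.
cat > prog.awk <<'EOF'
BEGIN { FS = OFS = "," }
{ delfield(2); print }
EOF

# Load the library first, then the program.
printf 'a,b,c\n1,2,3\n' | awk -f lib.awk -f prog.awk
```

With the two-line input above, this prints a,c followed by 1,3.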

When we want to add them as builtins to GoAWK, we should do it in a backwards-compatible way: that is, not make them keywords like the other builtins. Instead, if the user defines a function or variable with the same name, the user's definition takes precedence.

vielmetti commented 2 years ago

If you're thinking about helper functions for CSV processing, it would be worthwhile to look at "csvkit"

https://csvkit.readthedocs.io/en/latest/

which is a set of command line tools for processing CSV data. I'm pretty sure that most of the simple tools have direct equivalents in goawk, but some don't, and those might be inspiration.

(thanks for goawk, always nice to see a favorite old language get a modern implementation)

benhoyt commented 2 years ago

@vielmetti Thanks for that. Yeah, I've looked at csvkit some when thinking about this (see https://github.com/benhoyt/goawk/blob/master/csv.md#examples-based-on-csvkit). Select, cut, and reorder are fairly straightforward with the @ operator, and the functions in https://github.com/benhoyt/goawk/pull/127 augment that with field insertion/deletion when you need that.
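To make the select/reorder idea concrete, here is a minimal sketch that emulates lookup by column name in portable AWK, so it runs without GoAWK. The file name select.awk, the col array, and the sample data are made up for the example; in GoAWK the @ operator would take the place of the manual header map, and real CSV quoting is not handled here.

```shell
cat > select.awk <<'EOF'
BEGIN { FS = OFS = "," }
# On the header row, record the position of each column name.
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i }
# Print two columns by name, in a new order (the header row is printed too).
{ print $(col["email"]), $(col["name"]) }
EOF

printf 'id,name,email\n1,Ann,ann@example.com\n2,Bob,bob@example.com\n' |
    awk -f select.awk
```

This drops the id column and swaps the other two, printing email,name as the new header line.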

Some things that csvkit can do probably aren't going to be included though, for example, converting to JSON. Or sorting -- that just doesn't fit the row-by-row AWK model very well.
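A small sketch of why sorting sits awkwardly in the row-by-row model: nothing can be printed until every row has been read, so the program has to buffer the whole input and do all the work in END. The file name sortrows.awk, the choice of the second column as a numeric key, and the sample data are assumptions for illustration only.

```shell
cat > sortrows.awk <<'EOF'
BEGIN { FS = "," }
NR == 1 { print; next }                  # the header can pass straight through
{ rows[++n] = $0; key[n] = $2 + 0 }      # every data row must be held in memory
END {
    # Insertion sort by numeric key; only possible once all rows are known.
    for (i = 2; i <= n; i++) {
        r = rows[i]; k = key[i]
        for (j = i - 1; j >= 1 && key[j] > k; j--) {
            rows[j + 1] = rows[j]; key[j + 1] = key[j]
        }
        rows[j + 1] = r; key[j + 1] = k
    }
    for (i = 1; i <= n; i++) print rows[i]
}
EOF

printf 'name,age\nBob,31\nAnn,27\nCy,45\n' | awk -f sortrows.awk
```

The per-row rule does nothing but store data, which is the part that doesn't fit AWK's usual streaming style; piping the output to an external sort is often the simpler choice.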

janxkoci commented 1 year ago

You could also look at Miller for inspiration. Miller is heavily inspired by awk and the unix toolbox, but adds support for formats like CSV, JSON, etc. Miller is also written in Go, so you could even borrow some code for other parts of your project, like buffers and such.

janxkoci commented 2 months ago

PS: there is also csvtk in Go!