BurntSushi / xsv

A fast CSV command line toolkit written in Rust.
The Unlicense

Raw value output from fmt #262

Open amake opened 3 years ago

amake commented 3 years ago

I have a CSV where one of the fields is actually JSON:

foo,"[""bar"",""baz""]",bazinga

(Before anyone tells me this is dumb: Yes, I agree. But it's what I have to work with, for various reasons.)

I would like to extract just the JSON column for further processing as JSON, but there doesn't seem to be a way to convince xsv fmt to output the "raw" text value; it is always quoted for CSV purposes.

I imagine it would look something like this, e.g. via an -r or --raw flag:

$ echo 'foo,"[""bar"",""baz""]",bazinga' | xsv select 2 | xsv fmt -r
["bar","baz"]

Or, as a hack, fmt --quote could accept an empty string I guess?

(I know that such output would not be suitable for further processing by xsv, but that's kind of the point.)

For prior art on this, see for instance the --raw-output / -r flag in jq.
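
As a rough illustration of what "raw" output means here, the following is a minimal Python sketch using only the standard csv module. It is not xsv code, and the helper name raw_column is made up for this example; it just parses the CSV and emits the field with the CSV quoting removed.

```python
import csv
import io


def raw_column(csv_text, index):
    """Hypothetical helper: return one column's values (0-based index)
    with CSV quoting removed, one entry per row."""
    # newline="" leaves line endings untouched so the csv module can
    # handle them itself, as its documentation recommends.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return [row[index] for row in reader]


line = 'foo,"[""bar"",""baz""]",bazinga\n'
for value in raw_column(line, 1):
    print(value)  # prints ["bar","baz"]
```

The printed value is valid JSON and can be piped to a JSON tool, which is the behavior the proposed -r / --raw flag is asking for.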

BurntSushi commented 3 years ago

I suspect this would be a better fit for the xsv select command itself?

Whether it goes in xsv select or xsv fmt, I think more specification is required. For example, what happens if more than one column or row is selected? What delimiter is used for the raw output?

amake commented 3 years ago

> I suspect this would be a better fit for the xsv select command itself?

Sure, that would be fine.

> Whether it goes in xsv select or xsv fmt, I think more specification is required. For example, what happens if more than one column or row is selected?

Given the file my.csv below:

foo,"[""bar"",""baz""]",bazinga
buzz,"[""bang"",""boom""]",blammo

For selecting a single column with multiple rows, I would expect one value per line, like:

$ cat my.csv | xsv select 2 --raw
["bar","baz"]
["bang","boom"]

A problem with this would be when the quoted values themselves contain newlines; then the output will probably be very hard to use in a meaningful way. For my purposes it would be nice to somehow escape in-value newlines, e.g. as \n, but I'm not sure that works without making assumptions about the content or the downstream use case.

(Even if in-value newlines break things, I think raw output could still be useful for when you are sure you don't have in-value newlines, which for me is pretty often.)
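
To make the newline concern concrete, here is a small Python sketch (standard library only; raw_column_escaped is a hypothetical name, not anything in xsv) that keeps the one-value-per-line property by escaping in-value line breaks. It also shows the assumption being made: the escaping is lossy, and whatever consumes the output has to know about it.

```python
import csv
import io


def raw_column_escaped(csv_text, index):
    """Hypothetical helper: one output line per row, with in-value
    newlines written as the two characters backslash + n."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    lines = []
    for row in reader:
        value = row[index]
        # Lossy on purpose: a value that already contains a literal
        # backslash-n can no longer be told apart from an escaped newline.
        lines.append(value.replace("\r", "\\r").replace("\n", "\\n"))
    return lines


# The JSON field below contains a real line break between its two items.
data = 'foo,"[""bar"",\n""baz""]",bazinga\n'
for line in raw_column_escaped(data, 1):
    print(line)  # a single line; the break shows up as \n
```

Note that the escaped line is no longer valid JSON as-is, so a downstream JSON tool would need the escaping undone first. That is exactly the kind of assumption about the downstream use case mentioned above.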

Multiple columns: To be honest I hadn't thought about this. I'm not sure what to expect; perhaps one value per line but they are interleaved like:

$ cat my.csv | xsv select 1,2 --raw
foo
["bar","baz"]
buzz
["bang","boom"]

> What delimiter is used for the raw output?

Ultimately my aim is to pipe things to other line-wise programs, so the natural answer is U+000A (line feed).
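
Putting the multi-column and delimiter answers together, the proposed semantics could be sketched in Python roughly as follows. This is only an illustration of the behavior described in this comment; raw_select and its delimiter parameter are hypothetical names, not an existing or planned xsv interface.

```python
import csv
import io


def raw_select(csv_text, indexes, delimiter="\n"):
    """Hypothetical helper: emit the selected columns (0-based indexes)
    row by row, interleaved, each value followed by the delimiter."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    values = [row[i] for row in reader for i in indexes]
    return "".join(value + delimiter for value in values)


my_csv = (
    'foo,"[""bar"",""baz""]",bazinga\n'
    'buzz,"[""bang"",""boom""]",blammo\n'
)
# Equivalent of the proposed `xsv select 1,2 --raw` (xsv columns are 1-based).
print(raw_select(my_csv, [0, 1]), end="")
```

With the sample my.csv this prints the four interleaved lines shown above, each terminated by U+000A.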

JLHasson commented 2 years ago

I also think this would be a great use case to add for xsv.

I had similar data and did the following to work around:

xsv select col_name file.csv | sed -E 's/""/"/g; s/^"//g; s/"$//g' | tail -n +2 | jq .

Basically, replace the quotes with sed and skip the header row with tail.
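
For anyone who would rather not unquote with regular expressions, the same workaround can be sketched in Python with the standard csv and json modules. This is an alternative to the sed pipeline, not a reproduction of it; json_column is a hypothetical helper name and "payload" is a placeholder column name. Because the CSV is actually parsed, quoted values that span several lines are handled, which a line-by-line sed script cannot do.

```python
import csv
import io
import json


def json_column(csv_text, col_name):
    """Hypothetical helper: parse the named column of every data row as JSON."""
    # DictReader consumes the header row itself, so no `tail -n +2` is needed.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [json.loads(row[col_name]) for row in reader]


# Sample data with a header row; "payload" is a placeholder column name.
data = (
    "name,payload,other\n"
    'foo,"[""bar"",""baz""]",bazinga\n'
    'buzz,"[""bang"",""boom""]",blammo\n'
)
for parsed in json_column(data, "payload"):
    print(parsed)
```

To run it against a real file, read the file with open(path, newline="") and pass its contents (or adapt the helper to take the file object directly).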

fluffysquirrels commented 1 year ago

In case other people find this via Google and need a workaround, this is mine using qsv and jq:

< my.csv qsv tojsonl | jq '.RAW_JSON' -r

I would still love to see this feature in xsv.