BurntSushi / xsv

A fast CSV command line toolkit written in Rust.
The Unlicense

`xsv table` output is visually broken by newlines in fields #275

Open visig9 opened 3 years ago

visig9 commented 3 years ago

Problem

I have a blog.csv that looks like this:

title,content,date,auther
test newline,"first line
second line",2020-01-01,Kyle
other post,"Hello World",2020-01-02,John

xsv table blog.csv gives me broken output:

$ xsv table blog.csv
title         content  date  auther
test newline  "first line
second line"  2020-01-01   Kyle
other post    Hello World  2020-01-02  John

If I use -c to truncate the field before the newline, everything is fine:

$ xsv table blog.csv -c10
title          content        date        auther
test newli...  first line...  2020-01-01  Kyle
other post     Hello Worl...  2020-01-02  John

But this workaround does not work in every situation. For example, blog2.csv has a field that starts with a newline:

title,content,date,auther
test newline2,"
second line",2020-01-01,Kyle
other post,"Hello World",2020-01-02,John

With this file, the -c workaround simply does not work.

What I Want

I think the xsv table command is designed to produce human-readable output. If so, printing a raw newline and letting it break the table visually may not be a good choice.

To deal with this problem, I see two approaches:

  1. Replace every newline in a field with a space character ( ).
  2. Print only the first line of the field, stripping everything after the first newline character. (This approach never alters the original data, it only truncates it, so it is more faithful, but it may be less useful in some cases.)

I prefer the first approach because it is more useful when I want to skim a field that starts with a newline, but either would be good enough.
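
To make the difference between the two approaches concrete, here is a minimal Python sketch of how each would treat a single field value. The function names replace_newlines and first_line_only are hypothetical and are not part of xsv; this only illustrates the proposed behavior, not how xsv table is or would be implemented.

```python
import re

# Matches the common line-break sequences: \r\n, a lone \r, or a lone \n.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def replace_newlines(field: str, replacement: str = " ") -> str:
    """Approach 1 (hypothetical helper): swap each line break for `replacement`."""
    return _LINE_BREAK.sub(replacement, field)


def first_line_only(field: str) -> str:
    """Approach 2 (hypothetical helper): keep only the text before the first line break."""
    return _LINE_BREAK.split(field, maxsplit=1)[0]


if __name__ == "__main__":
    for value in ["first line\nsecond line", "\nsecond line"]:
        print(repr(replace_newlines(value)), repr(first_line_only(value)))
```

For the blog2.csv field that starts with a newline, the second approach leaves an empty cell, while the first still shows the text that follows the line break.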

lelandbatey commented 2 years ago

As a stopgap workaround for this problem, we can preprocess our CSVs to remove or replace the newlines in all cells before feeding them to xsv table. If you'd like to do this, I've created a minimal implementation as a Python script, remove_csv_newlines.py, which you can find at the following Gist: https://gist.github.com/lelandbatey/26fb0be6b7891acdd000e1c01089ed31

Given an example CSV file such as this one:

headerOne,header2
foo,"baz
bar"

Its exact content, shown as a hex dump, is:

$ xxd -c 32 ./example.csv 
00000000: 6865 6164 6572 4f6e 652c 6865 6164 6572 320a 666f 6f2c 2262 617a 0a62 6172 220a  headerOne,header2.foo,"baz.bar".

The xsv table command is broken up by the newline in the cell and looks weird:

$ xsv table ./example.csv 
headerOne  header2
foo        "baz
bar"

But, if you preprocess the CSV to replace the newlines (e.g. with remove_csv_newlines.py), then the result of xsv table will look better:

$ cat ./example.csv | ./remove_csv_newlines.py | xsv table
headerOne  header2
foo        baz\nbar

Ultimately, I hope xsv table gets some kind of first-party handling for these newlines, but until then, we can use stopgaps like remove_csv_newlines.py.
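
For readers who want to see what such a preprocessing step might look like, the following is a minimal sketch using only Python's standard csv module. It is not the contents of remove_csv_newlines.py from the Gist; the function name escape_newlines is hypothetical, and the choice to write each line break as a literal backslash followed by n is an assumption made to match the sample output above.

```python
import csv
import io


def escape_newlines(src, dst) -> None:
    """Copy CSV rows from src to dst, rewriting line breaks inside cells as a literal \\n.

    Hypothetical helper: src and dst are text streams (for example sys.stdin and sys.stdout).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Replace \r\n first so a Windows line ending becomes one escape, not two.
        writer.writerow(
            [cell.replace("\r\n", "\\n").replace("\r", "\\n").replace("\n", "\\n") for cell in row]
        )


if __name__ == "__main__":
    # Same bytes as the example.csv shown in the hex dump above.
    source = io.StringIO('headerOne,header2\nfoo,"baz\nbar"\n')
    result = io.StringIO()
    escape_newlines(source, result)
    print(result.getvalue(), end="")
```

Because the csv module parses the quoted cell before the replacement happens, each record ends up on a single physical line. Called as escape_newlines(sys.stdin, sys.stdout), the same function could sit in front of xsv table in a pipeline like the one shown earlier.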