Tux / Text-CSV_XS

perl5 module for composition and decomposition of comma-separated values

Multiple spaces as a separator? #36

Closed: agguser closed this issue 2 years ago

agguser commented 2 years ago

Is this possible (e.g. to parse the output of the ps shell command)? With sep_char => ' ' and allow_whitespace => 1, a run of spaces is parsed into multiple empty columns.
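The empty columns follow from using a single character as the separator: every extra space starts another field. The sketch below does not use Text::CSV_XS. It uses Perl's own split on a made-up ps-like line as an analogy, and contrasts it with the awk-style split ' ', which collapses runs of whitespace but knows nothing about quotes.

```perl
use strict;
use warnings;

# A ps-like line with a quoted field (sample data, made up for illustration)
my $line = '  1234 pts/0    "my prog"';

# One space as the separator: every extra space starts another (empty)
# field, the same effect reported above for sep_char => ' '.
my @strict = split / /, $line, -1;

# The special pattern ' ' splits awk-style: leading whitespace is skipped and
# a run of whitespace counts as one separator, but quotes are not honoured.
my @awk = split ' ', $line;

print scalar(@strict), " fields: ", join('|', @strict), "\n";
print scalar(@awk),    " fields: ", join('|', @awk),    "\n";
```

Neither result is what the question asks for. The first has empty fields, and the second splits "my prog" in two.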

Tux commented 2 years ago

From the allow_whitespace documentation (https://github.com/Tux/Text-CSV_XS/blob/master/doc/CSV_XS.md#allow_whitespace):

When this option is set to true, the whitespace (TAB's and SPACE's) surrounding the separation character is removed when parsing. If either TAB or SPACE is one of the three characters sep_char, quote_char, or escape_char it will not be considered whitespace.

So no, this is not possible.

agguser commented 2 years ago

So I have to reformat the input first (e.g. with csvformat -d' ' -D' ' -S or csvquote -d' ' | sed -E 's/^ +| +$//g; s/ +/ /g' | csvquote -u -d' ').
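A rough pure-Perl counterpart of that pre-processing step could look like the following sketch. It is not a port of csvformat or csvquote. normalize_spaces is a hypothetical helper name, and its quoting assumptions are stated in the comments. After normalizing, exactly one space separates the fields, so the line should then be parseable by Text::CSV_XS with sep_char => ' '.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of spaces between fields to one space and
# drop leading/trailing spaces, leaving spaces inside double-quoted segments
# alone. Assumes well-formed quoting where "" inside quotes is an escaped quote;
# tabs are not treated as separators, just like the sed expression.
sub normalize_spaces {
    my ($line) = @_;
    # A field is a sequence of quoted segments and non-space, non-quote
    # characters; list-context //g returns every such field in order.
    my @fields = $line =~ /((?:"(?:[^"]|"")*"|[^ "])+)/g;
    return join ' ', @fields;
}

print normalize_spaces(q{  1234   "my  prog"   S  }), "\n";
```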

Using spaces to separate optionally quoted fields is readable and common: shell command lines, aligned command output (ls, ps), Lisp s-expressions. So I hope it can be supported. If not, feel free to close this issue.
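For the shell-command-line case, Perl's core Text::ParseWords module already splits on runs of whitespace while honouring single quotes, double quotes and backslashes. The following is a minimal sketch. These are shell rules, not CSV rules: a doubled "" inside quotes is not read as an escaped quote. Nothing here handles Lisp s-expressions, or aligned columns whose unquoted values contain spaces.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);   # core module

# Shell-style input: runs of blanks separate words, quotes group them
my $line = q{  ls -l "My Documents"  'a b'  plain };

# shellwords() drops leading/trailing whitespace, splits on runs of
# whitespace and removes the quotes (and backslash escapes) it honours
my @fields = shellwords($line);

print join('|', @fields), "\n";
```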

Tux commented 2 years ago

I know the use case, and I have wanted things like this myself too. But as you already showed, there are other ways to get to valid CSV that do not require a massive reconstruction of the CSV parser, with loads of new possibilities to get it wrong. Every new option has a small impact on speed, and this use case isn't worth it.

The way to tackle such tasks is to find other tools or options that produce the required data directly. For ps, for example, you could use Proc::ProcessTable and implement whichever ps options you like. My own ps written that way is called px, and it has an option that ps does not have: --csv.

For ls, Perl's builtins (plus the core File::Find module) already suffice: stat, lstat, readdir, glob, … You just have to take care with shell interaction and wildcards.
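The following sketches that approach for ls. The hypothetical list_dir gathers a few ls -l style columns with readdir and lstat only, so no text output has to be parsed. The chosen columns (name, size, mtime, directory flag) are an illustrative assumption, not what ls itself prints.

```perl
use strict;
use warnings;

# Hypothetical helper: one row per directory entry, built from builtins only.
# Columns: name, size in bytes, mtime (epoch seconds), 1 if a real directory.
sub list_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my @rows;
    for my $name (sort @names) {
        # lstat describes a symlink itself instead of following it
        my @st = lstat "$dir/$name" or next;
        # "_" reuses the lstat buffer, so a symlink to a directory gives 0
        push @rows, [ $name, $st[7], $st[9], (-d _ ? 1 : 0) ];
    }
    return @rows;
}

print join("\t", @$_), "\n" for list_dir('.');
```

Instead of the tab-joined print, each row can be handed to Text::CSV_XS (for example $csv->print($fh, $row) on an object created with an eol attribute), so quoting stays the CSV module's job.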

For df, switch to di, which has -c or --csv-output and -C or --csv-tabs as options that df does not have.