BurntSushi / xsv

A fast CSV command line toolkit written in Rust.
The Unlicense

fixlengths -- insert extra commas not at end #323

Open ggrothendieck opened 1 year ago

ggrothendieck commented 1 year ago

One reason to need fixlengths is that a field contains multiple subfields and was written without quoting. If that field is the last one, fixlengths can be used, but not if it is, say, the second-to-last. It would be nice if the insertion point for the extra commas could be specified. For example, -1 would mean that the extra comma(s) are inserted at the last comma. If a record has no commas, the commas would still be added at the end.

Here is sample input, taken from https://stackoverflow.com/questions/76423878/reading-a-csv-file-into-r-which-contains-comma-separated-values-in-single-obser/76427295#76427295

clothes,colours,size
shirt,blue,green,grey,small
shirt,yellow,black,small
shorts,blue,medium
shorts,black,large

The corresponding output would be

clothes,colour1,colour2,colour3,size
shirt,blue,green,grey,small
shirt,yellow,black,,small
shorts,blue,,,medium
shorts,black,,,large

although I think it would be sufficient if it did not deal with the header since that can always be skipped by whatever program is reading it in.

To be clear, this gawk program from the same source accepts that input and produces that output for this particular example.

# To run: gawk -f process.awk myfile.csv > myfile2.csv
# To configure: edit header= line as needed
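BEGIN {
    header = "clothes,colour1,colour2,colour3,size"

    # keep only the commas of the header, here ",,,,"
    commas = gensub(/[^,]/, "", "g", header)
    ncommas = length(commas)
    FS = OFS = ","
}
NR == 1 { print header; next } # skip input header & use header variable instead
{
    # replace the last comma (the NF-1'th) with itself plus one comma per missing field
    if (NF > 1) print gensub(",", substr(commas, 1, ncommas - NF + 2), NF-1)
    else print $0 commas  # no commas in the record: append them all at the end
}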
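
To make the requested semantics a bit more concrete, here is a rough Python sketch of what an insertion position could mean. It is only an illustration, not xsv code: the function name `fix_lengths_at` and its `pos` and `width` parameters are hypothetical, it assumes negative positions only, it pads to the longest record when no width is given, and it splits naively on commas without handling quoted fields.

```python
def fix_lengths_at(lines, pos=-1, width=None):
    """Pad short comma-separated records with empty fields.

    pos is a hypothetical parameter: -1 puts the padding at the last comma
    (just before the final field), -2 at the second-to-last comma, and so on.
    Quoting is NOT handled; this is a naive split on ",".
    """
    rows = [line.rstrip("\n").split(",") for line in lines]
    if width is None:
        # assumption: pad every record to the length of the longest one
        width = max(len(r) for r in rows)

    out = []
    for fields in rows:
        missing = width - len(fields)
        if missing > 0:
            if len(fields) == 1:
                # no commas in the record: add the padding at the end
                fields = fields + [""] * missing
            else:
                # index of the field that should follow the padding,
                # clamped so the first field always stays first
                at = max(1, len(fields) + pos)
                fields = fields[:at] + [""] * missing + fields[at:]
        out.append(",".join(fields))
    return out


if __name__ == "__main__":
    sample = [
        "shirt,blue,green,grey,small",
        "shirt,yellow,black,small",
        "shorts,blue,medium",
        "shorts,black,large",
    ]
    for row in fix_lengths_at(sample, pos=-1):
        print(row)
```

Run on the data rows of the example (header omitted), this prints the four data rows of the desired output shown earlier. The header would still need separate handling, as in the gawk program.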