charmbracelet / gum

A tool for glamorous shell scripts 🎀
MIT License

Tab-separated CSV file throws an error: invalid data provided #345

Open polRk opened 1 year ago

polRk commented 1 year ago

Describe the bug: I'm trying to load a large CSV file that contains tab-separated data and no header row, and I get the error "invalid data provided".

To Reproduce: steps to reproduce the behavior:

  1. Create a CSV file
  2. Insert tab-separated data
  3. Run gum table < data.csv in a terminal
  4. See the error "invalid data provided"

Expected behavior: The table is rendered without errors.

Additional context: I can't attach the file because it is under NDA.

acidghost commented 11 months ago

This can already be done as of version 0.12:

gum table --separator=$'\t' <table/comma.tsv
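For reference, a separator like this presumably ends up as the Comma field of Go's encoding/csv Reader (an assumption based on the command.go link further down, not a copy of gum's code; parseTable is a hypothetical helper). The sketch shows the same TSV input parsing as one field per row with the default comma and two fields per row with a tab:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseTable is a hypothetical helper: it uses the single rune in sep as the
// field delimiter of an encoding/csv Reader and returns all records.
func parseTable(input, sep string) ([][]string, error) {
	seps := []rune(sep)
	if len(seps) != 1 {
		return nil, fmt.Errorf("separator must be a single character, got %q", sep)
	}
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = seps[0]
	return r.ReadAll()
}

func main() {
	tsv := "name\tage\nalice\t30\nbob\t25\n"
	for _, sep := range []string{",", "\t"} {
		rows, err := parseTable(tsv, sep)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// With "," each whole line is a single field; with a tab it splits in two.
		fmt.Printf("separator %q: %d fields per row\n", sep, len(rows[0]))
	}
}
```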
gennaro-tedesco commented 11 months ago

I'm running into the same error even when I pass a different separator (it seems that gum table simply ignores it).

acidghost commented 11 months ago

Do some fields in the input contain quotes? I think quotes need to either be properly escaped or enclose the entire field, e.g.:

field1<TAB>field2<TAB>"field3"

This would not work:

field1<TAB>field2<TAB>field "three"
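A minimal Go sketch of these quoting rules, using encoding/csv with a tab as Comma (readTSV and isBareQuote are hypothetical helpers). A field fully enclosed in quotes, or one whose inner quotes are doubled as RFC 4180 requires, parses fine; a bare quote inside an unquoted field fails with csv.ErrBareQuote unless the Reader's LazyQuotes field is set:

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// readTSV parses tab-separated input; lazy sets csv.Reader.LazyQuotes.
func readTSV(input string, lazy bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	r.LazyQuotes = lazy
	return r.ReadAll()
}

// isBareQuote reports whether err is a csv.ParseError caused by a bare quote.
func isBareQuote(err error) bool {
	var pe *csv.ParseError
	return errors.As(err, &pe) && pe.Err == csv.ErrBareQuote
}

func main() {
	quoted := "field1\tfield2\t\"field3\"\n"                // quotes enclose the whole field
	escaped := "field1\tfield2\t\"field \"\"three\"\"\"\n" // inner quotes doubled
	bare := "field1\tfield2\tfield \"three\"\n"            // quote inside an unquoted field

	for _, in := range []string{quoted, escaped, bare} {
		rows, err := readTSV(in, false)
		fmt.Printf("%q err=%v bareQuote=%v\n", rows, err, isBareQuote(err))
	}

	// With LazyQuotes the bare quote is accepted and kept as part of the field.
	rows, err := readTSV(bare, true)
	fmt.Printf("%q err=%v\n", rows, err)
}
```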

You could test your input with a small Go program that parses CSV the same way gum does: https://github.com/charmbracelet/gum/blob/01a66511a142b76f79120e295be6ba6be61618d4/table/command.go#L19-L58

gennaro-tedesco commented 11 months ago

While reproducing the error, I narrowed it down to a specific case: at least one line contains an "empty" (null) value, which is probably interpreted as \n rather than as a space (I'm not sure; I would have to check the exact raw bytes of the original data). So the cause is empty values, not quotes.

Perhaps there is a check that every row contains the same number of values, i.e. the same number of separators?

acidghost commented 11 months ago

> Perhaps there is a check where it is expected that each single row contains the same number of values, i.e. the same number of alternating spaces?

I think that is enforced by the Go library gum uses to parse CSV (see https://pkg.go.dev/encoding/csv#Reader); the spec it implements, RFC 4180, says each line should contain the same number of fields throughout the file (https://www.rfc-editor.org/rfc/rfc4180.html).
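With FieldsPerRecord left at its default of 0, the Reader takes the expected field count from the first record and rejects any later record that differs. A small sketch (countFields and isFieldCount are hypothetical helpers) helps narrow down which "empty" values matter: an empty value between two tabs is still a field, a completely empty line is skipped, and only a row with a missing separator triggers csv.ErrFieldCount:

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// countFields parses tab-separated input with FieldsPerRecord at its default
// (0), so the first record fixes the expected number of fields. It returns
// the field count of each record.
func countFields(input string) ([]int, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	counts := make([]int, len(records))
	for i, rec := range records {
		counts[i] = len(rec)
	}
	return counts, nil
}

// isFieldCount reports whether err is a csv.ParseError for a wrong field count.
func isFieldCount(err error) bool {
	var pe *csv.ParseError
	return errors.As(err, &pe) && pe.Err == csv.ErrFieldCount
}

func main() {
	inputs := []string{
		"a\tb\tc\nd\t\tf\n",    // empty value between two tabs: still 3 fields
		"a\tb\tc\n\nd\te\tf\n", // completely empty line: skipped by the reader
		"a\tb\tc\nd\te\n",      // second row is missing a field
	}
	for _, in := range inputs {
		counts, err := countFields(in)
		fmt.Printf("%q -> counts=%v fieldCountErr=%v\n", in, counts, isFieldCount(err))
	}
}
```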

acidghost commented 11 months ago

Actually, it seems possible to allow a variable number of fields by setting the Reader's FieldsPerRecord field to a negative value.
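A sketch of that setting (readRagged is a hypothetical helper). Padding short rows with empty strings afterwards is an added assumption about what a table renderer would want, not something encoding/csv does on its own:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readRagged parses tab-separated rows without enforcing a field count, then
// pads short rows with empty strings so every row has the same width.
func readRagged(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	r.FieldsPerRecord = -1 // negative: records may have a variable number of fields
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	width := 0
	for _, rec := range records {
		if len(rec) > width {
			width = len(rec)
		}
	}
	for i, rec := range records {
		for len(rec) < width {
			rec = append(rec, "")
		}
		records[i] = rec
	}
	return records, nil
}

func main() {
	rows, err := readRagged("a\tb\tc\nd\te\nf\n")
	fmt.Printf("%q %v\n", rows, err)
}
```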

It would be nice to be able to control that, as well as the LazyQuotes field, via gum's CLI.