antonycourtney / tad

A desktop application for viewing and analyzing tabular data
http://tadviewer.com
MIT License

CSV parsing not recognizing first row as header; adding an extra column #189

Open abdullahdevrel opened 1 year ago

abdullahdevrel commented 1 year ago

Tad is not recognizing the first row as the header. Additionally, it adds an extra column called Rec.

Source Twitter thread: https://twitter.com/antonycourtney/status/1605964183941500928
Operating System: Windows 10
Tad Version: 0.11.0

CSV datasets:

Output

Company Sample Dataset

image

ASN Sample Dataset

image

annie-maria commented 1 year ago

I'm having the same problem. Mac Monterey - 12.6.3 - Tad 0.11.0

abdullahdevrel commented 1 year ago

@annie-maria I talked with the author @antonycourtney about this on Twitter. It is a known problem with DuckDB's read_csv_auto function. When all the columns in a dataset are of the TEXT type, DuckDB's CSV parser fails to recognize the first row as the header row.

daviewales commented 1 year ago

My feeling is that most CSV files would have a header. Perhaps this line can be changed as follows: https://github.com/antonycourtney/tad/blob/ef52830f1560eadc0fbc58d2b887faef6e781f70/packages/reltab-duckdb/src/csvimport.ts#L64

const query = `CREATE OR REPLACE TABLE ${tableName} AS SELECT * FROM read_csv_auto('${filePath}', header=True)`;

And an option could be added to change the header setting.
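A minimal sketch of what such an option might look like, assuming the query is built in a single helper. `CsvImportOptions`, `sqlStringLiteral`, and `buildImportQuery` are hypothetical names for illustration, not part of Tad's code. When `header` is left unset, no `header` argument is passed and DuckDB's auto-detection applies as it does today.

```typescript
// Hypothetical options type; not part of Tad's actual API.
interface CsvImportOptions {
  // true or false forces the setting; undefined leaves detection to DuckDB.
  header?: boolean;
}

// Escape single quotes so the path is a valid SQL string literal.
function sqlStringLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Build the import statement, adding header=... only when the caller asked for it.
function buildImportQuery(
  tableName: string,
  filePath: string,
  options: CsvImportOptions = {}
): string {
  const args: string[] = [sqlStringLiteral(filePath)];
  if (options.header !== undefined) {
    args.push(`header=${options.header ? "True" : "False"}`);
  }
  return `CREATE OR REPLACE TABLE ${tableName} AS SELECT * FROM read_csv_auto(${args.join(", ")})`;
}
```

For example, `buildImportQuery("t", "data.csv", { header: true })` produces the same statement as the line above, while `buildImportQuery("t", "data.csv")` keeps the current auto-detect behavior.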

tristanhoy commented 1 year ago

As a workaround, if you're the one generating the CSV, you can add an extra column and put the number 1 in it on every data row. That way not every column is text, so the header row should be detected.
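A rough sketch of that workaround, for anyone post-processing a CSV they already have. `addDummyNumericColumn` and the column name `dummy` are made up for illustration. It assumes the first line is the header and that no quoted field contains an embedded newline, since it only appends to the end of each line.

```typescript
// Append a numeric column: the header line gets the column name,
// every other line gets the value 1.
// Assumption: no quoted field spans multiple lines.
function addDummyNumericColumn(csvText: string, columnName = "dummy"): string {
  if (csvText === "") {
    return "";
  }
  // Preserve the file's existing line-ending style.
  const newline = csvText.includes("\r\n") ? "\r\n" : "\n";
  const lines = csvText.split(/\r?\n/);
  // A trailing newline shows up as a final empty element; set it aside.
  const hadTrailingNewline = lines[lines.length - 1] === "";
  if (hadTrailingNewline) {
    lines.pop();
  }
  const out = lines.map((line, i) =>
    i === 0 ? `${line},${columnName}` : `${line},1`
  );
  return out.join(newline) + (hadTrailingNewline ? newline : "");
}
```

Given `"name,city\nAcme,Paris\n"`, this returns `"name,city,dummy\nAcme,Paris,1\n"`.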