dinedal / textql

Execute SQL against structured text like CSV or TSV
MIT License
9.07k stars 300 forks source link

If header=false, then name fields column names. #39

Closed tlehman closed 9 years ago

tlehman commented 9 years ago

This pull request sets the field names of the SQLite database as the values of the corresponding header column name, with the modification that numerical header column names get prepended with 'c'.

Tests

White House budget proposal FY2016

The White House just released the first ever federal budget proposal on github as a few CSV files, however, the receipts.csv file has 72 field names, and it very quickly becomes unwieldy to query it using textql referring to c52 and c68 and keeping track of all those names.

If you check out the budget repository and look at the receipts.csv file, run this command:

$ textql -table-name='receipts' -save-to='budget.db' -header=false -source=receipts.csv 

Then run

$ sqlite3 budget.db 
SQLite version 3.8.7.1 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> .table
receipts
sqlite> .schema receipts
CREATE TABLE receipts (Source_Category_Code TEXT, Source_category_name TEXT, Source_subcategory TEXT, Source_subcategory_name TEXT, Agency_code TEXT, Agency_name TEXT, Bureau_code TEXT, Bureau_name TEXT, Account_code TEXT, Account_name TEXT, Treasury_Agency_code TEXT, On__or_off_budget TEXT, c1962 TEXT, c1963 TEXT, c1964 TEXT, c1965 TEXT, c1966 TEXT, c1967 TEXT, c1968 TEXT, c1969 TEXT, c1970 TEXT, c1971 TEXT, c1972 TEXT, c1973 TEXT, c1974 TEXT, c1975 TEXT, c1976 TEXT, TQ TEXT, c1977 TEXT, c1978 TEXT, c1979 TEXT, c1980 TEXT, c1981 TEXT, c1982 TEXT, c1983 TEXT, c1984 TEXT, c1985 TEXT, c1986 TEXT, c1987 TEXT, c1988 TEXT, c1989 TEXT, c1990 TEXT, c1991 TEXT, c1992 TEXT, c1993 TEXT, c1994 TEXT, c1995 TEXT, c1996 TEXT, c1997 TEXT, c1998 TEXT, c1999 TEXT, c2000 TEXT, c2001 TEXT, c2002 TEXT, c2003 TEXT, c2004 TEXT, c2005 TEXT, c2006 TEXT, c2007 TEXT, c2008 TEXT, c2009 TEXT, c2010 TEXT, c2011 TEXT, c2012 TEXT, c2013 TEXT, c2014 TEXT, c2015 TEXT, c2016 TEXT, c2017 TEXT, c2018 TEXT, c2019 TEXT, c2020 TEXT);
sqlite> 

As you can see, the field names are named after the column names, instead of c1,c2,....,cN, which are much harder to deal with.

Oneliner

$ printf 'foo,bar\n2,3\n5,8\n' | textql -header=false -sql='select foo from tbl where bar=8'
5

Possible alternatives to this approach

If the default behavior of making the field names c1,c2,...,cN is worth preserving, I can make it another flag, such as -fieldsConsecutive=true or something like that.

dinedal commented 9 years ago

Other then the minor modification this looks good

tlehman commented 9 years ago

Tested, works like a charm