dinedal / textql

Execute SQL against structured text like CSV or TSV
MIT License
9.05k stars 300 forks source link

Import multiple files to single table #75

Open bsilverm opened 6 years ago

bsilverm commented 6 years ago

Is there a way to import multiple files in to the same table? My data is split in to multiple files due to its size. Running a command like this results in two tables:

textql -console -dlm=0x7c -header -sql "Select count() from myLog" myLog1.txt myLog2.txt

Running .schema in sqlite results in

CREATE TABLE [myLog]..
CREATE TABLE [myLog2]..

Ideally there would be one table containing the data from both files.

Thanks!

runeimp commented 5 years ago

Something like one of the following should work:

textql -console -dlm=0x7c -header -sql "SELECT (SELECT count() FROM myLog1) + (SELECT count() FROM myLog1) LIMIT 1" myLog1.txt myLog2.txt

or

textql -dlm=0x7c -header -output-file myLog.txt -sql 'SELECT * FROM myLog1 OUTER JOIN myLog1.ID = myLog2.ID' myLog1.txt myLog2.txt
textql -console -dlm=0x7c -sql "SELECT count() FROM myLog" myLog.txt
abathur commented 5 years ago

If you can make some assumptions about the file and any header rows, it's probably going to be faster to do this before the logs hit textql. Not sure how often you work with files like this, but it might be meaningful since you say the data is split over multiple files due to size.

If none of the logs have headers, it could be as simple as using cat to join them into a new file, or pipe directly to textql. If your logs do have headers, you can use head and tail. I think the following example of piping them directly into textql should work in Unix or WSL/cygwin...

# if the logs have a stable schema, just stick the header in a file by itself
head -n1 myLog1.txt > myLog.header

tail -q -n +2 myLog*.txt | cat myLog.header - | textql -sql "select count(*) from stdin"
# Breakdown of above:
# 1. extract all but the first line of the log files with tail and pipe to cat
#   (-q flag removes filename separators, effectively concatenating them)
# 2. use cat to concatenate the header file with the output from tail (indicated by the "-" argument) and pipe to textql
# 3. run textql query against the "stdin" table. 
# (You can leave out "from stdin" for simple queries, i.e. `textql -sql "select *"` but I wanted to explicitly call it out here)