Closed dcunning11235 closed 9 years ago
The predecessor to bossquery was a C++ program called bossfilter that supported this type of progressive filtering. You can see some usage examples here. However, that code used ROOT trees instead of SQL, so this would require a new approach in bossquery.
I briefly thought about adding this functionality when writing the initial version of bossquery but didn't implement it because I imagined that the role of bossquery is primarily for doing a first pass on the 2.5 million rows in spAll in a reasonable amount of time. Once you have one or more bossquery output files with <100K rows, then it is much more flexible to write a small python script to implement complex selection and query logic than to cover all possible scenarios in bossquery. I think this applies to your use case, but could be convinced otherwise.
I think the first action item here is to try and write a simple python script that reads your things_wanted.dat and implements the equivalent of your second command.
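A minimal sketch of what such a script might look like, under several assumptions that are not confirmed anywhere in this thread: that bossquery output files are whitespace-delimited text with a single header line, and that a second, wider bossquery output file containing the new columns already exists. The function names read_table and select_matching are hypothetical. Values are compared as strings, so this only works if both files format the key columns identically.

```python
def read_table(path):
    """Read a whitespace-delimited text file whose first line is a header.

    Assumption: this is the layout of a saved bossquery file.
    """
    with open(path) as f:
        header = f.readline().split()
        rows = [dict(zip(header, line.split())) for line in f if line.strip()]
    return header, rows


def select_matching(keys_path, wide_path, key_cols, out_cols):
    """Return out_cols for rows of wide_path whose key_cols appear in keys_path."""
    _, key_rows = read_table(keys_path)
    # Set of key tuples, e.g. (PLATE, MJD, FIBER), taken from the saved file.
    wanted = {tuple(r[c] for c in key_cols) for r in key_rows}
    _, wide_rows = read_table(wide_path)
    return [
        [r[c] for c in out_cols]
        for r in wide_rows
        if tuple(r[c] for c in key_cols) in wanted
    ]
```

For the example in this issue, key_cols would be ["PLATE", "MJD", "FIBER"] and out_cols would be ["PLATE", "MJD", "FIBER", "THING_ID", "EBOSS0"].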
I wasn't thinking of additional selection criteria so much as having the ability to re-run a query but with different columns selected for output (I may be quibbling over the use of 'criteria'). I actually threw in the changed filename edited_things_wanted.dat because it occurred to me at the last second that someone might have processed the file in some way; that wasn't central to my original thought.
So, ignoring that, this amounts to being able to run a query once and then run a new query with, effectively, the same where clause, but based only on the output of the first query. I'm not sure this would be useful for very general tasks ("What are the criteria for this subset of data?" "It's... this data." Probably not a good conversation.) But it's good for quick-and-dirty-and-lazy exploration ("I have blah-blah-blah, let me just add ZWARNING and OBJTYPE to that.")
It would sometimes be convenient to be able to pass results generated by bossquery back to bossquery as --where values. E.g.
bossquery --what "PLATE,MJD,FIBER,OBJTYPE,CLASS,ZWARNING" --where "criteria!" --save things_wanted.dat
and later come back and be able to
bossquery --what "PLATE,MJD,FIBER,THING_ID,EBOSS0" --where edited_things_wanted.dat
This could be implemented by reading the column headers to build the where clause, and then by querying either (a) by each set of values or (b) using the IN operator in SQL if SQLite supports it.
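For what it's worth, SQLite does support the IN operator. Below is a rough sketch of both options using Python's sqlite3 module. It is not how bossquery works internally: the function names and the table name passed in are hypothetical, and the column names are interpolated straight into the SQL, so they are assumed to come from a trusted file header. Note that IN on a single column is not enough when a row is identified by several columns together (PLATE, MJD, FIBER), which is where the per-row form in option (a) is the simpler fit.

```python
import sqlite3


def query_per_row(conn, table, what, header, rows):
    """Option (a): run one parameterized query per saved row.

    header lists the columns used in the where clause; each row supplies
    the matching values in the same order.
    """
    where = " AND ".join(f"{col} = ?" for col in header)
    sql = f"SELECT {', '.join(what)} FROM {table} WHERE {where}"
    results = []
    for row in rows:
        results.extend(conn.execute(sql, tuple(row)).fetchall())
    return results


def query_with_in(conn, table, what, column, values):
    """Option (b): a single query using IN on one column."""
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT {', '.join(what)} FROM {table} WHERE {column} IN ({placeholders})"
    return conn.execute(sql, list(values)).fetchall()
```

Option (a) issues one query per row, which may be slow for large saved files; option (b) is a single query but is limited by the maximum number of bound parameters SQLite allows in one statement.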