arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

bcolz index for segregation #915

Open 8nb24 opened 5 years ago

8nb24 commented 5 years ago

Hi,

Thank you all for your work on this tool. I have a gemini database containing WGS variants from ~2000 individuals. To reanalyze old cases, we would like to run segregation queries for every individual in the cohort, i.e. run comp_hets on a per-gene basis for every family. For a single gene this takes 20-40 minutes. I was hoping to speed this up with a bcolz index, but my understanding is that the bcolz index is only used by "gemini query -q ... --gt-filter ..." queries. What does the back end of comp_hets look like (does it run multiple queries with different --gt-filter expressions?), and is it possible to use the bcolz index for comp_hets, de_novo, etc.?

Thanks

brentp commented 5 years ago

It will use the bcolz index if one is available, including in tools like comp_hets.

8nb24 commented 5 years ago

Thanks for the response. I ran each of the following queries on a bcolz-indexed database:

gemini comp_hets --filter "(gene == 'LOXL3')" --use-bcolz Rare.db

gemini query -q "select *, gts.U11802 from variants where (gene == 'LOXL3')" --gt-filter "gt_types.U11802 == HET" --use-bcolz Rare.db

In this case, the second query runs, but the first fails with this error:

usage: gemini [-h] [-v] [--annotation-dir ANNOTATION_DIR] {actionable_mutations,amend,annotate,autosomal_dominant,autosomal_recessive,bcolz_index,browser,burden,comp_hets,db_info,de_novo,dump,examples,fusions,gene_wise,interactions,load,load_chunk,lof_interactions,lof_sieve,mendel_errors,merge_chunks,pathways,qc,query,region,roh,set_somatic,stats,update,windower,x_linked_de_novo,x_linked_dominant,x_linked_recessive} ...
gemini: error: unrecognized arguments: --use-bcolz
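The usage text above is standard Python argparse output for a subcommand that receives an option it does not define. The subcommand parser hands the unknown flag back, and the top-level gemini parser then rejects it. That is why the top-level usage line is printed rather than the comp_hets one.

The sketch below reproduces this with a hypothetical, stripped-down parser. `build_parser` and `accepts` are illustrative names, not gemini's code. It assumes, as the thread shows, that `query` defines `--use-bcolz` and `comp_hets` does not.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for a gemini-style CLI, not gemini's real parser."""
    parser = argparse.ArgumentParser(prog="gemini")
    sub = parser.add_subparsers(dest="command", required=True)

    # `query` defines --use-bcolz, so the second command in the thread parses.
    query = sub.add_parser("query")
    query.add_argument("-q", dest="sql", required=True)
    query.add_argument("--gt-filter")
    query.add_argument("--use-bcolz", action="store_true")
    query.add_argument("db")

    # `comp_hets` defines no --use-bcolz option (in gemini it detects the index itself).
    comp_hets = sub.add_parser("comp_hets")
    comp_hets.add_argument("--filter")
    comp_hets.add_argument("db")
    return parser


def accepts(argv: List[str]) -> bool:
    """Return True if argv parses; argparse raises SystemExit on bad arguments."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        return False


if __name__ == "__main__":
    print(accepts(["query", "-q", "select * from variants",
                   "--gt-filter", "gt_types.U11802 == HET",
                   "--use-bcolz", "Rare.db"]))  # True
    # argparse writes "gemini: error: unrecognized arguments: --use-bcolz" to stderr.
    print(accepts(["comp_hets", "--filter", "(gene == 'LOXL3')",
                   "--use-bcolz", "Rare.db"]))  # False
```

Dropping `--use-bcolz` from the comp_hets argument list makes `accepts` return True.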

brentp commented 5 years ago

In the first query, it will detect the bcolz index and use it automatically.

8nb24 commented 5 years ago

Are there additional arguments that need to be passed? I got the error above with the first query.

I get the same error with bare-bones comp_hets and de_novo queries, e.g. gemini comp_hets --use-bcolz /path/to/db/database.db

This is running gemini 0.20.1.

brentp commented 5 years ago

Just run gemini comp_hets without --use-bcolz.
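This sketch applies that advice to the per-gene use case from the original question. It builds one comp_hets call per gene and simply omits `--use-bcolz`. It assumes gemini is on PATH and that comp_hets picks up the bcolz index on its own, as stated above.

`comp_hets_cmd` and `run_per_gene` are hypothetical helpers. The default dry run only prints the commands.

```python
import shlex
import subprocess
from typing import Iterable, List


def comp_hets_cmd(db: str, gene: str) -> List[str]:
    """Hypothetical helper: one `gemini comp_hets` call restricted to a gene."""
    # Double any single quotes so the gene name stays a valid SQL string literal.
    safe_gene = gene.replace("'", "''")
    # No --use-bcolz: comp_hets rejects that flag and uses the index if present.
    return ["gemini", "comp_hets", "--filter", f"(gene == '{safe_gene}')", db]


def run_per_gene(db: str, genes: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Hypothetical driver: print (dry run) or execute one call per gene."""
    cmds = [comp_hets_cmd(db, g) for g in genes]
    for cmd in cmds:
        if dry_run:
            # shlex.join (Python 3.8+) quotes arguments for pasting into a shell.
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if gemini exits non-zero.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    run_per_gene("Rare.db", ["LOXL3"])
```

With `dry_run=False`, each command is executed and its results go to the terminal. Redirecting stdout to one file per gene would be a small extension.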