arq5x / gemini

A lightweight DB framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Unexpected results when using OR with comp_hets #946

Closed: mmoisse closed this issue 4 years ago

mmoisse commented 4 years ago

I noticed that using OR in a filter gives different results with and without brackets.

When running this command, I get 4 pairs of comp het variants:

gemini comp_hets --filter " gnomad_af < 0.02 OR gnomad_aff > 0.98" test.db

The following command, however, returns only 3 pairs of comp het variants:

gemini comp_hets --filter " (gnomad_af < 0.02 OR gnomad_aff > 0.98)" test.db

The difference is that the first command also reports non-exonic variants, while the latter doesn't. This is probably because extra WHERE conditions are added behind the scenes, so the final query becomes:

select *,gts,gt_types,gt_phases,gt_depths,gt_ref_depths,gt_alt_depths,gt_quals,gt_alt_freqs
FROM variants
WHERE (is_exonic = 1 or impact_severity != 'LOW') AND gnomad_af < 0.02 OR gnomad_aff > 2
ORDER BY chrom,  gene

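Because AND binds more tightly than OR in SQL, this WHERE clause is evaluated as ((is_exonic = 1 or impact_severity != 'LOW') AND gnomad_af < 0.02) OR gnomad_aff > 2, so every variant matching the last condition is returned regardless of the exonic/impact restriction. Putting brackets around args.filter would probably solve the issue: https://github.com/arq5x/gemini/blob/c3a321eac6d51bdd55fd9c55a09bef6db8aaffd0/gemini/gim.py#L387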
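A minimal sketch of the precedence problem, using Python's built-in sqlite3 module and a toy in-memory table. The table, its four rows, and the use of a single gnomad_af column in both halves of the filter are assumptions for illustration (not gemini's real schema or data); the built-in condition is the one shown in the query above.

```python
import sqlite3

# Toy stand-in for gemini's "variants" table (assumed schema, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (id INTEGER, is_exonic INTEGER, "
    "impact_severity TEXT, gnomad_af REAL)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        (1, 1, "HIGH", 0.01),  # exonic, rare        -> should pass
        (2, 0, "LOW", 0.01),   # non-exonic, rare    -> should be excluded
        (3, 0, "LOW", 0.99),   # non-exonic, common  -> should be excluded
        (4, 1, "MED", 0.99),   # exonic, common      -> should pass
    ],
)

base = "(is_exonic = 1 or impact_severity != 'LOW')"  # condition added behind the scenes
user_filter = "gnomad_af < 0.02 OR gnomad_af > 0.98"  # what the user passes to --filter


def ids(where):
    """Return the ids of rows matching the given WHERE clause."""
    rows = conn.execute(f"SELECT id FROM variants WHERE {where} ORDER BY id")
    return [r[0] for r in rows]


# Unbracketed: parsed as (base AND gnomad_af < 0.02) OR gnomad_af > 0.98
unbracketed = ids(f"{base} AND {user_filter}")
# Bracketed: base AND (gnomad_af < 0.02 OR gnomad_af > 0.98)
bracketed = ids(f"{base} AND ({user_filter})")

print(unbracketed)  # [1, 3, 4]
print(bracketed)    # [1, 4]
```

Row 3 (non-exonic, LOW impact) leaks through only in the unbracketed version, which matches the extra non-exonic pairs reported by the first command.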

mmoisse commented 4 years ago

I noticed similar issues with x_linked_dominant and x_linked_recessive.

arq5x commented 4 years ago

Yes, this seems like a clear bug with a simple fix. Thanks for reporting!

jxchong commented 4 years ago

Related issue: #837

mmoisse commented 4 years ago

I am seeing similar issues with x_linked_dominant and x_linked_recessive.

The cause is probably this line of code:

https://github.com/arq5x/gemini/blob/79c5b022556f603ffa27f866ac3d989d8ecd7b5e/gemini/gim.py#L53
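Since the same problem appears in comp_hets and in the x-linked tools, one option is to build every WHERE clause through a single helper that brackets each piece. The sketch below is hypothetical: build_where is not an existing gemini function, and the contents of gim.py line 53 are not reproduced here. It only shows the general pattern of wrapping each condition in parentheses before joining them with AND.

```python
def build_where(base_conditions, user_filter=None):
    """Hypothetical helper: join built-in conditions and an optional user
    --filter with AND, wrapping each piece in parentheses so that an OR
    inside any one piece cannot escape its group."""
    parts = [c.strip() for c in base_conditions if c and c.strip()]
    if user_filter and user_filter.strip():
        parts.append(user_filter.strip())
    return " AND ".join(f"({p})" for p in parts)


# Example with the comp_hets condition quoted in this issue.
where = build_where(
    ["is_exonic = 1 or impact_severity != 'LOW'"],
    "gnomad_af < 0.02 OR gnomad_af > 0.98",
)
print(where)
# (is_exonic = 1 or impact_severity != 'LOW') AND (gnomad_af < 0.02 OR gnomad_af > 0.98)
```

Bracketing every piece, rather than only args.filter, also protects against a built-in condition that itself contains an OR.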