arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License
318 stars 120 forks source link

Improve comp_hets family-based phasing with one parent available #747

Closed jxchong closed 8 years ago

jxchong commented 8 years ago

Kept forgetting to write this one up.

Now that family-based phasing is implemented, single-parent families can/should also be phased (i.e. when only one parent is available).

The parent and child should both be het at only one of the two candidate variant sites in each comphet variant pair because the other variant that is het in the child must come from the other non-sequenced parent.

jxchong commented 8 years ago

just checking, did this actually end up in today's 0.19.0 release? (no big deal)

brentp commented 8 years ago

no, I want to get this added, but haven't done so yet. I'll plan it for next point-release to go along with any problems that might be found with 0.19.0

brentp commented 8 years ago

So, I just wrote a test for this thinking to implement and it seems to work.

with:

it gives a candidate with priority 3

with:

it gives a candidate with priority 2

with:

it is not a candidate.

Do you seen anything to add to this?

brentp commented 8 years ago

cc @jxchong

jxchong commented 8 years ago

I think this is right, it'll reduce the output of max priority 2 relative to what it used to be, but that should be ok (it's more "correct"/consistent with the rules the way they are described in the documentation)

jxchong commented 8 years ago

I have a test case with real data of this I'm working on right now (no DNA on one parent))

brentp commented 8 years ago

ok. let me know any ideas/problems.

jxchong commented 8 years ago

I can run it if it's available if I run gemini update --devel?

brentp commented 8 years ago

it is in 0.19.0 so if you have that, should be good, otherwise, yes, run update.

jxchong commented 8 years ago

Not quite right yet. These two variants show up as a compound het pair with --max-priority 2

affectedchild1, parent, affectedchild2 T/C,T/T,T/C A/C,A/A,A/C

both children inherited the alt allele from the mother, but at both variant sites -- should only inherit one of the two alt alleles/variants from the mother

$ gemini update --devel --nodata
Fetching package metadata .........
Solving package specifications: ..........

# All requested packages already installed.
# packages in environment at /nfs/home/jxchong/gemini-data/anaconda:
#
gemini                    0.19.0                   py27_0    bioconda
pip                       8.1.2                    py27_0
Collecting git+https://github.com/arq5x/gemini.git
  Cloning https://github.com/arq5x/gemini.git to /tmp/pip-OW4wjV-build
Installing collected packages: gemini
  Found existing installation: gemini 0.19.0
    Uninstalling gemini-0.19.0:
      Successfully uninstalled gemini-0.19.0
  Running setup.py install for gemini ... done
Successfully installed gemini-0.19.0
Gemini upgraded to latest version
From https://github.com/arq5x/gemini
 * branch            master     -> FETCH_HEAD
Already up-to-date.
HEAD is now at 69f6741... bump cyvcf2 reqs
brentp commented 8 years ago

thanks for the test-case. I'm not getting this as a candidate. Are you sure you have all the affection statuses set correctly?

and can you tell me the output of:

conda list | grep inheritance
jxchong commented 8 years ago
$ ~/gemini-tools/bin/gemini_conda list | grep inheritance
inheritance               0.0.7                    py27_0    bioconda
$ gemini query -q "select * from samples" xxxx.db
1   fam1    4   0   5   2   2   proband
2   fam1    5   0   0   2   1   mother
3   fam1    7   0   5   1   2   sibling
brentp commented 8 years ago

Thanks. I can recreate now. I had a bad test.

Latest is 0.0.9, but it's not fixed there either. Working on it now.

brentp commented 8 years ago

@jxchong ,

I think the most comprehensive and correct constraint to add is that neither parent (in this case it's just a single parent), can be HOM_REF at both sites or HOM_ALT at both sites?

does that seem right?

with that change, I do get your test-case to behave as expected (not a candidate).

brentp commented 8 years ago

Assuming that is correct, I pushed a fix for this to github. You can get via:

pip install  -U https://github.com/brentp/inheritance/archive/master.zip

and then conda list | grep inheritance should show version 0.1.0.

We'll put this into 0.19.1

jxchong commented 8 years ago

Er doh... that's right and way simpler. Also neither parent can be HET at both sites unless going down to max-priority 3 (though I think you have this constraint already)

brentp commented 8 years ago

cheers. Yes, we have the HET thing in place. Seems like this change could remove a fair number of spurious candidates. Let me know if you have any other issues.

jxchong commented 8 years ago

Also need to change the official docs for these changes to max-priority 2 http://gemini.readthedocs.io/en/latest/content/tools.html#comp-hets-identifying-potential-compound-heterozygotes

brentp commented 8 years ago

done.