brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

Re: multi-generational pedigree #38

Closed: asapb closed this issue 5 years ago

asapb commented 5 years ago

Hi Brent,

I am reading about how to use a multi-generational pedigree with slivar and am wondering about your example for detecting de novo mutations in generation F1 that are transmitted to F2.

The documentation for groups in slivar shows what the alias for a multi-generational pedigree could look like when each family has a different number of children:

#f1 spouse gma gpa kids
s1  s2     s3  s4  s5,s6,s7
s8  s9     s10 s11 s12,s13
s14 s15    s16,s17,s18,s19,s20,s21,s22,s23,s24

In the third row (where f1 is s14 and the spouse is s15) it seems like the kids array is s16 to s24, but how do you account for the grandparents (gma and gpa)?

Are those entries supposed to be empty, or to contain a tab or a space? Or is this a typo, so that s16 is really the grandma (gma), s17 is really the grandpa (gpa), and the children are only s18 to s24?

I ask because I've tried running with trios where some individuals are missing, and I get a fatal error (see below). I've tried leaving the missing entry blank, adding a space, adding a tab, and adding a tab followed by a space, but I get the same index error each time.

My group (defined in alias) looks like this:

#affected    carrier    healthy
s1      s2    s3
s4      s5
s6      s7
s8
s9

I run slivar with the following command:

slivar expr \
    --pass-only \
    --vcf $vcf \
    --ped $ped \
    --gnotate $gnotation \
    --alias $alias \
    --js $js \
    --info "$info" \
    --group-expr "$group"

and the error message says:

...
system.nim(3059)         sysFatal
Error: unhandled exception: index 0 not in 0 .. -1 [IndexError]

It's worth pointing out that I've successfully run slivar with the above command when I change the group (defined by alias) to contain only the first row, with the complete trio:

#affected    carrier    healthy
s1      s2    s3

Unfortunately, my groups do not have information for each subcategory (carrier and healthy) for each trio, but I was still hoping I could use slivar to filter my VCF files.

However, I realise I am probably trying to use slivar to do something it was never intended to do, so it might not be possible.

Thanks, Åsa

brentp commented 5 years ago

For the first example, that was a mistake: s16 and s17 were the grandparents. I have updated the wiki.
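To make the corrected layout concrete, here is a minimal Python sketch (not slivar code) that parses the example with the third row fixed. It assumes whitespace-separated columns as displayed in this thread, and it assumes s16 is gma and s17 is gpa, which is the order suggested in the question but not confirmed here. `parse_groups` is a hypothetical helper name.

```python
# Corrected example: in the third row, s16 and s17 are the grandparents.
# Assumption: s16 -> gma, s17 -> gpa (order taken from the question, unconfirmed).
GROUPS = """\
#f1 spouse gma gpa kids
s1  s2     s3  s4  s5,s6,s7
s8  s9     s10 s11 s12,s13
s14 s15    s16 s17 s18,s19,s20,s21,s22,s23,s24
"""


def parse_groups(text):
    """Parse a groups table into one dict per row, mapping column name to sample IDs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}: {ln!r}")
        # A comma-separated entry (e.g. kids) becomes a list of sample IDs.
        rows.append({name: value.split(",") for name, value in zip(header, fields)})
    return rows


families = parse_groups(GROUPS)
for fam in families:
    print(fam["f1"], fam["gma"], fam["gpa"], fam["kids"])
```

With the original, uncorrected third row this sketch would raise a `ValueError`, because that row has only three columns against a five-column header.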

slivar should support your use-case, but as you can tell there are some edge cases I may not have covered. I will fix this for the next release.

asapb commented 5 years ago

Thank you for the clarification about the grandparents and I look forward to the fix! It's much appreciated.

brentp commented 5 years ago

hi, I am working on this now. Can you make sure that all lines in your groups file have the same number of entries? Instead of:

#affected\tcarrier\thealthy
...
s6\ts7
s8
s9

you'd have:

#affected\tcarrier\thealthy
...
s6\ts7\t
s8\t\t
s9\t\t

where I've skipped the first 2 lines for brevity and written out the literal tabs. I'll work on making this better for next release, but I think this can resolve your issue in the interim.
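A minimal Python sketch of that interim padding step, for anyone who wants to script it. It assumes the groups file is tab-delimited with the header on the first line; `pad_group_rows` is a hypothetical helper, not part of slivar.

```python
def pad_group_rows(text):
    """Append trailing tabs so every row has as many fields as the header line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]  # blank lines are dropped
    if not lines:
        return ""
    n_cols = len(lines[0].split("\t"))
    padded = []
    for ln in lines:
        fields = ln.split("\t")
        if len(fields) > n_cols:
            raise ValueError(f"row has more than {n_cols} fields: {ln!r}")
        # Missing trailing entries become empty strings, i.e. trailing tabs.
        fields += [""] * (n_cols - len(fields))
        padded.append("\t".join(fields))
    return "\n".join(padded) + "\n"


example = "#affected\tcarrier\thealthy\ns1\ts2\ts3\ns6\ts7\ns8\n"
print(repr(pad_group_rows(example)))
```

The sketch only pads on the right, so it covers the case here where the missing entries are the last columns. A row missing a middle column would need its empty field written explicitly.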

brentp commented 5 years ago

So, I see the error you got when I add the tabs. I have a proper fix and I am testing now.

brentp commented 5 years ago

will you try this binary and verify that it solves your issue? slivar.gz

brentp commented 5 years ago

if this is not fixed by the attached binary, please re-open.

asapb commented 5 years ago

Hi Brent,

Didn't see your message until now.

For the record, I previously tried running both ways, with and without tabs (\t) to indicate an empty position.

In any case, I will test the binary you sent now.

Thank you!