brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License
248 stars 23 forks source link

Merging vcfs to use with slivar #84

Closed sprakashUTH closed 3 years ago

sprakashUTH commented 3 years ago

I used bcftools merge --missing-to-ref to combine three vcfs with overlapping pedigrees prior to slivar analysis, but slivar aborts with thousands of errors. Is this because the allelic depths are absent? Is there any workaround short of joint recalling from the bams?

brentp commented 3 years ago

what is the error? what is the command that you ran?

sprakashUTH commented 3 years ago

I had three vcf files and I ran bcftools merge setting missing values to reference. I then annotated with snpEff, decomposed and normalized as you recommend. The command I ran was: ./slivar expr --family-expr "aff_only:fam.every(function(s) { s.het == s.affected && s.hom_ref == !s.affected && s.GQ > 5 })" --vcf $vcf --ped $ped --pass-only -g $gnomad --info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' --js slivar-functions.js | bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o $cohort.vcf but the command doesn't really matter. Any command I choose runs without errors but outputs no variants. Do I need to specify any rules for the INFO field when merging? Or can I not use slivar with merged vcf files at all? I am jointly recalling the bams but that will take me some time. I had hoped to use merged vcfs as a quick and dirty workaround.

brentp commented 3 years ago

In your the expression does matter. It's not correct, you need to add a return statement. So:

"aff_only:fam.every(function(s) { return s.het == s.affected && s.hom_ref == !s.affected && s.GQ > 5 })"

if you have seen it without the return in the docs somewhere, please let me know and I will fix it.

brentp commented 3 years ago

I have just seen this in the readme. I updated it to include the return. Thanks for reporting.

sprakashUTH commented 3 years ago

When I enter that command in bash I receive the error: -bash: !s.affected: event not found. Should I escape or quote to avoid this?

brentp commented 3 years ago

use single-quotes instead of double. I'll fix this in the readme as well.

sprakashUTH commented 3 years ago

Thanks! It works. How do I restrict to rare variants in the same gene in at least three different families, loss of function variants only, or genes with at least three different rare variants? How do I filter by cutoff CADD score (variants with CADD > 25)?

brentp commented 3 years ago

for cadd, given a field named "cadd_score", you can do, for example INFO.cadd_score > 25. to get the same gene in 3 different families, you'd have to write a post-hoc filtering script as that's not supported in slivar directly.

sprakashUTH commented 3 years ago

I have many families but very few trios. Can you please provide a template script to extract variants from a single family and to assess for transmission (parent-child or sib-sib) within that family? I plan to iterate across families and select recurrent variants that meet filtering criteria.

brentp commented 3 years ago

Hi the wiki page here has the setup for families (not strictly trios).

sprakashUTH commented 3 years ago

Yes, as I mentioned before the rare variant setup does not work for me. Here is what I entered and the results. Everything except X chromosome variants is 0:

fasta="$WORK/references/hs37d5.fa" cohort="$WORK/uwbatches2-4/rare.variants" ped="$WORK/uwbatches2-4/all.ped" vcf="$WORK/uwbatches2-4/all.snpEff.vcf.gz" gnomad="gnomad.hg37.zip" gff="Homo_sapiens.GRCh37.82.gff3"

vlogin01.ls5(1099)$ ./slivar expr --vcf $vcf --ped $ped --pass-only -g $gnomad --info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' --js slivar-functions.js \

--family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' \
--family-expr 'recessive:fam.every(segregating_recessive)' \
--family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' \
--family-expr 'x_recessive:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_recessive_x)' \
--trio 'comphet_side:comphet_side(kid, mom, dad) && INFO.gnomad_nhomalt < 10' \
| bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o $cohort.vcf

slivar version: 0.2.1 ceb97b26cd39d341dd7aa96ddb42239692df5b50 [slivar] 256 samples matched in VCF and PED to be evaluated [slivar] message for gnomad.hg37.zip: gnomad hg37 v2.1 [slivar] evaluating on 42 trios Parsing Homo_sapiens.GRCh37.82.gff3 ... Indexed 196476 transcripts, 1195710 exons, 724218 CDSs, 280277 UTRs Ignored the following biotypes: 46x .. 3prime_overlapping_ncrna Calling... [slivar] 10000 1:26228207 evaluated 10000 variants in 7.4 seconds (1353.1/second) [slivar] 20000 1:89449411 evaluated 10000 variants in 4.6 seconds (2178.8/second) [slivar] 100000 3:195506529 evaluated 100000 variants in 40.8 seconds (2453.1/second)

error: unknown impact "non_canonical_start_codon" from csq "G|initiator_codon_variant&non_canonical_start_codon|LOW|C5orf60|ENSG00000204661|transcript|ENST00000513845|nonsense_mediated_decay|1/4|c.1T>C|p.Leu1?|3/771|1/243|1/80||" please report the variant at https://github.com/brentp/slivar/issues [slivar] 200000 8:144460568 evaluated 100000 variants in 52.7 seconds (1896.4/second)

error: unknown impact "non_canonical_start_codon" from csq "C|initiator_codon_variant&non_canonical_start_codon|LOW|VPS13A|ENSG00000197969|transcript|ENST00000467124|protein_coding|1/3|c.1T>C|p.Leu1?|3/324|1/123|1/40||" please report the variant at https://github.com/brentp/slivar/issues [slivar] 300000 14:22591987 evaluated 100000 variants in 53.3 seconds (1877.4/second) [slivar] 400000 19:34291393 evaluated 100000 variants in 50.2 seconds (1991.3/second) [slivar] Finished. evaluated 456580 total variants and wrote 470 variants that passed your slivar expressions. sample comphet_side denovo recessive x_denovo x_recessive 303134 0 0 0 0 3 303135 0 0 0 0 0 303136 0 0 0 0 0 303137 0 0 0 0 0 303138 0 0 0 0 10 303139 0 0 0 0 0 303140 0 0 0 0 0 303141 0 0 0 0 0 303142 0 0 0 0 0 303143 0 0 0 0 0 303144 0 0 0 2 3 303145 0 0 0 0 0 303146 0 0 0 0 0 303147 0 0 0 0 13 303148 0 0 0 0 0 303149 0 0 0 0 0 303150 0 0 0 0 0 303151 0 0 0 0 0 303152 0 0 0 0 0 303153 0 0 0 0 9 303154 0 0 0 0 10 303155 0 0 0 0 0 303156 0 0 0 0 0 303157 0 0 0 2 4 303158 0 0 0 0 5 303159 0 0 0 0 8 303160 0 0 0 0 0 303162 0 0 0 1 2 303163 0 0 0 0 10 303164 0 0 0 0 12 303166 0 0 0 0 0 303167 0 0 0 0 11 303168 0 0 0 0 0 303172 0 0 0 0 13 303174 0 0 0 0 8 303175 0 0 0 0 14 303176 0 0 0 0 0 303177 0 0 0 0 1 303178 0 0 0 0 0 303179 0 0 0 0 0 303180 0 0 0 0 1 303181 0 0 0 0 8 303182 0 0 0 0 0 303183 0 0 0 0 0 303184 0 0 0 0 0 303186 0 0 0 0 0 303188 0 0 0 0 0 303189 0 0 0 0 2 303190 0 0 0 0 0 303191 0 0 0 0 0 303192 0 0 0 0 0 303193 0 0 0 0 6 303194 0 0 0 0 0 303195 0 0 0 0 9 303196 0 0 0 0 0 303197 0 0 0 0 0 303198 0 0 0 0 10 303199 0 0 0 0 0 303200 0 0 0 0 0 303202 0 0 0 0 9 303203 0 0 0 0 13 303204 0 0 0 0 0 303205 0 0 0 1 0 303206 0 0 0 0 0 303207 0 0 0 0 12 303208 0 0 0 0 0 303209 0 0 0 0 0 303210 0 0 0 0 0 303211 0 0 0 0 17 303212 0 0 0 0 7 303213 0 0 0 0 10 303214 0 0 0 0 11 303215 0 0 0 0 10 303216 0 0 0 0 0 303217 0 0 0 0 11 303218 0 0 0 0 19 303219 0 0 0 0 17 303220 0 0 0 0 1 303221 0 0 0 0 0 303222 0 0 0 0 0 303223 0 0 0 0 0 303224 0 0 0 0 0 303225 0 0 0 0 13 303226 0 0 0 0 0 303227 0 0 0 0 0 303228 0 0 0 0 13 303229 0 0 0 0 13 303230 0 0 0 0 0 303231 0 0 0 0 9 303232 0 0 0 0 3 303233 0 0 0 0 8 303234 0 0 0 0 0 303235 0 0 0 0 0 304506 0 0 0 0 0 304508 0 0 0 0 0 304509 0 0 0 0 4 304511 0 0 0 0 13 304512 0 0 0 0 2 304513 0 0 0 0 7 346961 0 0 0 0 0 346962 0 0 0 0 17 346963 0 0 0 0 0 346964 0 0 0 0 0 346965 0 0 0 0 0 346966 0 0 0 0 0 346967 0 0 0 0 16 346968 0 0 0 0 16 346969 0 0 0 0 13 346970 0 0 0 0 17 346971 0 0 0 0 13 346972 0 0 0 0 10 346973 0 0 0 0 13 346974 0 0 0 0 0 346975 0 0 0 0 0 346976 0 0 0 0 21 346977 0 0 0 1 8 346978 0 0 0 0 0 346979 0 0 0 0 0 346980 0 0 0 0 0 346981 0 0 0 0 0 346982 0 0 0 0 0 346983 0 0 0 0 0 346984 0 0 0 0 0 346985 0 0 0 0 14 346986 0 0 0 0 19 346987 0 0 0 0 5 346988 0 0 0 0 5 346989 0 0 0 0 5 346990 0 0 0 0 21 346991 0 0 0 0 12 346992 0 0 0 0 12 346993 0 0 0 0 1 346994 0 0 0 0 0 346995 0 0 0 0 0 346996 0 0 0 0 0 346997 0 0 0 0 0 346998 0 0 0 2 7 346999 0 0 0 0 0 347000 0 0 0 0 0 347001 0 0 0 1 0 347002 0 0 0 0 0 347003 0 0 0 0 0 347004 0 0 0 0 10 347005 0 0 0 0 2 347006 0 0 0 0 0

brentp commented 3 years ago

were the samples jointly-called? with which caller? can you show the lines of ped file for one of your families?

sprakashUTH commented 3 years ago

The samples were merged from three vcf files using bcftools: One family: BAV086 303216 0 556372 2 2 BAV086 556372 0 0 2 1 BAV086 556373 0 556372 1 1

brentp commented 3 years ago

ok. i think that's the problem. if there are no hom-ref calls then slivar won't work (and neither will most calls).

sprakashUTH commented 3 years ago

I set missing to ref when merging. Can I deal with this issue another way beside joint calling?

brentp commented 3 years ago

I guess you could remove the AB checks from slivar-functions.js or you could inject AB of 0.5 for hets, 0 for hom-ref and so on into the VCF.