Closed justme66 closed 5 years ago
Yes you are correct, the command you provided in your question is useful if you want to combine features from the same SV in multiple people without accessing the BAM files
Yes, this is how I typically run SV2. For a given family, I combine all the SV calls, I then pick the highest scoring SV for overlapping calls
I think I know what is causing that error, but I want to make sure. Is the position for that SV correct? It looks like it's 11bp not 150kb. Can you please tell me the exact position of your de novo and the reference build so I can debug this?
merging on reciprocal overlap of SVs uses the regions, so chr1:100-200 and chr1:80-200 would be merged into the SV region with the highest median ALT genotype likelihood. To be honest, merging SVs is an area of debate, I think it makes more sense to use CIPOS and CIEND to merge SVs.
I have a feeling that if you run feature extraction (pass only -pre sv2_preprocessing/
) on the parents, this error might resolve.
Thanks for your reply.
Thanks!
What version of python are you running? SV2 requires python 2.7
I ask because I cannot replicate your error. You can try changing that line to
if (locus[0]=='chrX' or locus[0]=='chrY') and Ped.males.get(sample_id)!=None and Structural_Variant.par.get(locus)==False: gt='.'
and I think that should fix the Key Error.
However, SV2 first parses all SV input you provide, and for variants on the chrX and chrY chromosomes, it loads that Structural_Variant.par
dict with either True or False entries (par is the pseudoautosomal regions)
So that means, any chrX or chrY variants you provide as input are in that dict.
Here's my test on a male individual
test.bed
chrX 137012104 137012137 DEL
chrX 145935451 145935454 DEL
command:
sv2 -i NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam -b test.bed -p test.ped -dels-only -g hg38 -pre prepro/
STDOUT:
sv2 version:1.4.4 report bugs to <dantaki at ucsd dot edu> error messages located in STDOUT
************************************************
PREPROCESSING COMPLETE <time elapsed: 0:00:01>
************************************************
*****************************************************
FEATURE EXTRACTION COMPLETE <time elapsed: 0:00:04>
*****************************************************
*********************************************
GENOTYPING COMPLETE <time elapsed: 0:00:35>
*********************************************
VCF output
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA19239
chrX 137012105 . . <DEL> 19 STR;FAIL END=137012137;SVTYPE=DEL;SVLEN=33;DENOVO_FILTER=FAIL;REF_GTL=NA;AF=1.0;CYTOBAND=Xq26.3;REPEATMASKER=NA,0;1000G_ID=NA;1000G_OVERLAP=NA;DESCRIPTION=chrX:137012105-137012137_deletion;GENES=intergenic;ABPARTS=0;CENTROMERE=0;GAP=0;SEGDUP=0;STR=0.97;UNMAPPABLE=0 GT:CN:PE:SR:SC:NS:HA:NH:SQ:GL 1:0.00:0.00:0.00:nan:0:nan:0:19:0.01,0.99
chrX 145935452 . . <DEL> 19 FAIL END=145935454;SVTYPE=DEL;SVLEN=3;DENOVO_FILTER=FAIL;REF_GTL=NA;AF=1.0;CYTOBAND=Xq27.3;REPEATMASKER=NA,0;1000G_ID=NA;1000G_OVERLAP=NA;DESCRIPTION=chrX:145935452-145935454_deletion;GENES=intergenic;ABPARTS=0;CENTROMERE=0;GAP=0;SEGDUP=0;STR=0;UNMAPPABLE=0 GT:CN:PE:SR:SC:NS:HA:NH:SQ:GL 1:0.00:0.00:0.00:nan:0:nan:0:19:0.01,0.99
paste in the entire SV2 command and your version of python and SV2. If your version is not version 1.4.4 I would uninstall sv2 and reinstall it with pip
Also SV2 is designed for variants >50bp and performs best for variants >150bp. I would remove those small variants since they are really INDELs (2-49bp). Also if a variant completely overlaps a short tandem repeat (STR), segmental duplication, and/or assembly gaps, I would remove it because SV2 won't genotype them anyway.
Thanks a lot, Dan. The version of python and SV2 are 2.7 and 1.4.3.4.
Try to update SV2 and see if that resolves the error
Hi, may i ask how to "force SV2 to genotype specific regions in parents after I extracted features and genotyped SVs for one sample at a time"? Thanks for your help.
You can provide the regions in BED format but you will need to extract features and genotype those new regions.
Hi Dan,
I used SV2 on a trio. When I genotyped father/mother/child samples one by one. I combined them as the genotype matrix, followed by stringent filter (de novo filter) and parse out de novo deletions and duplications. I knew in advance that there is a ~150kb de novo in the patient. But in the genotype matrix, this SV is genotyped as 0/1(child) ./.(father) ./.(mother). I checked the features txt file for each sample. On such ~150kb region, the child was of course extracted informative features, but the parents were operated smaller variants (<100bp), which make it be ./. I guess. So my questions are:
Thanks!