Genomicus / bedtools

Automatically exported from code.google.com/p/bedtools

intersectBed of VCF files with minimum overlap and reciprocal #109

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a VCF file a.vcf with this line:

20      1711570 .       T       TG      59      PASS    DP=173;NF=0;NR=0;NRS=5;NFS=3;HP=1       GT:GQ   0/1:20

2. Create a VCF file b.vcf with this line:

20      1711570 .       T       TGAGGGTCGTCGCTAGTTATTTTACCGT    .       PASS    RegionSet=DEFAULT;LINE=28528,COL=10     .       .

3. intersectBed -a a.vcf -b b.vcf -f 1.00 -r

What is the expected output? What do you see instead?

I expect no output. What I am looking for is a way to return the exact
intersection between a.vcf and b.vcf.
Instead, the a.vcf record is returned:
20      1711570 .       T       TG      59      PASS    DP=173;NF=0;NR=0;NRS=5;NFS=3;HP=1       GT:GQ   0/1:20

What version of the product are you using? On what operating system?

I am using intersectBed v2.14.3 on the OS "Linux maxwell-MacPro 
2.6.35-25-generic #44-Ubuntu SMP".

Please provide any additional information below.

I may be misinterpreting the manual. If so, please let me know of a workaround
to obtain the intersection.
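One possible workaround for an allele-aware "exact intersection" is to keep a record from A only when B has a record with the same CHROM, POS, REF and ALT. The sketch below is a hypothetical helper (`exact_intersect`, `record_key` and `data_lines` are illustrative names, not part of bedtools or vcftools). It assumes properly tab-delimited VCF, compares ALT as a whole string, and does not split multi-allelic sites or normalize indel representation.

```python
from typing import Iterable, List, Set, Tuple

# (CHROM, POS, REF, ALT) identifies an allele-level match in this sketch.
Key = Tuple[str, int, str, str]


def record_key(line: str) -> Key:
    """Build the match key from the first five tab-separated VCF columns."""
    chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref.upper(), alt.upper()


def data_lines(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and '#' meta/header lines."""
    return [ln for ln in lines if ln.strip() and not ln.startswith("#")]


def exact_intersect(a_lines: Iterable[str], b_lines: Iterable[str]) -> List[str]:
    """Return records of A whose (CHROM, POS, REF, ALT) also appears in B.

    ALT is compared as a whole string: multi-allelic sites are not split
    and indels are not left-normalized.
    """
    b_keys: Set[Key] = {record_key(ln) for ln in data_lines(b_lines)}
    return [ln for ln in data_lines(a_lines) if record_key(ln) in b_keys]


HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n"
A_VCF = [
    "##fileformat=VCFv4.0\n",
    HEADER,
    "20\t1711570\t.\tT\tTG\t59\tPASS\tDP=173;NF=0;NR=0;NRS=5;NFS=3;HP=1\tGT:GQ\t0/1:20\n",
]
B_VCF = [
    "##fileformat=VCFv4.0\n",
    HEADER,
    "20\t1711570\t.\tT\tTGAGGGTCGTCGCTAGTTATTTTACCGT\t.\tPASS\t"
    "RegionSet=DEFAULT;LINE=28528,COL=10\t.\t.\n",
]

if __name__ == "__main__":
    # Same position and REF, different ALT: no allele-level match.
    print(exact_intersect(A_VCF, B_VCF))  # []
    # Comparing A with itself keeps its single record.
    print(len(exact_intersect(A_VCF, A_VCF)))  # 1
```

With real files, pass open file handles instead of the inline lists, e.g. `with open("a.vcf") as fa, open("b.vcf") as fb: exact_intersect(fa, fb)`. Keying on the full allele rather than on coordinates is the design choice that makes the two example records differ.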

Original issue reported on code.google.com by pascal.m...@gmail.com on 24 Jan 2012 at 1:36

GoogleCodeExporter commented 9 years ago
I'm wondering whether my VCF files are not being treated as BED files?!

To complete the description above: I included the VCF header in both a.vcf and
b.vcf, but it does not solve my problem:

a.vcf:
##fileformat=VCFv4.0
##source=Dindel
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
20      1711570 .       T       TG      59      PASS    DP=173;NF=0;NR=0;NRS=5;NFS=3;HP=1       GT:GQ   0/1:20

b.vcf:
##fileformat=VCFv4.0
##source=Dindel
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
20      1711570 .       T       TGAGGGTCGTCGCTAGTTATTTTACCGT    .       PASS    RegionSet=DEFAULT;LINE=28528,COL=10     .       .

$ intersectBed -a a.vcf -b b.vcf -f 1.00 -r
20      1711570 .       T       TG      59      PASS    DP=173;NF=0;NR=0;NRS=5;NFS=3;HP=1       GT:GQ   0/1:20

Original comment by pascal.m...@gmail.com on 24 Jan 2012 at 2:02

GoogleCodeExporter commented 9 years ago
Can you elaborate on why you think these should _not_ overlap? The position of
each variant is identical in the reference genome. Because the reference (not
the experimental genomes) is the coordinate system, these variants overlap by
definition in the context of bedtools. If you want more subtle comparisons
that account for the fact that the alternate alleles are different, you may
want to try vcftools.
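A minimal sketch of why `-f 1.00 -r` still reports the a.vcf record. It assumes that bedtools maps a simple VCF record to the 0-based, half-open interval [POS - 1, POS - 1 + len(REF)), so ALT never enters the coordinate calculation; the helper names are hypothetical. The `passes_f_r` check follows the documented meaning of `-f` (minimum overlap as a fraction of A) and `-r` (require the same fraction of B).

```python
from typing import Tuple

Interval = Tuple[int, int]


def vcf_interval(pos: int, ref: str) -> Interval:
    """Assumed mapping of a simple VCF record to a 0-based, half-open interval."""
    start = pos - 1
    return start, start + len(ref)


def overlap_fractions(a: Interval, b: Interval) -> Tuple[float, float]:
    """Return (fraction of A covered by B, fraction of B covered by A)."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / (a[1] - a[0]), overlap / (b[1] - b[0])


def passes_f_r(a: Interval, b: Interval, f: float, reciprocal: bool) -> bool:
    """Check the -f threshold on A and, when reciprocal is set, on B as well."""
    frac_a, frac_b = overlap_fractions(a, b)
    if reciprocal:
        return frac_a >= f and frac_b >= f
    return frac_a >= f


# Both records have POS = 1711570 and REF = "T"; only ALT differs.
A = vcf_interval(1711570, "T")
B = vcf_interval(1711570, "T")

if __name__ == "__main__":
    print(A, B)                                      # (1711569, 1711570) (1711569, 1711570)
    print(overlap_fractions(A, B))                   # (1.0, 1.0)
    print(passes_f_r(A, B, f=1.0, reciprocal=True))  # True
```

Under this assumption both insertions occupy the same 1 bp of reference, so 100% reciprocal overlap is satisfied regardless of how long the inserted sequence is.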

Original comment by aaronqui...@gmail.com on 25 Jan 2012 at 1:03

GoogleCodeExporter commented 9 years ago

Original comment by aaronqui...@gmail.com on 13 Feb 2012 at 1:07