I'm trying to annotate a vcf with a number of bedfiles. Not hard but it would be good to confirm the strategy with bedtools.
Objective
a) The annotations are sorted but NOT collapsed/merged to avoid annotating positions with regions that don't overlap the position
b) The output should have the same number of lines and order as the original vcf
c) Multiple hits for a position should have distinct values collapsed
d) Only chr, start,stop,annocol1,annocolx are required in the output. i.e. no score,name in -a and no chr, start, stop in -b
Challenge
1) intersect results in multiple hits/rows for a single position in the vcf when the annotations aren't merged. Seems like the easiest thing to do is do a merge on the output, right?
2) Intersect on a bed3 (just the index) and bed4 results in a ragged output (8 columns where there is no match and 7 where there is). Is there a way to avoid defaulting to a bed5 internally or should I add dummy columns to a bed3? See example.
3) merge requires enumeration of all the columns but these are dynamically changing. Any way to default to all columns if -o is set?
I'm trying to annotate a vcf with a number of bedfiles. Not hard but it would be good to confirm the strategy with bedtools.
Objective a) The annotations are sorted but NOT collapsed/merged to avoid annotating positions with regions that don't overlap the position b) The output should have the same number of lines and order as the original vcf c) Multiple hits for a position should have
distinct
values collapsed d) Only chr, start,stop,annocol1,annocolx are required in the output. i.e. no score,name in-a
and no chr, start, stop in-b
Challenge 1)
intersect
results in multiple hits/rows for a single position in the vcf when the annotations aren't merged. Seems like the easiest thing to do is do amerge
on the output, right? 2)Intersect
on a bed3 (just the index) and bed4 results in a ragged output (8 columns where there is no match and 7 where there is). Is there a way to avoid defaulting to a bed5 internally or should I add dummy columns to a bed3? See example. 3)merge
requires enumeration of all the columns but these are dynamically changing. Any way to default to all columns if-o
is set?Strategy tmp1.bed tmp2.bed