Open lovelycatZ opened 1 month ago
I want to add that there's also difference in results depending on what bed file you put first.
For example:
With intervene venn -i Aaa.bed Bbb.bed
I get 1930 intersecting intervals
With intervene venn -i Bbb.bed Aaa.bed
I get 1923 intersecting intervals
And bedtools intersect -a Aaa.bed -b Bbb.bed | wc -l
gives me 1947 intervals
Is there anybody can answer this question?
Hi @lovelycatZ @Sarsaparella thanks for your interest and posting this.
This is a known issue when doing genomic interval intersections. This is because, for genomic regions, you will not always have a one-to-one overlap. That's why you slight difference when changing the file order: -i Aaa.bed Bbb.bed != -i Aaa.bed Bbb.bed
. For example, the first combination prints how many intervals in A overlap with those in B.
For example, if you've three BED files (a, b and c) and when you run the pybedtools
:
(a + b + c) != (b + c + a)
This is also explained well enough here by the pybedtools developer as posted by @amizeranschi https://github.com/asntech/intervene/issues/27 https://github.com/daler/pybedtools/issues/45#issuecomment-2543863
Also, Intervene is using pybedtools
in the background with u=True
and v=True
. You can run it with --bedtools-options u=False
I hope this helps.
Hi @asntech
Please consider implementing bedtools multiinter
as an alternative to bedtools intersect
, which is not very intuitive to interpret and it leads to people coming here with questions like the one above.
Implementing bedtools multiinter
would make the intersections commutative, which means that the order of the sets would not matter, so (a + b + c) == (b + c + a)
. This would also make the Venn and UpSet diagrams more intuitive to interpret.
You would just need to use this function from pybedtools
:
https://daler.github.io/pybedtools/autodocs/pybedtools.bedtool.BedTool.multi_intersect.html
which is a wrapper for bedtools multiinter
:
https://bedtools.readthedocs.io/en/latest/content/tools/multiinter.html
Hello,
Thank you for providing such a convenient tool!
I have installed Intervene via conda and tested with my own dataset
hap1.bed
andhap2.bed
to generate a Venn diagram using command lineintervene venn -i hap*.bed --names hap1,hap2 --save-overlaps
. Finally, in the Intervene_result folder, there were three files 1)01_hap2.bed; 2)10_hap1.bed; 3)11_hap1_hap2.bed. For the third file 11_hap1_hap2.bed, it contained 5206 intervals same with the intersection part of the Venn diagram that were found in both A and B, right? But it was strange that when I use BEDtools with command linebedtools intersect -a hap1.bed -b hap2.bed
, there were 5773 records in the result file.How to explain the the difference?