Open Rseq opened 4 years ago
Hi @Rseq
I'm not a developer of Intervene, just a regular user. However, I'm just as confused as you about the results that it produces. Have a look: https://github.com/asntech/intervene/issues/34
I am guessing that both of our problems are related to what is stated in an older comment here: https://github.com/asntech/intervene/issues/27#issuecomment-560549528.
It doesn't make much sense to me that the order of the files should give different results. Set intersection should be commutative, even when looking at overlaps across multiple (sets of) genomic regions. This also makes Intervene's results end up very different from those of other tools, as I've shown in https://github.com/asntech/intervene/issues/34.
Thanks for sharing your doubts as well. I'm also tracking #34, as I could not understand exactly how it works. Let's hope that the developers can clarify the points we made.
Yes, I hope we'll get a reply from the developers.
Dear @Rseq @amizeranschi,
I apologize for the late response. For some reason, this slipped off my radar.
This is quite tricky when we plot Venn diagrams for genomic regions, as you will not always have a one-to-one overlap. For example:
(a + b + c) != (b + c + a)
This is explained well by the pybedtools developer in the comment that @amizeranschi posted in #27: https://github.com/daler/pybedtools/issues/45#issuecomment-2543863
@Rseq you are right in your first comment. This is the reason why tribar mode is not recommended for count or frac, as A intersect B is not always equal to B intersect A for genomic sets.
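To make the asymmetry concrete, here is a minimal pure-Python sketch. It does not call bedtools or Intervene; the toy intervals and the helper names `overlaps` and `count_with_hit` are made up for illustration. It assumes that a count is obtained by reporting each region of the first set once if it overlaps anything in the second set, which is how the `u` option of bedtools intersect behaves.

```python
def overlaps(x, y):
    """True if half-open intervals x=(start, end) and y=(start, end) share a base."""
    return x[0] < y[1] and y[0] < x[1]


def count_with_hit(a, b):
    """Number of intervals in `a` that overlap at least one interval in `b`."""
    return sum(any(overlaps(x, y) for y in b) for x in a)


# Toy data on one chromosome (hypothetical): a single wide region in A
# spans three small regions in B.
A = [(100, 1000)]
B = [(150, 200), (400, 450), (800, 900)]

a_in_b = count_with_hit(A, B)  # regions of A with a hit in B
b_in_a = count_with_hit(B, A)  # regions of B with a hit in A
print(a_in_b, b_in_a)  # 1 3
```

Because the count is taken over the regions of the first set, swapping the inputs changes the answer whenever the overlaps are not one-to-one.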
I will push a new version soon with options to set u=True or False and v=True or False. But it makes sense to keep these set to True by default.
I hope this helps and thanks again for your interest!
Best, Aziz
Many thanks for your reply, @asntech! I believe my expectations would be closer to multiinter. But these differences really do make sense.
This part, however, is still not clear to me:
If I plot a correlation, let's say "pearson", this doesn't happen and I get mirrored values, as I would expect. Would this be a solution? If so, what do negative values tell me in this case? Instead of indicating a negative correlation (variable A increases while B decreases, or vice versa), would they correspond to values close to zero (-1 = 0 overlaps)?
Would you mind explaining it or pointing me towards an explanation?
Thank you for your time
+1 to implementing bedtools multiinter (or bedops --intersect, which is equivalent) in Intervene, as an alternative option to the current approach.
This way, the intersection operation would become commutative, so the order of the input files won't matter and the pairwise plot would be symmetric.
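As a rough illustration of why a base-pair-level intersection is order-independent, here is a small pure-Python sketch. It is not how bedtools multiinter or bedops are implemented; the helpers `merge` and `shared_bases` and the toy intervals are hypothetical. It only shows that counting shared bases, rather than regions of the first file, gives the same value in both directions.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def shared_bases(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Same hypothetical toy data: one wide region versus three small ones.
A = [(100, 1000)]
B = [(150, 200), (400, 450), (800, 900)]

ab = shared_bases(A, B)
ba = shared_bases(B, A)
print(ab, ba)  # 200 200
```

A pairwise matrix filled with such a measure would be symmetric by construction.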
Hi @asntech
I hope these warnings will go away too with the update you are planning
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py:214: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
D = D.ix[cluster_order, cluster_order]
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py:150: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
series = series.ix[order]
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:307: MatplotlibDeprecationWarning:
The rowNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().rowspan.start instead.
layout[ax.rowNum, ax.colNum] = ax.get_visible()
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:307: MatplotlibDeprecationWarning:
The colNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().colspan.start instead.
layout[ax.rowNum, ax.colNum] = ax.get_visible()
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:313: MatplotlibDeprecationWarning:
The rowNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().rowspan.start instead.
if not layout[ax.rowNum + 1, ax.colNum]:
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:313: MatplotlibDeprecationWarning:
The colNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().colspan.start instead.
if not layout[ax.rowNum + 1, ax.colNum]:
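For the two `.ix` warnings that come from pairwise.py, the replacement suggested by the pandas message can be sketched as follows. This is a standalone toy example, not a patch tested against Intervene: the matrix values, the third label, and the variable names are made up, and whether `cluster_order` in Intervene holds labels or integer positions is an assumption that would need checking in the source.

```python
import pandas as pd

# Hypothetical 3x3 pairwise matrix; "Thymus" and the values are invented.
labels = ["Bone_Marrow", "Spleen", "Thymus"]
D = pd.DataFrame(
    [[1.0, 0.2, 0.1],
     [0.3, 1.0, 0.4],
     [0.5, 0.6, 1.0]],
    index=labels,
    columns=labels,
)

# If the order is a list of labels, .loc replaces .ix.
label_order = ["Spleen", "Thymus", "Bone_Marrow"]
by_label = D.loc[label_order, label_order]

# If the order is a list of integer positions, .iloc replaces .ix.
position_order = [1, 2, 0]
by_position = D.iloc[position_order, position_order]

print(by_label)
```

Both calls reorder rows and columns the same way here; which one applies depends on what the clustering step returns.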
Also, when I try plotting the dendrogram, it throws the following error:
intervene pairwise --bedtools-options f=0.50 -i *.csv --htype dendrogram
Traceback (most recent call last):
File "/home/rohit/miniconda3/envs/intervene/bin/intervene", line 606, in <module>
main()
File "/home/rohit/miniconda3/envs/intervene/bin/intervene", line 426, in main
pairwise.pairwise_intersection(label_names, options)
File "/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 478, in pairwise_intersection
heatmap_dendrogram(matrix,outfile, options)
File "/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 304, in heatmap_dendrogram
sns.plt.setp(sns_plot.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
AttributeError: module 'seaborn' has no attribute 'plt'
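The AttributeError arises because recent seaborn releases no longer expose matplotlib's pyplot as `sns.plt`. A likely workaround, untested against Intervene itself, is to import pyplot directly and call `plt.setp` on the same tick labels. The sketch below uses only matplotlib with a made-up 2x2 matrix and hypothetical sample names, so it runs without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.imshow([[1.0, 0.2], [0.3, 1.0]])  # stand-in for the heatmap axes
ax.set_yticks([0, 1])
ax.set_yticklabels(["sample_1", "sample_2"], rotation=90)

# Same call as the failing line, but through plt instead of sns.plt.
plt.setp(ax.yaxis.get_majorticklabels(), rotation=0)

rotations = [label.get_rotation() for label in ax.yaxis.get_majorticklabels()]
print(rotations)
```

In pairwise.py the equivalent change would presumably be `plt.setp(sns_plot.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)`, assuming `import matplotlib.pyplot as plt` is present at the top of the module.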
Good morning,
Many thanks for developing this amazing tool. I have a question about the pairwise mode that may seem naive, but I could not figure it out. I noticed this in some of my datasets and also in this example:
intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype color
https://intervene.readthedocs.io/en/latest/_images/pairwise_color.png
In that example, one half of the matrix does not mirror the other half. As the combination is the same, why are the values not mirrored? For example, shouldn't the "Bone_Marrow" row and "Spleen" column have the same value as the "Spleen" row and "Bone_Marrow" column?
I assume that this is the reason why "tribar" mode is not recommended for "count" or "frac", but why does this happen? Would you mind clarifying? And how should I interpret this correctly?
If I plot a correlation, let's say "pearson", this doesn't happen and I get mirrored values, as I would expect. Would this be a solution? If so, what do negative values tell me in this case? Instead of indicating a negative correlation (variable A increases while B decreases, or vice versa), would they correspond to values close to zero (-1 = 0 overlaps)?
Thank you for your time