asntech / intervene

Intervene: a tool for intersection and visualization of multiple genomic region and gene sets
http://intervene.rtfd.io/

Questions: Why is half of the pairwise plot not an exact mirror of the other half? And about negative values #35

Open Rseq opened 4 years ago

Rseq commented 4 years ago

Good morning,

Many thanks for developing this amazing tool. I have a question about the pairwise mode that may seem naive, but I could not figure it out. I noticed it in some of my datasets and also in this example:

intervene pairwise -i ~/dbSUPER/mm9/*.bed --filenames --compute frac --htype color

https://intervene.readthedocs.io/en/latest/_images/pairwise_color.png

In that example, one half of the matrix does not mirror the other half. Since the combination is the same, why are the values not mirrored? For example, shouldn't the "Bone_Marrow" row and "Spleen" column have the same value as the "Spleen" row and "Bone_Marrow" column?

I assume that this is the reason why "tribar" mode is not recommended for "count" or "frac", but why does this happen? Would you mind clarifying? And how should I interpret this correctly?

If I plot a correlation, let's say "pearson", this doesn't happen and I get mirrored values as I would expect. Would this be a solution? If so, what do the negative values tell me in this case? Is it that, instead of meaning negatively correlated (variable A increases while B decreases, or vice versa), they mean the overlap is close to zero (-1 = 0 overlaps)?

Thank you for your time

amizeranschi commented 3 years ago

Hi @Rseq

I'm not a developer of Intervene, just a regular user. However, I'm just as confused as you about the results that it produces. Have a look: https://github.com/asntech/intervene/issues/34

I am guessing that both of our problems are related to what is stated in an older comment here: https://github.com/asntech/intervene/issues/27#issuecomment-560549528.

It doesn't make much sense to me that the order of the files should give different results. Set intersection should be commutative, even when looking at overlaps across multiple (sets of) genomic regions. This also makes the results of Intervene end up very different from those of other tools, as I've shown in https://github.com/asntech/intervene/issues/34.

Rseq commented 3 years ago

Thanks for sharing your doubts as well. I'm also tracking #34, as I could not understand exactly how it works. Let's hope that the developers can clarify the points we made.

amizeranschi commented 3 years ago

Yes, I hope we'll get a reply from the developers.

asntech commented 3 years ago

Dear @Rseq @amizeranschi,

I apologize for the late response. For some reason, this slipped off my radar.

This is quite tricky when we plot Venn diagrams for genomic regions, as you will not always have a one-to-one overlap. For example:

(a + b + c) != (b + c + a)

This is explained well by the pybedtools developer in the comment that @amizeranschi linked in #27: https://github.com/daler/pybedtools/issues/45#issuecomment-2543863

@Rseq you are right in your first comment. This is the reason why tribar mode is not recommended for count or frac, as A intersect B is not always equal to B intersect A for genomic sets.

I will push a new version soon with options to set u=True or False and v=True or False. But it makes sense to keep these set to True by default.
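
To illustrate the asymmetry described above, here is a minimal sketch in plain Python. It is not Intervene's or pybedtools' actual code. It assumes the count is the number of intervals in the first set that overlap at least one interval in the second set, which is how bedtools intersect -u reports entries; the helper names overlaps and count_overlapping are hypothetical.

```python
def overlaps(x, y):
    """True if half-open intervals (chrom, start, end) x and y share at least one base."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def count_overlapping(a, b):
    """Number of intervals in `a` that overlap at least one interval in `b`.

    Assumption: this mimics counting the entries reported by `bedtools intersect -u`.
    """
    return sum(1 for x in a if any(overlaps(x, y) for y in b))


# Two short regions in A, one long region in B that spans both of them.
a = [("chr1", 100, 200), ("chr1", 300, 400)]
b = [("chr1", 150, 350)]

a_vs_b = count_overlapping(a, b)  # both A intervals overlap the single B interval
b_vs_a = count_overlapping(b, a)  # the single B interval is counted once
print(a_vs_b, b_vs_a)  # 2 1
```

Under these assumptions, one region in B overlapping two regions in A gives a count of 2 in one direction and 1 in the other, so the two halves of a count or frac matrix need not mirror each other.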

I hope this helps and thanks again for your interest!

Best, Aziz

Rseq commented 3 years ago

Many thanks for your reply, @asntech! I believe my expectations were closer to what multiinter does. But these differences really make sense.

However, this part is still not clear to me:

If I plot a correlation, let's say "pearson", this doesn't happen and I get mirrored values as I would expect. Would this be a solution? If so, what do the negative values tell me in this case? Is it that, instead of meaning negatively correlated (variable A increases while B decreases, or vice versa), they mean the overlap is close to zero (-1 = 0 overlaps)?

Would you mind explaining it or pointing me towards an explanation?

Thank you for your time

amizeranschi commented 3 years ago

+1 to implementing bedtools multiinter (or bedops --intersect, which is equivalent) in Intervene, as an alternative option to the current approach.

This way, the intersection operation would become commutative, so the order of the input files won't matter and the pairwise plot would be symmetric.
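
As a rough illustration of why a base-level intersection is order-independent, here is a minimal sketch in plain Python. It is not how bedtools multiinter or bedops is implemented, and it is simplified to two sets on a single chromosome; the helper names merge and shared_bases are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def shared_bases(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


a = [(100, 200), (300, 400)]
b = [(150, 350)]

ab = shared_bases(a, b)
ba = shared_bases(b, a)
print(ab, ba)  # 100 100
```

Because the measure counts shared bases rather than entries from the first file, swapping the inputs gives the same number, and a pairwise matrix built from it would be symmetric.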

Rohit-Satyam commented 3 years ago

Hi @asntech

I hope these warnings will go away too with the update you are planning

/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py:214: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  D = D.ix[cluster_order, cluster_order]
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py:150: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  series = series.ix[order]
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:307: MatplotlibDeprecationWarning:
The rowNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().rowspan.start instead.
  layout[ax.rowNum, ax.colNum] = ax.get_visible()
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:307: MatplotlibDeprecationWarning:
The colNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().colspan.start instead.
  layout[ax.rowNum, ax.colNum] = ax.get_visible()
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:313: MatplotlibDeprecationWarning:
The rowNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().rowspan.start instead.
  if not layout[ax.rowNum + 1, ax.colNum]:
/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/pandas/plotting/_matplotlib/tools.py:313: MatplotlibDeprecationWarning:
The colNum attribute was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use ax.get_subplotspec().colspan.start instead.
  if not layout[ax.rowNum + 1, ax.colNum]:

Also, when I try plotting the dendrogram, it throws the following error:

 intervene pairwise --bedtools-options f=0.50 -i *.csv --htype dendrogram
Traceback (most recent call last):
  File "/home/rohit/miniconda3/envs/intervene/bin/intervene", line 606, in <module>
    main()
  File "/home/rohit/miniconda3/envs/intervene/bin/intervene", line 426, in main
    pairwise.pairwise_intersection(label_names, options)
  File "/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 478, in pairwise_intersection
    heatmap_dendrogram(matrix,outfile, options)
  File "/home/rohit/miniconda3/envs/intervene/lib/python3.7/site-packages/intervene/modules/pairwise/pairwise.py", line 304, in heatmap_dendrogram
    sns.plt.setp(sns_plot.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
AttributeError: module 'seaborn' has no attribute 'plt'