ComputationalRegulatoryGenomicsICL / GenomicInteractions

R/Bioconductor package for handling Genomic interaction data, annotating genomic features with interaction information and producing summary plots / statistics

annotateInteraction #12

Open alkurowska opened 2 years ago

alkurowska commented 2 years ago

Hello,

I have defined annotation features: promoter and enhancer lists. I used them with Hi-C data to find enhancer-promoter pairs. My Hi-C data ranges were 10 kb in size. Later I repeated the analysis using the same features and the same Hi-C data, but with the Hi-C regions enlarged from 10 kb to 25 kb. Surprisingly, some of the enhancer-promoter pairs that were found with the 10 kb ranges were not found with the 25 kb ranges, even though it is exactly the same data.

Any idea what could be the problem?

alkurowska commented 2 years ago

This is my data with 10kb:

GenomicInteractions object with 53047 interactions and 3 metadata columns:
          seqnames1             ranges1     seqnames2             ranges2 |    counts     p.value         fdr
              <Rle>           <IRanges>         <Rle>           <IRanges> | <integer>   <numeric>   <numeric>
      [1]     chr10 100340000-100350000 ---     chr10 100420000-100430000 |        26 5.44410e-10 2.10322e-05
      [2]     chr10 100350000-100360000 ---     chr10 100430000-100440000 |        23 4.64182e-08 9.62884e-04
      [3]     chr10 100560000-100570000 ---     chr10 100870000-100880000 |        16 1.39535e-10 6.22923e-06
      [4]     chr10 100670000-100680000 ---     chr10 101210000-101220000 |        12 3.32377e-09 1.00389e-04
      [5]     chr10 100680000-100690000 ---     chr10 100790000-100800000 |        26 1.55326e-12 1.17148e-07
      ...       ...                 ... ...       ...                 ... .       ...         ...         ...
  [53043]      chrY   10760000-10770000 ---      chrY   56720000-56730000 |         6 2.22357e-12 1.64184e-07
  [53044]      chrY   10780000-10790000 ---      chrY   56720000-56730000 |         5 3.88835e-10 1.53275e-05
  [53045]      chrY   10790000-10800000 ---      chrY   26670000-26680000 |        10 7.36211e-20 3.23411e-14
  [53046]      chrY   11530000-11540000 ---      chrY   56720000-56730000 |        11 2.07792e-24 2.31984e-18
  [53047]      chrY   26670000-26680000 ---      chrY   56720000-56730000 |         7 3.52072e-14 4.25279e-09
  -------
  regions: 41067 ranges and 3 metadata columns
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

This is my data with 25 kb:

GenomicInteractions object with 53047 interactions and 3 metadata columns:
          seqnames1             ranges1     seqnames2             ranges2 |    counts     p.value         fdr
              <Rle>           <IRanges>         <Rle>           <IRanges> | <integer>   <numeric>   <numeric>
      [1]     chr10 100332500-100357500 ---     chr10 100412500-100437500 |        26 5.44410e-10 2.10322e-05
      [2]     chr10 100342500-100367500 ---     chr10 100422500-100447500 |        23 4.64182e-08 9.62884e-04
      [3]     chr10 100552500-100577500 ---     chr10 100862500-100887500 |        16 1.39535e-10 6.22923e-06
      [4]     chr10 100662500-100687500 ---     chr10 101202500-101227500 |        12 3.32377e-09 1.00389e-04
      [5]     chr10 100672500-100697500 ---     chr10 100782500-100807500 |        26 1.55326e-12 1.17148e-07
      ...       ...                 ... ...       ...                 ... .       ...         ...         ...
  [53043]      chrY   10752500-10777500 ---      chrY   56712500-56737500 |         6 2.22357e-12 1.64184e-07
  [53044]      chrY   10772500-10797500 ---      chrY   56712500-56737500 |         5 3.88835e-10 1.53275e-05
  [53045]      chrY   10782500-10807500 ---      chrY   26662500-26687500 |        10 7.36211e-20 3.23411e-14
  [53046]      chrY   11522500-11547500 ---      chrY   56712500-56737500 |        11 2.07792e-24 2.31984e-18
  [53047]      chrY   26662500-26687500 ---      chrY   56712500-56737500 |         7 3.52072e-14 4.25279e-09
  -------
  regions: 41067 ranges and 3 metadata columns
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

So whatever overlaps within the 10 kb ranges has to overlap within the 25 kb ranges. To investigate what is happening, I took one gene that was found at 10 kb but did not appear at 25 kb, and used only that gene and its paired enhancers as annotation features. In that case both the gene and the enhancers were found at both 10 kb and 25 kb.
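
The containment premise can be checked directly on the coordinates printed above. The sketch below is a plain-Python illustration, not GenomicInteractions code; it copies both anchors of rows 1 to 3 from the two printouts and confirms that each 25 kb anchor contains its 10 kb counterpart. The variable names are made up for this example.

```python
# Anchors (start, end) copied from rows [1]-[3] of the 10 kb printout,
# listed as anchor 1 then anchor 2 for each row.
anchors_10kb = [
    (100340000, 100350000), (100420000, 100430000),
    (100350000, 100360000), (100430000, 100440000),
    (100560000, 100570000), (100870000, 100880000),
]

# The same anchors from the 25 kb printout, in the same order.
anchors_25kb = [
    (100332500, 100357500), (100412500, 100437500),
    (100342500, 100367500), (100422500, 100447500),
    (100552500, 100577500), (100862500, 100887500),
]

# A 25 kb anchor contains its 10 kb counterpart if it starts no later
# and ends no earlier.
all_contained = all(
    s25 <= s10 and e10 <= e25
    for (s10, e10), (s25, e25) in zip(anchors_10kb, anchors_25kb)
)

# How far each anchor was extended on the left and on the right.
paddings = {
    (s10 - s25, e25 - e10)
    for (s10, e10), (s25, e25) in zip(anchors_10kb, anchors_25kb)
}

print(all_contained, paddings)
```

For these rows every anchor was extended by the same amount on each side, so any feature overlapping a 10 kb anchor must also overlap the corresponding 25 kb anchor. The set of overlaps can only grow; what may change is how those overlaps are summarised.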

Why does the gene disappear when annotating with the full promoter and enhancer lists?

alkurowska commented 2 years ago

As I understand it, the node.class is assigned according to whichever annotation is first to overlap, but all the overlaps should still be included in the results.

liz-is commented 2 years ago

Hi, sorry for the delayed response. How are you extracting the enhancer-promoter pairs? As you noted, the node class is assigned based on your ordering of annotations, so the node class may well be different at different resolutions.
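
To illustrate the point about ordering, here is a minimal plain-Python sketch of priority-based class assignment. It is not the package's implementation. The feature coordinates, the helper names, and the `"distal"` fallback label are assumptions chosen for the example; only the two anchor ranges come from row 1 of the printouts above.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def node_class(anchor, annotations):
    """Return the name of the first annotation set that overlaps the anchor.

    `annotations` is an ordered list of (name, intervals); earlier entries win.
    """
    for name, features in annotations:
        if any(overlaps(anchor, f) for f in features):
            return name
    return "distal"  # assumed label for anchors with no overlap


# Hypothetical features: the promoter lies just outside the 10 kb anchor,
# the enhancer lies inside it.
annotations = [
    ("promoter", [(100_352_000, 100_353_000)]),
    ("enhancer", [(100_345_000, 100_346_000)]),
]

anchor_10kb = (100_340_000, 100_350_000)  # row 1, anchor 1, 10 kb data
anchor_25kb = (100_332_500, 100_357_500)  # same anchor in the 25 kb data

class_10kb = node_class(anchor_10kb, annotations)
class_25kb = node_class(anchor_25kb, annotations)
print(class_10kb, class_25kb)
```

In this toy case the wider anchor picks up a promoter that the narrower one missed, so its single label switches from enhancer to promoter. The enhancer overlap is still there, but a filter on node class alone would no longer count the interaction as enhancer-promoter if the other anchor is also labelled promoter.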

You might find the linkOverlaps function from the InteractionSet package helpful - you can use this to find any interactions in a GenomicInteractions object which link a promoter to an enhancer. This can be more flexible as it doesn't have the limitation of assigning a single node.class.
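
The idea behind a pair-based search can be sketched in plain Python as follows. This only mimics the concept of linking two feature sets through interactions; it is not how `linkOverlaps` is implemented, and the function names and feature coordinates are hypothetical. The anchor ranges are taken from row 1 of the two printouts above.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def hits(anchor, features):
    """Indices of the features that overlap the anchor."""
    return [i for i, f in enumerate(features) if overlaps(anchor, f)]


def link_pairs(interactions, promoters, enhancers):
    """Return sorted (interaction, promoter, enhancer) index triples.

    An interaction links a promoter and an enhancer if one anchor overlaps
    the promoter and the other anchor overlaps the enhancer, in either order.
    """
    links = set()
    for k, (a1, a2) in enumerate(interactions):
        for first, second in ((a1, a2), (a2, a1)):
            for p in hits(first, promoters):
                for e in hits(second, enhancers):
                    links.add((k, p, e))
    return sorted(links)


# Hypothetical features. Promoter 1 only overlaps the widened first anchor.
promoters = [(100_422_000, 100_423_000), (100_352_000, 100_353_000)]
enhancers = [(100_345_000, 100_346_000)]

interactions_10kb = [((100_340_000, 100_350_000), (100_420_000, 100_430_000))]
interactions_25kb = [((100_332_500, 100_357_500), (100_412_500, 100_437_500))]

links_10kb = link_pairs(interactions_10kb, promoters, enhancers)
links_25kb = link_pairs(interactions_25kb, promoters, enhancers)
print(links_10kb, links_25kb)
```

Because each anchor is tested against every feature set rather than reduced to one label, the promoter-enhancer link found at 10 kb is still reported at 25 kb in this example, even though the first anchor now also overlaps a promoter.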