charles-plessy / CAGEr

Mirror of Bioconductor's CAGEr package repository
https://bioconductor.org/packages/CAGEr
6 stars 4 forks source link

aggregateTagClusters give a different result on CAGEexp compared with formerly on CAGEset objects. #86

Open charles-plessy opened 1 year ago

charles-plessy commented 1 year ago

This is quite disturbing… Compare the two:

This was also mentioned in #40

charles-plessy commented 1 year ago

Using a fixed version of aggregateTagClusters (8d746fa2939ddcb5d589d5cc5813bd2af906aca8). I can reproduce the consensus cluster coordinates computed in CAGEr 1.20 (Bioc release 3.6), if I turn off the quantile options.

> aggregateTagClusters(ce, tpmThreshold = 5, qLow = NULL, qUp = NULL, maxDist = 100) |> consensusClustersGR() |> sort()
ConsensusClusters object with 284 ranges and 3 metadata columns:
                            seqnames            ranges strand |     score    dominant_ctss tpm.dominant_ctss
                               <Rle>         <IRanges>  <Rle> | <numeric>        <GRanges>         <numeric>
  chr17:26453632-26453737:+    chr17 26453632-26453737      + |   129.734 chr17:26453701:+           37.9264
  chr17:26564508-26564611:+    chr17 26564508-26564611      + |   231.065 chr17:26564585:+           65.7220
  chr17:26595637-26595805:+    chr17 26595637-26595805      + |   905.419 chr17:26595750:+          368.6590
  chr17:26596033-26596339:+    chr17 26596033-26596339      + |   141.826 chr17:26596198:+           55.5861
  chr17:26645155-26645516:+    chr17 26645155-26645516      + |  2608.956 chr17:26645160:+          317.3201
                        ...      ...               ...    ... .       ...              ...               ...
  chr17:45534699-45534730:-    chr17 45534699-45534730      - | 182.18189 chr17:45534727:-         146.95219
  chr17:45545895-45546033:-    chr17 45545895-45546033      - | 933.76230 chr17:45545991:-         277.01814
  chr17:45554296-45554345:-    chr17 45554296-45554345      - |  37.79642 chr17:45554339:-          17.51376
  chr17:45700092-45700187:-    chr17 45700092-45700187      - |   9.21467 chr17:45700182:-           9.21467
  chr17:45901695-45901784:-    chr17 45901695-45901784      - | 789.81397 chr17:45901749:-         283.80412
  -------
  seqinfo: 26 sequences (1 circular) from danRer7 genome

At the moment, I am inclined to think that CAGEr 1.20 did not take the qLow and qUp options into account for defining coordinates (in contrary to what the documentation states), and that this explains this part of the the discrepancy. Still, the scores do not match.

From the old CAGEr documentation:

consensusCl <- consensusClusters(myCAGEset)
> head(consensusCl)
consensus.cluster chr start end strand tpm
1 1 chr17 26453631 26453737 +  185.9678
2 2 chr17 26564507 26564611 +  318.3590
3 3 chr17 26595636 26595805 + 1017.8475
4 4 chr17 26596032 26596339 +  307.5127
5 5 chr17 26645154 26645516 + 2844.4560
6 6 chr17 26651962 26652050 +   71.0783