charles-plessy / CAGEr

Mirror of Bioconductor's CAGEr package repository
https://bioconductor.org/packages/CAGEr

paraclu clusterCTSS method does not return score column #73

Closed ferenckata closed 4 months ago

ferenckata commented 1 year ago

Hi,

I was running this function (basically the default settings for "paraclu"):

brcage <- clusterCTSS(
    brcage,
    threshold = 1,
    nrPassThreshold = 1,
    thresholdIsTpm = TRUE,
    method = "paraclu",
    removeSingletons = FALSE,
    keepSingletonsAbove = Inf,
    minStability = 1,
    maxLength = 500,
    reduceToNonoverlapping = TRUE,
    useMulticore = TRUE,
    nrCores = numCore)

and the output looks like this:

tagClustersGR(
+   brcage,
+   sample = "sample1")
TagClusters object with 42762 ranges and 7 metadata columns:
                  seqnames        ranges strand |   cluster   nr_ctss
                     <Rle>     <IRanges>  <Rle> | <integer> <numeric>
      [1]             chr1 633952-633960      + |         1         2
      [2]             chr1 691179-691180      + |         2         1
      [3]             chr1 778797-779063      + |         3         7
      [4]             chr1 827658-827676      + |         4         6
      [5]             chr1 904811-904812      + |         5         1
      ...              ...           ...    ... .       ...       ...
  [42758] chrUn_KI270751v1 143288-143289      - |     42758         1
  [42759] chrUn_KI270754v1   18106-18107      + |     42759         1
  [42760] chrUn_KI270754v1     4035-4036      - |     42760         1
  [42761] chrUn_KI270754v1   19456-19457      - |     42761         1
  [42762] chrUn_KI270757v1   20058-20059      + |     42762         1
          dominant_ctss       tpm tpm.dominant_ctss min_density max_density
              <integer> <numeric>         <numeric>   <numeric>   <numeric>
      [1]        633960   1.89798           1.20903 3.31659e-05  0.09842194
      [2]        691180   1.97138           1.97138 3.31659e-05         Inf
      [3]        778798  12.13500           5.69912 1.69388e-04  0.00775037
      [4]        827667   8.23443           3.67962 1.69388e-04  0.01765835
      [5]        904812   1.22548           1.22548 1.15060e-04         Inf
      ...           ...       ...               ...         ...         ...
  [42758]        143289   1.54574           1.54574        -Inf         Inf
  [42759]         18107   1.06082           1.06082        -Inf         Inf
  [42760]          4036   1.70962           1.70962 0.000110863         Inf
  [42761]         19457   2.02858           2.02858 0.000110863         Inf
  [42762]         20059   1.67686           1.67686        -Inf         Inf
  -------
  seqinfo: 640 sequences (1 circular) from hg38 genome

On the other hand, when I run the default "distclu" option, it works as in the manual.

As there was no score column, the interquantile width could not be plotted. The error message was:

Error in data.frame(sampleName = sampleLabels(object)[[x]], iq_width = decode(gr$interquantile_width)) :
 arguments imply differing number of rows: 1, 0

Maybe it is a known / intended behaviour, but it was not clear to me that different downstream steps should be followed when using a different method for tag clustering. So I thought I would add it as an issue here.
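
A check along the following lines would flag the missing columns before any plotting call. It is only a sketch in base R: `missing_columns` is a hypothetical helper, not part of CAGEr, and the vector of names is copied from the printed output above rather than read from the object (with the real object one would presumably inspect the metadata column names of the `tagClustersGR()` result instead). The two required names are taken from this report: `score` from the manual's output, and `interquantile_width` from the error message.

```r
# Hypothetical helper (not a CAGEr function): return the required
# column names that are absent from the available ones.
missing_columns <- function(available, required) {
  setdiff(required, available)
}

# Metadata column names as printed in the "paraclu" output above.
paraclu_cols <- c("cluster", "nr_ctss", "dominant_ctss", "tpm",
                  "tpm.dominant_ctss", "min_density", "max_density")

# Columns the downstream step appears to need (assumption based on
# this report and the error message, not on the CAGEr source).
required <- c("score", "interquantile_width")

absent <- missing_columns(paraclu_cols, required)
if (length(absent) > 0) {
  message("Missing metadata columns: ", paste(absent, collapse = ", "))
}
```

With the names listed above, both `score` and `interquantile_width` are reported as missing.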

charles-plessy commented 1 year ago

Thanks for the report, this is a bug.

By the way, you can use Markdown formatting so that the examples you provided are well formatted on GitHub. Just insert lines with three backticks around the parts to be displayed verbatim.

like that

charles-plessy commented 1 year ago

Actually, it works for me…


> x <- exampleCAGEexp |>
    clusterCTSS(method = "paraclu") |>
    cumulativeCTSSdistribution() |>
    quantilePositions()
> tc <- tagClustersGR(x, returnInterquantileWidth = T, qU=.1, qL = .9)
> score(unlist(tc))
NULL