charles-plessy / CAGEr

Mirror of Bioconductor's CAGEr package repository
https://bioconductor.org/packages/CAGEr
6 stars 4 forks source link

aggregateTagClusters function error in CAGEr #93

Closed trangng03 closed 1 year ago

trangng03 commented 1 year ago

Hello, I am using CAGEr and when I run this command as instructed, I always got this error. Can anyone help me to figure out this error? Thank you so much! ce <- aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100) Error in sapply(grouped_scores, find.dominant.idx) - 1 : non-numeric argument to binary operator In addition: Warning messages: 1: In max(x) : no non-missing arguments to max; returning -Inf 2: In max(x) : no non-missing arguments to max; returning -Inf

charles-plessy commented 1 year ago

Hello, are you using version 2.6.1 ? I have uploaded it this week and it has a lot of fixes for the aggregateTagClusters() function.

trangng03 commented 1 year ago

Hello I am using 2.7.1, the newest one!

charles-plessy commented 1 year ago

Does it work without the quantile options, and in any case, can you send me a minimal reproducible example ?

charles-plessy commented 1 year ago

One more question: did you use paraclu to generate the tag clusters? At the moment aggregating paraclu clusters is broken.

trangng03 commented 1 year ago

this is the error if I excluded the quantile option:

ce <- aggregateTagClusters(ce, tpmThreshold = 5, maxDist = 100)
Error in .aggregateTagClustersGR(object, tpmThreshold = tpmThreshold,  : 
  lazy-load database '/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/CAGEr/R/CAGEr.rdb' is corrupt
In addition: Warning messages:
1: In .aggregateTagClustersGR(object, tpmThreshold = tpmThreshold,  :
  restarting interrupted promise evaluation
2: In .aggregateTagClustersGR(object, tpmThreshold = tpmThreshold,  :
  internal error -3 in R_decompress1
trangng03 commented 1 year ago

Here is what I run for my files ( I just used 2 files first instead of 6 files for checking if everything works):

ce <- CAGEexp( genomeName     = "BSgenome.Hsapiens.UCSC.hg38", inputFiles     = c("/Users/trangnguyen/Documents/LAB/CAGEseq_BRRF1/input/BRRF1.1.fastq.gz.hg38plusAkata_inverted-Aligned.sortedByCoord.out.bam", "/Users/trangnguyen/Documents/LAB/CAGEseq_BRRF1/input/C1.2.fastq.gz.hg38plusAkata_inverted-Aligned.sortedByCoord.out.bam"), inputFilesType = "bam", sampleLabels   = c("BRRF1.1", "C1.2"))
colData((ce))
ce = getCTSS(ce)
CTSStagCountSE(ce)
CTSScoordinatesGR(ce)
CTSStagCountDF(ce)
CTSStagCountGR(ce,1)
sampleLabels(ce)
gtf_file="/Users/trangnguyen/Documents/LAB/CAGEseq_BRRF1/input/hg38_plus_Akata_inverted.gtf"
gtf_file = import.gff(gtf_file)
ce <- annotateCTSS(ce, gtf_file)
colData(ce)[,c("librarySizes", "promoter", "exon", "intron", "unknown")]
CTSScoordinatesGR(ce)
librarySizes(ce)
ce <- normalizeTagCount(ce, method = "powerLaw", fitInRange = c(5, 1000), alpha = 1.2, T = 1*10^6)
ce[["tagCountMatrix"]]
ce <- clusterCTSS( ce
                   , threshold = 1
                   , thresholdIsTpm = TRUE
                   , nrPassThreshold = 1
                   , method = "distclu"
                   , maxDist = 20
                   , removeSingletons = TRUE
                   , keepSingletonsAbove = 5
                   )

tagClustersGR(ce, sample = "BRRF1.1")
ce <- cumulativeCTSSdistribution(ce, clusters = "tagClusters", useMulticore = T)
ce <- quantilePositions(ce, clusters = "tagClusters", qLow = 0.1, qUp = 0.9)
tagClustersGR( ce, "BRRF1.1"
               , returnInterquantileWidth = TRUE,  qLow = 0.1, qUp = 0.9)
plotInterquantileWidth(ce, clusters = "tagClusters", tpmThreshold = 3, qLow = 0.1, qUp = 0.9)
ce <- aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100)

Everything works until the aggregateTagClusters step. Also, I used "distclu" for this, and I also tried "paraclu" but it failed. Thank you so much!

charles-plessy commented 1 year ago

Regading the 'CAGEr.rdb' is corrupt error, I sometimes get this with any package after multiple cycles of rebuilding from source or GitHub. I think that if you restart your R session the error will be gone.

Regarding the main issue, can you share with me your BAM files?

trangng03 commented 1 year ago

Hi Charles, After I restarted my R session, the aggregateTagClusters works. The result shows me this:

> consensusClustersGR(ce)
ConsensusClusters object with 39309 ranges and 3 metadata columns:
                           seqnames            ranges strand |     score   dominant_ctss
                              <Rle>         <IRanges>  <Rle> | <numeric>       <GRanges>
      chr1:633966-634028:+     chr1     633966-634028      + |      1459   chr1:634009:+
      chr1:778773-779037:+     chr1     778773-779037      + |       194   chr1:778856:+
      chr1:827668-827834:+     chr1     827668-827834      + |       752   chr1:827668:+
      chr1:960570-960634:+     chr1     960570-960634      + |       289   chr1:960585:+
    chr1:1010653-1010655:+     chr1   1010653-1010655      + |        42  chr1:1010655:+
                       ...      ...               ...    ... .       ...             ...
  chrY:56839054-56839090:-     chrY 56839054-56839090      - |        60 chrY:56839054:-
  chrY:56839446-56839461:-     chrY 56839446-56839461      - |        32 chrY:56839459:-
  chrY:56844018-56844025:-     chrY 56844018-56844025      - |         7 chrY:56844024:-
  chrY:56846221-56846326:-     chrY 56846221-56846326      - |        22 chrY:56846321:-
           chrY:56849088:-     chrY          56849088      - |         9 chrY:56849088:-
                           tpm.dominant_ctss
                                       <Rle>
      chr1:633966-634028:+               536
      chr1:778773-779037:+                42
      chr1:827668-827834:+               244
      chr1:960570-960634:+               105
    chr1:1010653-1010655:+                42
                       ...               ...
  chrY:56839054-56839090:-                19
  chrY:56839446-56839461:-                17
  chrY:56844018-56844025:-                 7
  chrY:56846221-56846326:-                15
           chrY:56849088:-                 9
  -------
  seqinfo: 711 sequences (1 circular) from hg38 genome

> consensusClustersGR( ce, sample = "BRRF1.1")
ConsensusClusters object with 39309 ranges and 4 metadata columns:
                           seqnames            ranges strand |     score   dominant_ctss
                              <Rle>         <IRanges>  <Rle> | <integer>       <GRanges>
      chr1:633966-634028:+     chr1     633966-634028      + |       584   chr1:634009:+
      chr1:778773-779037:+     chr1     778773-779037      + |        36   chr1:778856:+
      chr1:827668-827834:+     chr1     827668-827834      + |       178   chr1:827668:+
      chr1:960570-960634:+     chr1     960570-960634      + |        54   chr1:960585:+
    chr1:1010653-1010655:+     chr1   1010653-1010655      + |         3  chr1:1010655:+
                       ...      ...               ...    ... .       ...             ...
  chrY:56839054-56839090:-     chrY 56839054-56839090      - |         7 chrY:56839054:-
  chrY:56839446-56839461:-     chrY 56839446-56839461      - |         4 chrY:56839459:-
  chrY:56844018-56844025:-     chrY 56844018-56844025      - |         1 chrY:56844024:-
  chrY:56846221-56846326:-     chrY 56846221-56846326      - |         6 chrY:56846321:-
           chrY:56849088:-     chrY          56849088      - |         1 chrY:56849088:-
                           tpm.dominant_ctss       tpm
                                       <Rle> <integer>
      chr1:633966-634028:+               536       584
      chr1:778773-779037:+                42        36
      chr1:827668-827834:+               244       178
      chr1:960570-960634:+               105        54
    chr1:1010653-1010655:+                42         3
                       ...               ...       ...
  chrY:56839054-56839090:-                19         7
  chrY:56839446-56839461:-                17         4
  chrY:56844018-56844025:-                 7         1
  chrY:56846221-56846326:-                15         6
           chrY:56849088:-                 9         1
  -------
  seqinfo: 711 sequences (1 circular) from hg38 genome

Does "score" indicate "CAGE signal"? if I want to do differential expression using:

ce$group <- factor(c("a","a", "b","b"))
> dds <- consensusClustersDESeq2(ce, ~group)
> dds <- DESeq(dds)

Does it mean the "score" in each sample is used for the differential expression of DeSeq2? I also do not know why my score is "integer", not "numeric"? Thank you so much for your help!

charles-plessy commented 1 year ago

Indeed, the scores in the consensus clusters should be numeric. The integer values make it look like raw tag counts are being used…

I can not reproduce this bug on my machine with my data. For me to help you further, I would need you to send me a reproducible test case (data plus script).

charles-plessy commented 1 year ago

Hello, I still can not reproduce the bug, so I am closing the issue. Please reopen it if you have more information or if you can provide test data that triggers the bug.