Closed trangng03 closed 1 year ago
Hello, are you using version 2.6.1
? I have uploaded it this week and it has a lot of fixes for the aggregateTagClusters()
function.
Hello I am using 2.7.1, the newest one!
Does it work without the quantile options, and in any case, can you send me a minimal reproducible example ?
One more question: did you use paraclu to generate the tag clusters? At the moment aggregating paraclu clusters is broken.
this is the error if I excluded the quantile option:
ce <- aggregateTagClusters(ce, tpmThreshold = 5, maxDist = 100)
Error in .aggregateTagClustersGR(object, tpmThreshold = tpmThreshold, :
lazy-load database '/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/CAGEr/R/CAGEr.rdb' is corrupt
In addition: Warning messages:
1: In .aggregateTagClustersGR(object, tpmThreshold = tpmThreshold, :
restarting interrupted promise evaluation
2: In .aggregateTagClustersGR(object, tpmThreshold = tpmThreshold, :
internal error -3 in R_decompress1
Here is what I run for my files ( I just used 2 files first instead of 6 files for checking if everything works):
ce <- CAGEexp( genomeName = "BSgenome.Hsapiens.UCSC.hg38", inputFiles = c("/Users/trangnguyen/Documents/LAB/CAGEseq_BRRF1/input/BRRF1.1.fastq.gz.hg38plusAkata_inverted-Aligned.sortedByCoord.out.bam", "/Users/trangnguyen/Documents/LAB/CAGEseq_BRRF1/input/C1.2.fastq.gz.hg38plusAkata_inverted-Aligned.sortedByCoord.out.bam"), inputFilesType = "bam", sampleLabels = c("BRRF1.1", "C1.2"))
colData((ce))
ce = getCTSS(ce)
CTSStagCountSE(ce)
CTSScoordinatesGR(ce)
CTSStagCountDF(ce)
CTSStagCountGR(ce,1)
sampleLabels(ce)
gtf_file="/Users/trangnguyen/Documents/LAB/CAGEseq_BRRF1/input/hg38_plus_Akata_inverted.gtf"
gtf_file = import.gff(gtf_file)
ce <- annotateCTSS(ce, gtf_file)
colData(ce)[,c("librarySizes", "promoter", "exon", "intron", "unknown")]
CTSScoordinatesGR(ce)
librarySizes(ce)
ce <- normalizeTagCount(ce, method = "powerLaw", fitInRange = c(5, 1000), alpha = 1.2, T = 1*10^6)
ce[["tagCountMatrix"]]
ce <- clusterCTSS( ce
, threshold = 1
, thresholdIsTpm = TRUE
, nrPassThreshold = 1
, method = "distclu"
, maxDist = 20
, removeSingletons = TRUE
, keepSingletonsAbove = 5
)
tagClustersGR(ce, sample = "BRRF1.1")
ce <- cumulativeCTSSdistribution(ce, clusters = "tagClusters", useMulticore = T)
ce <- quantilePositions(ce, clusters = "tagClusters", qLow = 0.1, qUp = 0.9)
tagClustersGR( ce, "BRRF1.1"
, returnInterquantileWidth = TRUE, qLow = 0.1, qUp = 0.9)
plotInterquantileWidth(ce, clusters = "tagClusters", tpmThreshold = 3, qLow = 0.1, qUp = 0.9)
ce <- aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100)
Everything works until the aggregateTagClusters step. Also, I used "distclu" for this, and I also tried "paraclu" but it failed. Thank you so much!
Regading the 'CAGEr.rdb' is corrupt
error, I sometimes get this with any package after multiple cycles of rebuilding from source or GitHub. I think that if you restart your R session the error will be gone.
Regarding the main issue, can you share with me your BAM files?
Hi Charles, After I restarted my R session, the aggregateTagClusters works. The result shows me this:
> consensusClustersGR(ce)
ConsensusClusters object with 39309 ranges and 3 metadata columns:
seqnames ranges strand | score dominant_ctss
<Rle> <IRanges> <Rle> | <numeric> <GRanges>
chr1:633966-634028:+ chr1 633966-634028 + | 1459 chr1:634009:+
chr1:778773-779037:+ chr1 778773-779037 + | 194 chr1:778856:+
chr1:827668-827834:+ chr1 827668-827834 + | 752 chr1:827668:+
chr1:960570-960634:+ chr1 960570-960634 + | 289 chr1:960585:+
chr1:1010653-1010655:+ chr1 1010653-1010655 + | 42 chr1:1010655:+
... ... ... ... . ... ...
chrY:56839054-56839090:- chrY 56839054-56839090 - | 60 chrY:56839054:-
chrY:56839446-56839461:- chrY 56839446-56839461 - | 32 chrY:56839459:-
chrY:56844018-56844025:- chrY 56844018-56844025 - | 7 chrY:56844024:-
chrY:56846221-56846326:- chrY 56846221-56846326 - | 22 chrY:56846321:-
chrY:56849088:- chrY 56849088 - | 9 chrY:56849088:-
tpm.dominant_ctss
<Rle>
chr1:633966-634028:+ 536
chr1:778773-779037:+ 42
chr1:827668-827834:+ 244
chr1:960570-960634:+ 105
chr1:1010653-1010655:+ 42
... ...
chrY:56839054-56839090:- 19
chrY:56839446-56839461:- 17
chrY:56844018-56844025:- 7
chrY:56846221-56846326:- 15
chrY:56849088:- 9
-------
seqinfo: 711 sequences (1 circular) from hg38 genome
> consensusClustersGR( ce, sample = "BRRF1.1")
ConsensusClusters object with 39309 ranges and 4 metadata columns:
seqnames ranges strand | score dominant_ctss
<Rle> <IRanges> <Rle> | <integer> <GRanges>
chr1:633966-634028:+ chr1 633966-634028 + | 584 chr1:634009:+
chr1:778773-779037:+ chr1 778773-779037 + | 36 chr1:778856:+
chr1:827668-827834:+ chr1 827668-827834 + | 178 chr1:827668:+
chr1:960570-960634:+ chr1 960570-960634 + | 54 chr1:960585:+
chr1:1010653-1010655:+ chr1 1010653-1010655 + | 3 chr1:1010655:+
... ... ... ... . ... ...
chrY:56839054-56839090:- chrY 56839054-56839090 - | 7 chrY:56839054:-
chrY:56839446-56839461:- chrY 56839446-56839461 - | 4 chrY:56839459:-
chrY:56844018-56844025:- chrY 56844018-56844025 - | 1 chrY:56844024:-
chrY:56846221-56846326:- chrY 56846221-56846326 - | 6 chrY:56846321:-
chrY:56849088:- chrY 56849088 - | 1 chrY:56849088:-
tpm.dominant_ctss tpm
<Rle> <integer>
chr1:633966-634028:+ 536 584
chr1:778773-779037:+ 42 36
chr1:827668-827834:+ 244 178
chr1:960570-960634:+ 105 54
chr1:1010653-1010655:+ 42 3
... ... ...
chrY:56839054-56839090:- 19 7
chrY:56839446-56839461:- 17 4
chrY:56844018-56844025:- 7 1
chrY:56846221-56846326:- 15 6
chrY:56849088:- 9 1
-------
seqinfo: 711 sequences (1 circular) from hg38 genome
Does "score" indicate "CAGE signal"? if I want to do differential expression using:
ce$group <- factor(c("a","a", "b","b"))
> dds <- consensusClustersDESeq2(ce, ~group)
> dds <- DESeq(dds)
Does it mean the "score" in each sample is used for the differential expression of DeSeq2? I also do not know why my score is "integer", not "numeric"? Thank you so much for your help!
Indeed, the scores in the consensus clusters should be numeric. The integer values make it look like raw tag counts are being used…
I can not reproduce this bug on my machine with my data. For me to help you further, I would need you to send me a reproducible test case (data plus script).
Hello, I still can not reproduce the bug, so I am closing the issue. Please reopen it if you have more information or if you can provide test data that triggers the bug.
Hello, I am using CAGEr and when I run this command as instructed, I always got this error. Can anyone help me to figure out this error? Thank you so much! ce <- aggregateTagClusters(ce, tpmThreshold = 5, qLow = 0.1, qUp = 0.9, maxDist = 100) Error in sapply(grouped_scores, find.dominant.idx) - 1 : non-numeric argument to binary operator In addition: Warning messages: 1: In max(x) : no non-missing arguments to max; returning -Inf 2: In max(x) : no non-missing arguments to max; returning -Inf