Open bkaczkowski opened 5 years ago
Hello Bogumil, glad to see you are trying out CAGEfightR!
It looks like CAGEfightR caught an error trying to process the pooled signal. It's trying to avoid dividing by a negative number when calculating the balance for score for enhancer prediction.
It's hard for me to know what's going wrong without the actual files. I suspect it could be something with the input CTSSs leading to a pooled signal with negative values.
You could try running calcTPM and calcPooled separately and then check how the score column looks - hopefully all CTSSs should have a non-negative pooled TPM!
Yes looks like I have a problem with my bigwig files [we normally don't use this format around here much :) ]. I actually used CAGEr's function to do it [ exportCTSStoBedGraph(ce, values = "raw", format = "BigWig") ] but that didn't go well. Could you point me to a tested method to generate bigwigs that work well with your package (it would be nice to add to the tutorial too). Did you use bamCoverage from deeptools?
Yes, many people use BED-files for CTSSs, however BigWigs have the advantage that they store chromosome lengths + allow for random access and thereby parallel processing.
I'm working on a major update to CAGEfightR that will allow quantifyCTSSs to read bedGraph-files + checkCTSSs function that will check the input files for any inconsistencies.
If you have BED-files with CTSSs (e.g. from CAGEr), you can read them into a GRanges using rtracklayer::import, split them by strand, and then write them to BigWig-files using rtracklayer::export.
We had BAM-to-BigWig converter based on RSamTools/GenomicAlignments in the works, but it's hard to make a consistent function that handles the many variations of CAGE and other types of 5'-end data. We've had a couple of request for it already, we might add in a future update.
Ok, thanks I will try the option with rtracklayer::import / rtracklayer::export and will let you know how it went
Hello again, rtracklayer::import / rtracklayer::export did the trick.
Hi Malte and bkaczkowski,
I too am having trouble generating bigwig files that work with quickEnhancers(). I have started with bam files (processed by a collaborator so I cannot get the raw fastq files) and have tried multiple ways of generating bigwig files to no avail. The method that has got me furthest is uploading the bam files into CAGEr and then exporting them as bedGraphs and using bedGraphToBigWig to generate the bigwig files. These files work with no errors for quickTSSs(), but I get the following error when running quickEnhancers():
Error in (function (PD, MD, PU, MU) : BC calculation stopped because some sites had negative pooled signal!
Some googling led me to this discussion and prompted me to run checkPooled() which gave me:
Error in checkPooled(CTSSs) : The supplied pooled signal is not valid!
I am new to CAGE data and relatively inexperienced with R. Can you suggest a work around for me? I looked into the suggestions of rtracklayer::import / rtracklayer::export, but can't seem to get that working either (but that is probably my poor R skills).
Any help would be greatly appreciated.
Thank you, Christy.
Hello Christy!
Exciting to see you are considering CAGEfightR for your analysis!
Problems with getting CTSSs correctly are unfortunately common. Last Bioconductor version I added some conversion functions: Did you try to convert BED-files (e.g. from CAGEr or whatever pipeline you are using for processing) with convertBED2BigWig?
I did see documentation for this, but couldn't get it going on our cluster (function not recognised). I thought maybe it had been removed because it was experimental. Maybe the Bioconductor version we have is old. Will ask my cluster people and give it a go. Will let you know how it goes. Thanks for your quick reply!!
Unfortunately, still not working :( . Here is the code I used in case you can see anything I have obviously done wrong. BedGraph files were createdfrom bam files imported into CAGEr and exported out as BedGraph (for each sample, one bedgraph for plus strand and one for minus strand).
Design<-read.table(file="designMatrix_bedGraph.txt",header=TRUE)
bg_plus <- system.file("extdata",Design$BedGraphPlus, package = "CAGEfightR") bg_minus <- system.file("extdata",Design$BedGraphMinus, package = "CAGEfightR") bw_plus <- gsub("bedGraph","bw",bg_plus) bw_minus <- gsub("bedGraph","bw",bg_minus)
genomeInfo <- SeqinfoForUCSCGenome("bosTau9")
convertBedGraph2BigWig(input=bg_plus, output=bw_plus,genome=genomeInfo) _Attempting to convert file: File1_plus.bedGraph Attempting to convert file: File2_plus.bedGraph Attempting to convert file: File3_plus.bedGraph … Attempting to convert file: File25_plus.bedGraph Warning message: In convertBedGraph2BigWig(input = bg_plus, output = bwplus, genome = genomeInfo) : Input bedGraph files do not all have the proper .bedGraph extension! Attempting import anyway...
convertBedGraph2BigWig(input=bg_minus, output=bw_minus,genome=genomeInfo) _Attempting to convert file: File1_minus.bedGraph Attempting to convert file: File2_minus.bedGraph Attempting to convert file: File3_minus.bedGraph … Attempting to convert file: File25_minus.bedGraph Warning message: In convertBedGraph2BigWig(input = bg_minus, output = bwminus, genome = genomeInfo) : Input bedGraph files do not all have the proper .bedGraph extension! Attempting import anyway...
DesignBW<-read.table(file="designMatrix_bw.txt",header=TRUE) row.names(DesignBW)<-DesignBW$Name bw_plus <- system.file("extdata",DesignBW$BigWigPlus, package = "CAGEfightR") bw_minus <- system.file("extdata",DesignBW$BigWigMinus,package = "CAGEfightR") bw_plus <- BigWigFileList(bw_plus) bw_minus <- BigWigFileList(bw_minus) names(bw_plus) <- DesignBW$Name names(bw_minus) <- DesignBW$Name
CTSSs <- quantifyCTSSs(plusStrand=bw_plus,minusStrand=bw_minus,design=DesignBW,genome=genomeInfo) Checking design... Checking supplied genome compatibility... Iterating over 1 genomic tiles in 25 samples using 46 worker(s)... Importing CTSSs from plus strand... Importing CTSSs from minus strand... Merging strands... Formatting output... CTSS summary Number of samples: 25 Number of CTSSs: 92.95 millions Sparsity: 94.5 % Type of rowRanges: StitchedGPos Final object size: 2.19 GB
CTSSs <- CTSSs %>% calcTPM() %>% calcPooled() Calculating library sizes... Calculating TPM...
TCs <- quickTSSs(CTSSs) Using existing score column! Running clusterUnidirectionally: Splitting by strand... Slice-reduce to find clusters... Calculating statistics... Preparing output... Tag clustering summary: Width Count Percent Total 26300981 1e+02 % =1 21412423 8e+01 % =10 4830871 2e+01 % =100 57662 2e-01 % =1000 25 1e-04 % Running quantifyClusters: Finding overlaps... Aggregating within clusters...
BCs <- quickEnhancers(CTSSs) Using existing score column! Running clusterBidirectionally: Pre-filtering bidirectional candidate regions... Retaining for analysis: 98% Splitting by strand... Calculating windowed coverage on plus strand... Calculating windowed coverage on minus strand... Calculating balance score... Error in (function (PD, MD, PU, MU) : BC calculation stopped because some sites had negative pooled signal!
And that’s as far as I get. Still the same message about negative pooled signal. Is it safe to try the convertBAM2BigWig function?
This does indeed look a bit strange!
You could try using convertBAM2BigWig, however I haven't tested it very much yet + it's not super fast. So use at your own risk...
A few things:
For specifying file paths, you don't need the bw_plus <- system.file("extdata", ... , package = "CAGEfightR"), you can simply type out the path on your hard drive, e.g. "/path/to/my/bigwigs/sample1_plus.bw"
I don't remember exactly how CAGEr does it, but the file conversions seem to produce some warnings: Are you sure the files are properly formatted? You can try and run the checkCTSSs on a few of the files to see if they contain what you expect. If you just need to load the data into R without running the full CAGEfightR pipeline, you can simple use rtracklayer::import.
Hope this helps!
Hello Malte I'me having a similar error for the people above when running:
BCs <- quickEnhancers(CTSSs)
Error in (function (PD, MD, PU, MU) :
BC calculation stopped because some sites had negative pooled signal!
my data were exported from cageR with these options:
exportCTSStoBedGraph(ce.rep, values = "raw", format = "BigWig")
I also tried to export them with, but still the same issue
exportCTSStoBedGraph(ce.rep, values = "normalized", format = "BigWig")
And I also tried to export the in bed format and covert them with rtracklayer as indicated above, but i stll have the same error. Did i do something wrong?
exportToBed(object = ce.rep, what = "tagClusters", qLow = 0.1, qUp = 0.9, oneFile = FALSE)
library("BSgenome.Hsapiens.UCSC.hg19")
bsg <- BSgenome.Hsapiens.UCSC.hg19
gr = rtracklayer::import( "myfile.tagClusters.qLow0.1_qUp0.9.bed" )
#add seqlengths because it was missing
seq_names = names( seqlengths(gr) )
seq_length = seqlengths(bsg) [ seq_names ]
seqlengths(gr) = seq_length
split(gr, strand(gr))
gr.plus <- gr[strand( gr) == "+"]
gr.minus <- gr[strand( gr) == "-"]
rtracklayer::export(gr.plus , myfile.CTSS.RAW.plus.bw , format = "BigWig")
rtracklayer::export(gr.minus, myfile.CTSS.RAW.minus.bw, format = "BigWig")
The plot thickens! Would be nice to get to the bottom of this: Ideally CAGEr and CAGEfightR should be highly compatible.
There's a few things you could try to nail down the issue:
Hi Malte, thanks for your suggestions. Do you mean exporting CAGEr data in BEDGRAPH file? there are 2 fuctions for exporting data
Initially I exported CTSS data on bigwig format, and later in bedgraph(see below)
BigWig format after importing them with quantifyCTSSs(), there are negative counts displayed by tail() function, is that expected?
exportCTSStoBedGraph(ce.rep, values = "raw" , format = "BigWig")
CTSSs = quantifyCTSSs(plusStrand = bw_plus, minusStrand = bw_minus, genome = seqinfo(bsg), design = df)
assay(CTSSs, "counts") %>% head
6 x 57 sparse Matrix of class "dgCMatrix"
[[ suppressing 57 column names 'CN1_MB', 'CN1_MT01', 'CN1_MT02' ... ]]
[1,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 . . . . . . . . . . . . . . . . . . . . .
[2,] . . . . . . . . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[3,] . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[4,] . . . . . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 .
[5,] . . . . . . . . . . . . . . . . 2 . . . . . . . . . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . 9 .
[6,] . . . . . . . . 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 . . . . 3 . 1 . . . . .
> assay(CTSSs, "counts") %>% tail
6 x 57 sparse Matrix of class "dgCMatrix"
[[ suppressing 57 column names 'CN1_MB', 'CN1_MT01', 'CN1_MT02' ... ]]
[19030497,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 . . . . . . . . . . . . . . . . . . . .
[19030498,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 . . . . . . . . . .
[19030499,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1
[19030500,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 . . . . . . . . . .
[19030501,] . . . . -7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[19030502,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -1 . . . . . . . . . . . . . . . . . . . .
>
BedGraph format 1) So I tried to export CAGEr data in bedgraph files :**
exportCTSStoBedGraph(ce.rep, values = "raw", format = "bedGraph", oneFile = FALSE)
To convert bedgraph files to bigwig i tried 2 ways:
2a ) I converted each bedgraph file with convertBedGraph2BigWig() in bigwig and I have this warning message
for (i in bedGraph_files)
{ j=gsub(".bedGraph",".bw",i) ;
j=gsub("raw","Raw_converted",j) ;
convertBedGraph2BigWig(i, j, genome=SeqinfoForUCSCGenome(genome))
}
Attempting to convert file: my_file.raw.minus.bedGraph
Warning message:
In convertBedGraph2BigWig( Bedgraph_file , BigWig_file , genome = SeqinfoForUCSCGenome("hg38")) :
Input bedGraph files do not all have the proper .bedGraph extension! Attempting import anyway
2b ) I converted each bedgraph file in bigwig by importing/exporting the object into R with rtracklayer and I checked that the scores were all positive
for (i in bedGraph_files)
{
j=gsub(".bedGraph",".bw",i) ;
j=gsub("raw","Raw_conv2",j) ;
gr = rtracklayer::import( i, genome=SeqinfoForUCSCGenome("hg38") )
sum(gr$score <0)
[1] 0
rtracklayer::export( gr , j , format = "BigWig")
}
The files produced with these 2 steps are identical to the original bigwig file exported from CAGEr
ll sampleA.CTSS*aw*plus*.bw
-rw-r--r-- 1 celiim 3748056 Sep 7 14:48 sampleA.CTSS.Raw_conv2.plus.bw
-rw-r--r-- 1 celiim 3748056 Sep 7 13:05 sampleA.CTSS.Raw_converted.plus.bw
-rw-r--r-- 1 celiim 3748056 Sep 7 11:29 sampleA.CTSS.raw.plus.bw
3 ) I run checkCTSS() on all the 3 bigwig dataset and I get a warning again, but since I have 2 separate files for each strand, I think i can ignore the warning, right?
checkCTSSs(as.character(my_file.minus.bw)
Attempting import of supplied filename...
[1] TRUE
Warning message:
In checkCTSSs(object) :
CTSSs are unstranded: CTSSs must be either on the + or - strand. This warning can be ignored if a BigWig/bedGraph file was supplied, in which case each strand is stored in a separate file
4 ) I ran CAGEfightR commands untill I got the same error
CTSSs = quantifyCTSSs(plusStrand = bw_plus ,minusStrand = bw_minus, genome = seqinfo(bsg), design = df)
CTSSs = CTSSs %>% calcTPM()
CTSSs = CTSSs %>% calcPooled()
TCs <- quickTSSs(CTSSs)
TSSs = TCs %>% calcTPM() %>% subsetBySupport(inputAssay="TPM", unexpressed=1, minSamples=3 )
BCs <- quickEnhancers(CTSSs)
Using existing score column!
Running clusterBidirectionally:
Pre-filtering bidirectional candidate regions...
Retaining for analysis: 75%
Splitting by strand...
Calculating windowed coverage on plus strand...
Calculating windowed coverage on minus strand...
Calculating balance score...
Error in (function (PD, MD, PU, MU) :
BC calculation stopped because some sites had negative pooled signal!
Aha - for libraries where you do see negative counts, did you try looking directly at the bedGraph files? Try and use rtracklayer::import() on the specific library, then use score() to extract the counts and see if any are negative.
Dear Malte, I had checked already the score of a random CTSS.raw.plus.bedGraph file and it looked ok. After your comment I checked all of them and I found out that the CTSS.raw.minus.bedGraph files have negative all score values. So it must be a feature of the minus stranded values? Do I need to remove the minus of the gr object before exporting to a BigWig files?
Having negative values in the bedGraph will definitely disrupt CAGEfightR. I will see if I can add a new internal check to make sure the user is properly warned of this.
I assume CAGEr is flipping the values to make pretty genome browser files, with plus-strand tags counting upwards and minus-strand tags counting downwards (kinda like the Zenbu genome browser).
A quick workaround that might work: Build the RangedSummarizedExperiment of CTSSs, and then make a new assay of only absolute counts: e.g. assay(CTSSs, "counts") <- abs(assay(CTSSs, "counts")).
You could also modify the bedGraph files themselves, e.g. by using rtracklayer::import and then setting score(x) <- abs(score(x))
(Or you could ask Charles Plessy to add the option of not flipping scores in the bedGraph files in CAGEr 😉)
Hi Malte,
I've been AWOL while attending to other work priorities. Just got back to this this morning and tried your latest solution, assay(CTSSs, "counts") <- abs(assay(CTSSs, "counts")) after quantifyCTSSs()
It worked!!!
Just wanted to say thank you for your time in helping solve my problem (and thank you to mirkocelii too for keeping the ball rolling in my absence!).
Fingers crossed it's smooth sailing from here.
Cheers, Christy
BCs <- quickEnhancers(CTSSs) Using existing score column!
- Running clusterBidirectionally: Pre-filtering bidirectional candidate regions... Retaining for analysis: 98% Splitting by strand... Calculating windowed coverage on plus strand... Calculating windowed coverage on minus strand... Calculating balance score... Slice-reduce to find bidirectional clusters... Calculating statistics... Preparing output... # Bidirectional clustering summary: Number of bidirectional clusters: 903907 Maximum balance score: 1 Minimum balance score: 0.950000058268218 Maximum width: 2370 Minimum width: 401
- Running subsetByBidirectionality: Calculating bidirectionality... Subsetting... Removed 489760 out of 903907 regions (54.2%)
- Running quantifyClusters: Finding overlaps... Aggregating within clusters...
Glad it worked! I would do a few sanity check that the abs(counts) workaround is working as intended - CAGEfightR isn't technically built to handle negative counts, but I reckon it shouldn't make a difference. I would probably advice on fixing the bedGraph files themselves in the long term.
I will leave this issue open until I have added a check for negative counts in checkCTSSs!
Dear Malte, thanks for your help. By removing the "-" to minus strand files, and it works !
I've already stressed enough Charles Plessy for CAGEr too, maybe you can add this instruction it in the vignette?
I will look into it, perhaps I could add some extra code to convertBedGraph2BigWig to handle this case
Hi Malte, I'm getting an error (see below) when using quickEnhancers function. The quickTSSs function worked without an issue. Best Bogumil