Open Mailinnia opened 3 years ago
I tested on another data set with chromosomes lengths below the limit. KaryoploteR cannot use csi index it seems.
Hi @Maillinia
I was checking that and karyoploteR uses Rsamtools to interact with the BAM files, so it could be a problem if Rsamtools does not support CSI indexes. I'll ask to the developers mailing list to see if this is the case.
Hi @Mailinnia did you find a workaround for your problem? I'm working with a barley genome right now and (I guess) I have the same problem. Thanks in advance.
Hi @Faewks,
I wanted to do the kpPlotBAMDensity
. So with large genomes I would use Bedtools, then import the data into R and make a custom kpPlotBAMDensity.
That worked very well.
to get the dens
generated by samtools, I replaced in kpPlotBAMDensity
:
dens <- Rsamtools::countBam([data](https://rdrr.io/r/utils/data.html), param=Rsamtools::ScanBamParam([which](https://rdrr.io/r/base/which.html) = [windows](https://rdrr.io/r/grDevices/windows.html)))$records
with:
windows_bed <- windows_data %>% select(seqnames,start,end)
write.table(windows_bed, file='windows.bed', sep = "\t", col.names = FALSE, row.names = FALSE, quote=FALSE)
handle_bedtools <- '/usr/local/bedtools/bin/bedtools'
args <- c('intersect', '-a','windows.bed', '-b', 'output.sort.bam', '-C', '-bed')
records <- read.table(text = system2(handle_bedtools, args, stdout=TRUE), sep = "\t", header=FALSE)
dens <- as.numeric(unlist(records$V4))
file.remove('windows.bed')```
Hi Mailinnia, thank you for your quick answer! I'm sorry but I do not understand your answer. What function of Bedtools are you using? Do you change something in the library of the tool karyoploteR for the function kpPlotBAMDensity?
I'm sorry I'm quite new to this.
Hi Faewks, I looked into the function kpPlotBAMDensity in order to make images like this:
You can find the source code for kpPlotBAMDensity here: https://rdrr.io/bioc/karyoploteR/src/R/kpPlotBAMDensity.R
I made a custom version of this function by replacing the Rsamtools command with bedtools intersect (see previous answer).
handle_bedtools <<- PATH_TO_BEDTOOLS
kpPlotBAMDensity_custom <- function(karyoplot, data=NULL, window.size=1e6, normalize=FALSE, ymin=NULL, ymax=NULL, data.panel=1, r0=NULL, r1=NULL, col="gray80", border=NA, clipping=TRUE,...) {
if(!methods::is(karyoplot, "KaryoPlot")) stop(paste0("In kpPlotBAMDensity: 'karyoplot' must be a valid 'KaryoPlot' object"))
if(is.character("data")) {
data <- Rsamtools::BamFile(file = data)
}
if(!methods::is(data, "BamFile")) stop(paste0("In kpPlotBAMDensity: 'data' must be a character or a 'BamFile' object."))
karyoplot$beginKpPlot()
on.exit(karyoplot$endKpPlot())
#use tileGenome to create windows only on the visible part of the genome, that is in the karyoplot$plot.region
plot.region.lengths <- setNames(width(karyoplot$plot.region), as.character(seqnames(karyoplot$plot.region)))
windows <- tileGenome(plot.region.lengths, tilewidth = window.size, cut.last.tile.in.chrom = TRUE)
seqinfo(windows) <- GenomeInfoDb::Seqinfo(seqnames=seqlevels(windows)) #remove the seqlength info from seqinfo to avoid a potential out-of-bounds warning when shifting the windows
#Now, move the windows start(karyoplot$plor.region) bases to the right.It's only necessary when zoomed or with chromosomes not starting at position 1
windows <- shift(windows, shift=start(karyoplot$plot.region[seqnames(windows)])-1)
windows_data <- data.frame(windows)
windows_bed <- windows_data %>% select(seqnames,start,end)
write.table(windows_bed, file='windows.bed', sep = "\t", col.names = FALSE, row.names = FALSE, quote=FALSE)
args <- c('intersect', '-a','windows.bed', '-b', 'output.sort.bam', '-C', '-bed')
records <- read.table(text = system2(handle_bedtools, args, stdout=TRUE), sep = "\t", header=FALSE)
dens <- as.numeric(unlist(records$V4))
file.remove('windows.bed')
if(normalize==TRUE) {
total.reads <- sum(Rsamtools::idxstatsBam(file=data)$mapped)
dens <- dens/total.reads
}
if(is.null(ymax)) {
ymax <- max(dens)
}
#Specify the missing colors if possible
if(is.null(border) & !is.null(col) & !is.na(col)) {
border=darker(col, amount = 100)
}
if(is.null(col) & !is.null(border) & !is.na(col)) {
col=lighter(border)
}
if(is.na(col) & is.null(border)) {
border <- "black"
}
karyoplot <- kpBars(karyoplot, data=windows, y0=0, y1=dens,ymin=ymin, ymax=ymax, data.panel=data.panel, r0=r0, r1=r1, border=border, col=col, clipping=clipping, ...)
karyoplot$latest.plot <- list(funct="kpPlotBAMDensity", computed.values=list(density=dens, windows=windows, max.density=max(dens)))
invisible(karyoplot)
}
You need a genome file containing the name, start and end positions of the chromosomes present in your genome e.g.: chr1H 1 516505933 chr2H 1 665585732 chr3H 1 621516507 chr4H 1 610333536 chr5H 1 588218687 chr6H 1 561794516 chr7H 1 632540562 chrUn 1 29110254
Then I run this code to create the image:
k_genome <- read.table(GENOME_FILE, header=FALSE)
max_chromosome <- max(k_genome$V3)
bai_index_size <- 536870912
custom_genome <- toGRanges(GENOME_FILE)
pp <- getDefaultPlotParams(plot.type=1)
pp$ideogramheight <- 0.1
kp <- plotKaryotype(genome = custom_genome, cex=0.4, plot.params = pp)
#Cannot use kpPlotBAMDensity with CSI index, must use custom plot using bedtools for density calculations
if (max_chromosome > bai_index_size){
bam.read <- kpPlotBAMDensity_custom(kp, data=BAM_FILE, col=COLOR, r0=0, r1=0.5, window.size=WINDOW_SIZE)
} else {
bam.read <- kpPlotBAMDensity(kp, data=BAM_FILE, col=COLOR, r0=0, r1=0.5, window.size=WINDOW_SIZE)
}
ymax.read <- bam.read$latest.plot$computed.values$max.density
kpAxis(kp, ymax=ymax.read, numticks=2, cex=0.25, r0=0, r1=0.5, col=COLOR, side=2)
kpAddBaseNumbers(kp, tick.dist=TICK_DISTANCE, add.units=TRUE, minor.ticks=TRUE, minor.tick.dist = MINOR_TICK_DISTANCE, cex=0.25, tick.len=10, minor.tick.len=4)
kpAddMainTitle(kp, main=TITLE, cex=0.5)
I don't know you're trying to do with karyoplotR. But figure out which part uses Rsamtools, and try to find other tools that can generate the data you need to make a custom karyoplotR function.
Thank you very much! This is exactly what I needed:)
I never thought about replacing Rsamtools.
Hi,
I'm working with the wheat genome which has very large chromosomes, exceeding the apparent limit of 536870912. I had to use a csi index to index the bam file (samtools index -c).
Is there a possible solution to this? I would very much like to be able to plot my data.