bernatgel / karyoploteR

karyoploteR - An R/Bioconductor package to plot arbitrary data along the genome
https://bernatgel.github.io/karyoploter_tutorial/
298 stars 42 forks source link

Handling BAM files with large chromosomes #94

Open Mailinnia opened 3 years ago

Mailinnia commented 3 years ago

Hi,

I'm working with the wheat genome which has very large chromosomes, exceeding the apparent limit of 536870912. I had to use a csi index to index the bam file (samtools index -c).

genome_file <- 'genome_file.txt' custom.genome <- toGRanges(genome_file) my_bam <- 'output.sort.bam' kp <- plotKaryotype(genome = custom.genome, cex=0.4) bam.read <- kpPlotBAMDensity(kp, data=my_bam, col="#006c9a", r0=0, r1=0.5, window.size=1e6) Error in value[3L] : 'end' must be <= 536870912 file: output.sort.bam index: NA

Is there a possible solution to this? I would very much like to be able to plot my data.

Mailinnia commented 3 years ago

I tested on another data set with chromosomes lengths below the limit. KaryoploteR cannot use csi index it seems.

bernatgel commented 3 years ago

Hi @Maillinia

I was checking that and karyoploteR uses Rsamtools to interact with the BAM files, so it could be a problem if Rsamtools does not support CSI indexes. I'll ask to the developers mailing list to see if this is the case.

Faewks commented 2 years ago

Hi @Mailinnia did you find a workaround for your problem? I'm working with a barley genome right now and (I guess) I have the same problem. Thanks in advance.

Mailinnia commented 2 years ago

Hi @Faewks, I wanted to do the kpPlotBAMDensity. So with large genomes I would use Bedtools, then import the data into R and make a custom kpPlotBAMDensity. That worked very well.

to get the dens generated by samtools, I replaced in kpPlotBAMDensity: dens <- Rsamtools::countBam([data](https://rdrr.io/r/utils/data.html), param=Rsamtools::ScanBamParam([which](https://rdrr.io/r/base/which.html) = [windows](https://rdrr.io/r/grDevices/windows.html)))$records with:


  windows_bed <- windows_data %>% select(seqnames,start,end)
  write.table(windows_bed, file='windows.bed', sep = "\t", col.names = FALSE, row.names = FALSE, quote=FALSE)

  handle_bedtools <- '/usr/local/bedtools/bin/bedtools'
  args <- c('intersect', '-a','windows.bed', '-b', 'output.sort.bam', '-C', '-bed')
  records <- read.table(text = system2(handle_bedtools, args, stdout=TRUE), sep = "\t", header=FALSE)
  dens <- as.numeric(unlist(records$V4))
  file.remove('windows.bed')```  
Faewks commented 2 years ago

Hi Mailinnia, thank you for your quick answer! I'm sorry but I do not understand your answer. What function of Bedtools are you using? Do you change something in the library of the tool karyoploteR for the function kpPlotBAMDensity?

I'm sorry I'm quite new to this.

Mailinnia commented 2 years ago

Hi Faewks, I looked into the function kpPlotBAMDensity in order to make images like this: image

You can find the source code for kpPlotBAMDensity here: https://rdrr.io/bioc/karyoploteR/src/R/kpPlotBAMDensity.R

I made a custom version of this function by replacing the Rsamtools command with bedtools intersect (see previous answer).

handle_bedtools <<- PATH_TO_BEDTOOLS

  kpPlotBAMDensity_custom <- function(karyoplot, data=NULL, window.size=1e6, normalize=FALSE, ymin=NULL, ymax=NULL, data.panel=1, r0=NULL, r1=NULL, col="gray80", border=NA, clipping=TRUE,...) {

  if(!methods::is(karyoplot, "KaryoPlot")) stop(paste0("In kpPlotBAMDensity: 'karyoplot' must be a valid 'KaryoPlot' object"))
  if(is.character("data")) {
  data <- Rsamtools::BamFile(file = data)
  }
  if(!methods::is(data, "BamFile")) stop(paste0("In kpPlotBAMDensity: 'data' must be a character or a 'BamFile' object."))
  karyoplot$beginKpPlot()
  on.exit(karyoplot$endKpPlot())
  #use tileGenome to create windows only on the visible part of the genome, that is in the karyoplot$plot.region
  plot.region.lengths <- setNames(width(karyoplot$plot.region), as.character(seqnames(karyoplot$plot.region)))
  windows <- tileGenome(plot.region.lengths, tilewidth = window.size, cut.last.tile.in.chrom = TRUE)
  seqinfo(windows) <- GenomeInfoDb::Seqinfo(seqnames=seqlevels(windows)) #remove the seqlength info from seqinfo to avoid a potential out-of-bounds warning when shifting the windows

  #Now, move the windows start(karyoplot$plor.region) bases to the right.It's only necessary when zoomed or with chromosomes not starting at position 1
  windows <- shift(windows, shift=start(karyoplot$plot.region[seqnames(windows)])-1)
  windows_data <- data.frame(windows)
  windows_bed <- windows_data %>% select(seqnames,start,end)
  write.table(windows_bed, file='windows.bed', sep = "\t", col.names = FALSE, row.names = FALSE, quote=FALSE)

  args <- c('intersect', '-a','windows.bed', '-b', 'output.sort.bam', '-C', '-bed')
  records <- read.table(text = system2(handle_bedtools, args, stdout=TRUE), sep = "\t", header=FALSE)
  dens <- as.numeric(unlist(records$V4))
  file.remove('windows.bed')

  if(normalize==TRUE) {
    total.reads <- sum(Rsamtools::idxstatsBam(file=data)$mapped)
    dens <- dens/total.reads
  }

  if(is.null(ymax)) {
    ymax <- max(dens)
  }

  #Specify the missing colors if possible
  if(is.null(border) & !is.null(col) & !is.na(col)) {
    border=darker(col, amount = 100)
  }
  if(is.null(col) & !is.null(border) & !is.na(col)) {
    col=lighter(border)
  }
  if(is.na(col) & is.null(border)) {
    border <- "black"
  }

  karyoplot <- kpBars(karyoplot, data=windows, y0=0, y1=dens,ymin=ymin, ymax=ymax, data.panel=data.panel, r0=r0, r1=r1, border=border, col=col, clipping=clipping, ...)

  karyoplot$latest.plot <- list(funct="kpPlotBAMDensity", computed.values=list(density=dens, windows=windows, max.density=max(dens)))

  invisible(karyoplot)
}

You need a genome file containing the name, start and end positions of the chromosomes present in your genome e.g.: chr1H 1 516505933 chr2H 1 665585732 chr3H 1 621516507 chr4H 1 610333536 chr5H 1 588218687 chr6H 1 561794516 chr7H 1 632540562 chrUn 1 29110254

Then I run this code to create the image:

  k_genome <- read.table(GENOME_FILE, header=FALSE)
  max_chromosome <- max(k_genome$V3)
  bai_index_size <- 536870912
  custom_genome <- toGRanges(GENOME_FILE)

  pp <- getDefaultPlotParams(plot.type=1)
  pp$ideogramheight <- 0.1
  kp <- plotKaryotype(genome = custom_genome, cex=0.4, plot.params = pp)
  #Cannot use kpPlotBAMDensity with CSI index, must use custom plot using bedtools for density calculations
  if (max_chromosome > bai_index_size){
    bam.read <- kpPlotBAMDensity_custom(kp, data=BAM_FILE, col=COLOR, r0=0, r1=0.5, window.size=WINDOW_SIZE)
  } else {
    bam.read <- kpPlotBAMDensity(kp, data=BAM_FILE, col=COLOR, r0=0, r1=0.5, window.size=WINDOW_SIZE)
  }
  ymax.read <- bam.read$latest.plot$computed.values$max.density
  kpAxis(kp, ymax=ymax.read, numticks=2, cex=0.25, r0=0, r1=0.5, col=COLOR, side=2)
  kpAddBaseNumbers(kp, tick.dist=TICK_DISTANCE, add.units=TRUE, minor.ticks=TRUE, minor.tick.dist = MINOR_TICK_DISTANCE, cex=0.25, tick.len=10, minor.tick.len=4)
  kpAddMainTitle(kp, main=TITLE, cex=0.5)

I don't know you're trying to do with karyoplotR. But figure out which part uses Rsamtools, and try to find other tools that can generate the data you need to make a custom karyoplotR function.

Faewks commented 2 years ago

Thank you very much! This is exactly what I needed:)

I never thought about replacing Rsamtools.