Closed pblaney closed 4 years ago
Hi @pblaney , can you share your R object with me akdess [at] gmail. com ? I will be more helpful if I can see the data. Thanks!
Hi @pblaney,
object@loh.name.mapping must have the following column names: sample.name and loh.name
and hg38 cytoband should be generating using the following commands; http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz cytoband <- read.delim("cytoBand.txt", header=F) cytoband$V4 <-substring(cytoband$V4, 1, 1) start <- do.call(rbind, lapply(split(cytoband$V2, paste0(cytoband$V1, cytoband$V4)), min)) end <- do.call(rbind, lapply(split(cytoband$V3, paste0(cytoband$V1, cytoband$V4)), max)) cytoband <- data.frame(V1=gsub("p", "", gsub("q", "", rownames(start))), V2=start, V3=end, V4=rownames(start), stringsAsFactors=F) cytoband$V4[grep("q", cytoband$V4)] <- "q" cytoband$V4[grep("p", cytoband$V4)] <- "p" cytoband <- cytoband [ as.vector(sapply(c(1:22, "X"), function(x) which(cytoband$V1 %in% x))), ] rownames(cytoband) <- NULL
Hello,
I'm still experiencing issues when trying to run runCaSpER(), and looking at this issue I think my problem may be with the cytoband input into CreateCasperObject(). I can't find any documentation as to what the expected input for cytoband is, and your code for the creation of a cytoband (above, and in the Vignette) throws an error for me. Could you provide a cytoband object or state the format of what the cytoband object should be?
Thank you
I am sorry for the error. I have updated the tutorial.
I have included the data object in https://github.com/akdess/CaSpER/blob/master/data/hg38_cytoband.rda
And I used below code to generate cytoband data:
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz cytoband <- read.delim("cytoBand.txt", header=F) cytoband <- data.frame(V1=gsub("chr", "", cytoband[,1]), V2=cytoband[,2], V3=cytoband[,3], V4=substring(cytoband$V4, 1, 1), stringsAsFactors=F) start <- do.call(rbind, lapply(split(cytoband$V2, paste0(cytoband$V1, cytoband$V4)), min)) end <- do.call(rbind, lapply(split(cytoband$V3, paste0(cytoband$V1, cytoband$V4)), max)) cytoband <- data.frame(V1=gsub("p", "", gsub("q", "", rownames(start))), V2=start, V3=end, V4=rownames(start), stringsAsFactors=F) cytoband <- cytoband [as.vector(unlist(sapply(c(1:22, "X"), function(x) which(cytoband$V1 %in% x)))), ] cytoband$V4[grep("q", cytoband$V4)] <- "q" cytoband$V4[grep("p", cytoband$V4)] <- "p" rownames(cytoband) <- NULL
Thank you for the reply!
Unfortunately the cytoband data was not what was causing my issue, and I'm still stuck on this. I will email over a random sample of my object, and if you could take a look at this I would greatly appreciate it.
Need help! Error with running runCaSpER:
object <- CreateCasperObject(raw.data=m.rpkm, sequencing.type="bulk", loh.name.mapping = loh.name.mapping,cnv.scale=3, loh.scale=3, control.sample.ids = c("n1","n2","n3"),annotation=annotation, method="iterative", loh=loh,cytoband=centromere, genomeVersion = "hg38") Performing Median Filtering... Scale:1... Performing Median Filtering... Scale:2... Performing Median Filtering... Scale:3... There were 24 warnings (use warnings() to see them)
final.objects <- runCaSpER(object, removeCentromere=T, cytoband=centromere, method="iterative") Performing recursive median filtering... Performing HMM segmentation... Error in DataFrame(..., check.names = FALSE) : different row counts implied by arguments In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warnings look like- In runmed(x, filt$n) : 'k' is bigger than 'n'! Changing 'k' to 43
I fixed the above. My annotation file had different dimension from my gene matrix. Now, I have a new error:
final.objects <- runCaSpER(object, removeCentromere=T, cytoband=cytoband, method="iterative") Performing recursive median filtering... Performing HMM segmentation... Processing cnv.scale:1 loh.scale:1... Error in value[[jvseq[[jjj]]]] : subscript out of bounds
Hi @mijhamilt thanks a lot for your input!
The error might be because you should do: names(loh) <- gsub(".baf", "", names(loh))
or because:
the "loh" list structure should only contain the samples in your loh.name.mapping. All the names(loh) should be in loh.name.mapping$loh.name.
Hope it helps. I will update the code and tutorial to make it clear. Thanks a lot!
Thank you for your help! It is working 🙂
Three question:
thanks, Michael
From: Akdes Serin Harmanci notifications@github.com Sent: Thursday, April 23, 2020 9:41 PM To: akdess/CaSpER CaSpER@noreply.github.com Cc: Hamilton, Michael mihamilton@health.ucsd.edu; Mention mention@noreply.github.com Subject: Re: [akdess/CaSpER] Troubleshooting Error with runCaSpER (#12)
Hi @mijhamilthttps://github.com/mijhamilt thanks a lot for your input!
The error might be because you should do: names(loh) <- gsub(".baf", "", names(loh))
or because:
the "loh" list structure should only contain the samples in your loh.name.mapping. All the names(loh) should be in loh.name.mapping$loh.name.
Hope it helps. I will update the code and tutorial to make it clear. Thanks a lot!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/akdess/CaSpER/issues/12#issuecomment-618797943, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AN2GK5VCKRISRSDA5HNA4W3ROEKATANCNFSM4KUEAHWQ.
Hi @mijhamilt,
I am sorry but I can not see the attachment.
There was a typo in the tcga_BRCA.R demo code it should be rna. I will correct the demo R script. Thanks for letting me know. In meningioma demo R code I believe it is correct:
library(GenomicRanges) gamma <- 7 all.segments <- do.call(rbind, lapply(final.objects, function(x) x@segments)) segment.summary <- extractSegmentSummary (final.objects) loss <- segment.summary$all.summary.loss gain <- segment.summary$all.summary.gain loh <- segment.summary$all.summary.loh
loss.final <- loss[loss$count>=gamma, ] gain.final <- gain[gain$count>=gamma, ] loh.final <- loh[loh$count>=gamma, ]
all.summary<- rbind(loss.final, gain.final)
colnames(all.summary) [2:4] <- c("Chromosome", "Start", "End")
rna <- GRanges(seqnames = Rle(gsub("q", "", gsub("p", "", all.summary$Chromosome))),
IRanges(all.summary$Start, all.summary$End))
ann.gr <- makeGRangesFromDataFrame(final.objects[[1]]@annotation.filt, keep.extra.columns = TRUE, seqnames.field="Chr")
hits <- findOverlaps(rna, ann.gr)
genes <- splitByOverlap(ann.gr, rna, "GeneSymbol")
genes.ann <- lapply(genes, function(x) x[!(x=="")])
all.genes <- unique(final.objects[[1]]@annotation.filt[,2])
all.samples <- unique(as.character(final.objects[[1]]@segments$ID))
rna.matrix <- gene.matrix(seg=all.summary, all.genes=all.genes, all.samples=all.samples, genes.ann=genes.ann)
I am not sure about the NAs in plotMUAndCooccurence code. If you want you can send me your R data object by email: akdess [at] gmail com
Hope it helps.
Thank you! I caught the the type also 🙂
Quick question......I need to do this pipeline for mouse. Would you have the mm10 generate genome_fasta_pileup_dir files and genome_list files. If not, is there a easy to generate them?
thank you, Michael
From: Akdes Serin Harmanci notifications@github.com Sent: Sunday, April 26, 2020 8:20 PM To: akdess/CaSpER CaSpER@noreply.github.com Cc: Hamilton, Michael mihamilton@health.ucsd.edu; Mention mention@noreply.github.com Subject: Re: [akdess/CaSpER] Troubleshooting Error with runCaSpER (#12)
Hi @mijhamilthttps://github.com/mijhamilt,
I am sorry but I can not see the attachment.
I think there was a typo in the code it should be rna. I will correct the demo R script. Thanks for letting me know.
I am not sure about the NAs in plotMUAndCooccurence code. If you want you can send me your R data object by email: akdess [at] gmail com
Hope it helps.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/akdess/CaSpER/issues/12#issuecomment-619689344, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AN2GK5RN7CZUL56OSWDB2YLROT2YRANCNFSM4KUEAHWQ.
Hi @mijhamilt, I have already created those files it should be in https://www.dropbox.com/s/z7lq8pint83o2dw/mm10.zip?dl=0
For mm10 I should make some changes to Casper code. If you send me an email akdess at gmail com I can send you the modified version of Casper for mm10. We will try to update Casper soon to be compatible with mm10. Thanks a lot!
For my code here:
library(data.table) library(CaSpER) library(edgeR)
sub <- read.csv('./cod_length.csv', row.names = 1) dat <- fread("SL1to2_raw.csv") dat$GeneID <- sub("\..*", "", dat$GeneID) sub <- sub <- sub[sub$ensm %in% dat$GeneID, ]
cytoband <- cytoband_hg38
rannotation <- generateAnnotation(id_type = "ensembl_gene_id", genes = dat$GeneID, ishg19 = FALSE, centromere)
loh_genes <- unique(rannotation$Gene) # Change 'id' if needed based on the structure of rannotation
common_genes <- intersect(intersect(dat$GeneID, sub$ensm), loh_genes)
dat_filtered <- dat[dat$GeneID %in% common_genes, ]
sub
) to retain only the common genessub_filtered <- sub[sub$ensm %in% common_genes, ]
rannotation_filtered <- rannotation[rannotation$Gene %in% common_genes, ] # Adjust field name based on rannotation
structure
cat("Number of common genes: ", length(common_genes), "\n")
paths <- c("./SL1/", "./SL2/") folder_names <- basename(paths) folder_names <- gsub("/$", "", folder_names)
read_and_name <- function(path) { result <- readBAFExtractOutput(path = path, sequencing.type = "bulk") unlisted_result <- result[[1]] return(unlisted_result) }
loh <- lapply(paths, read_and_name) names(loh) <- gsub(".baf", "", names(loh))
control.sample.ids = "PDA75"
loh.name.mapping <- data.frame(loh.name = folder_names, sample.name = folder_names)
dat_matrix <- as.matrix(data.frame(dat_filtered[!duplicated(dat_filtered$GeneID),], row.names = 1, check.names = F))
dat_matrix <- rpkm(dat_matrix, gene.length = sub_filtered$Length, log=T)
object <- CreateCasperObject(raw.data=dat_matrix, loh.name.mapping=loh.name.mapping, sequencing.type="bulk",
cnv.scale=3, loh.scale=3, matrix.type="normalized", expr.cutoff=1,
annotation=rannotation_filtered, method="iterative", loh=loh, filter="median",
control.sample.ids=control.sample.ids, cytoband=centromere, genomeVersion="hg38", log.transformed = T)
final.objects <- runCaSpER(object, removeCentromere=T, cytoband=cytoband, method="iterative")
finalChrMat <- extractLargeScaleEvents (final.objects, thr=0.75)
gamma <- 6 all.segments <- do.call(rbind, lapply(final.objects, function(x) x@segments)) segment.summary <- extractSegmentSummary (final.objects) loss <- segment.summary$all.summary.loss gain <- segment.summary$all.summary.gain loh <- segment.summary$all.summary.loh loss.final <- loss[loss$count>gamma, ] gain.final <- gain[gain$count>gamma, ] loh.final <- loh[loh$count>gamma, ]
all.summary<- rbind(loss.final, gain.final)
colnames(all.summary) [2:4] <- c("Chromosome", "Start", "End")
rna <- GRanges(seqnames = Rle(gsub("q", "", gsub("p", "", all.summary$Chromosome))),
IRanges(all.summary$Start, all.summary$End))
ann.gr <- makeGRangesFromDataFrame(final.objects[[1]]@annotation.filt, keep.extra.columns = TRUE, seqnames.field="Chr")
hits <- findOverlaps(geno.rna, ann.gr)
genes <- splitByOverlap(ann.gr, geno.rna, "GeneSymbol")
genes.ann <- lapply(genes, function(x) x[!(x=="")])
all.genes <- unique(final.objects[[1]]@annotation.filt[,2])
all.samples <- unique(as.character(final.objects[[1]]@segments$ID))
rna.matrix <- gene.matrix(seg=all.summary, all.genes=all.genes, all.samples=all.samples, genes.ann=genes.ann)
obj <- final.objects[[9]] plotHeatmap(object=obj, fileName="heatmap.png",cnv.scale= 3, cluster_cols = F, cluster_rows = T, show_rownames = T, only_soi = T)
plotLargeScaleEvent(object=obj, fileName="large.scale.events.png")
plotGEAndGT(chrMat=finalChrMat, genoMat=genoMat, fileName="RNASeqAndGT.png")
plotBAFAllSamples (loh = obj@loh.median.filtered.data, fileName="LOHAllSamples.png")
plotBAFOneSample (object, fileName="LOHPlotsAllScales.pdf")
plotBAFInSeperatePages (loh=obj@loh.median.filtered.data, folderName="LOHPlots")
plotGEAndBAFOneSample (object=obj, cnv.scale=3, loh.scale=3, sample= "MN-5")
plotSingleCellLargeScaleEventHeatmap(finalChrMat, sampleName="MGH31", chrs=c("5p", "14q"))
results <- extractMUAndCooccurence (finalChrMat, loh, loh.name.mapping)
plotMUAndCooccurence (results)
It is showing this error. Please help:
final.objects <- runCaSpER(object, removeCentromere=T, cytoband=cytoband, method="iterative") Performing recursive median filtering... Performing HMM segmentation... Processing cnv.scale:1 loh.scale:1... Error in value[[jvseq[[jjj]]]] : subscript out of bounds
Hi Akdes,
I have been working my way through the pipeline and have run into an issue that I have been trying to work through the past few days with no luck. I have no issue with all the preceding commands up to the
runCaSpER()
call:Which yields the following error:
Error in if (object@segments$size[i] > pair_arm_sizes * (1/3)) object@segments$event_scale[i] <- "large_scale" : missing value where TRUE/FALSE needed
Based on the error, one or both of the values is NA but as I've worked back through the
PerformSegmentationWithHMM()
function I cannot find a reason why.To ensure it wasn't a dependency issue, I ran the meningioma demo and had no issue. I then went through each portion of my data to make sure the classes lined up with the examples, especially the cytoband information, but still resulted in the same error.
Any suggestions? Patrick