Closed NJ2012 closed 9 months ago
Assuming that you have already defined the following objects: bedFile, bamFiles, sampleNames, outputDir Could you please, try to run the following lines one by one and see if/when you get error?:
`
library(Rsubread)
library(data.table)
library(parallel)
bed <- fread(bedFile)
df <- data.frame(cbind(1:nrow(bed), bed))
colnames(df) <- c("GeneID", "Chr", "Start", "End", "Strand")
if (!file.exists(outputDir)){dir.create(outputDir)}
i <- 1
file <- bamFiles[i]
prefix <- sampleNames[i]
outputFile <- paste0(outputDir, prefix, "_rpkm2.txt")
res <- featureCounts(files=file, annot.ext=df, allowMultiOverlap=TRUE, nthreads=3)
total <- sum(res$stat[2])
counts <- data.frame(res$counts)
colnames(counts) <- "count"
counts$RPKM <- as.numeric(1e9 * as.numeric(counts$count)) / as.numeric(as.numeric(df$End - df$Start) * as.numeric(total))
write.table(counts, file=outputFile, sep="\t", col.names=T, row.names=F, quote=F)`
Hi, Thank you for your help. I get the error with the following command:
bed <- fread(bedFile)
fread(bedFile) : Bumped column 1 to type character on data row 213577, field contains 'X'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.
From: tgambin notifications@github.com Reply-To: BCM-Lupskilab/HMZDelFinder reply@reply.github.com Date: Tuesday, 21 March 2017 at 14:22 To: BCM-Lupskilab/HMZDelFinder HMZDelFinder@noreply.github.com Cc: "Lahrouchi, Najim" NLAHROUCHI@mgh.harvard.edu, Author author@noreply.github.com Subject: Re: [BCM-Lupskilab/HMZDelFinder] calcRPKMsFromBAMs() (#2)
i <- 1
file <- bamFiles[i]
prefix <- sampleNames[i]
outputFile <- paste0(outputDir, prefix, "_rpkm2.txt")
res <- featureCounts(files=file, annot.ext=df, allowMultiOverlap=TRUE, nthreads=3)
total <- sum(res$stat[2])
counts <- data.frame(res$counts)
colnames(counts) <- "count"
counts$RPKM <- as.numeric(1e9 as.numeric(counts$count)) / as.numeric(as.numeric(df$End - df$Start) as.numeric(total))
write.table(counts, file=outputFile, sep="\t", col.names=T, row.names=F, quote=F)`
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.
The example on Github work 1000G data work fine by the way.
From: tgambin notifications@github.com Reply-To: BCM-Lupskilab/HMZDelFinder reply@reply.github.com Date: Tuesday, 21 March 2017 at 14:22 To: BCM-Lupskilab/HMZDelFinder HMZDelFinder@noreply.github.com Cc: "Lahrouchi, Najim" NLAHROUCHI@mgh.harvard.edu, Author author@noreply.github.com Subject: Re: [BCM-Lupskilab/HMZDelFinder] calcRPKMsFromBAMs() (#2)
i <- 1
file <- bamFiles[i]
prefix <- sampleNames[i]
outputFile <- paste0(outputDir, prefix, "_rpkm2.txt")
res <- featureCounts(files=file, annot.ext=df, allowMultiOverlap=TRUE, nthreads=3)
total <- sum(res$stat[2])
counts <- data.frame(res$counts)
colnames(counts) <- "count"
counts$RPKM <- as.numeric(1e9 as.numeric(counts$count)) / as.numeric(as.numeric(df$End - df$Start) as.numeric(total))
write.table(counts, file=outputFile, sep="\t", col.names=T, row.names=F, quote=F)`
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.
This is just a warning and should not affect the results. Could you please try to execute next lines?
Hi there, This is regarding the calcRPKMsFromBAMs() function. I keep on getting the following error message : stack smashing detected : /opt/apps/r/3.3.3/lib/R/bin/exec/R terminated I have been struggling with this for days and I just can't seem to figure it out. I ran the code line by line and everything seems to be working fine before this part. The path to my bam files and bed file, I have checked everything. The required packages are loaded as well. I would really appreciate some help.
Thanks
Hi,
I tested HMZ, sadly with too few exome. I think I already seen this (last) error, but I can't remember how I solved it. @Rt-2017: 1/ could you post your commands ? 2/ example file works fine ?
Thx
Hey, Thanks for responding. I have a number of bam files and I need to calculate the rpkm. Yes the example works fine,but in the example rpkm is already calculated. Just have to download and run the code. For me, I need to do from the scratch,calculate the rpkm and then move forward. Unfortunately,I am stuck there :( . So I specified the path to my bam and bed files and then from the source code function calculate rpkm.
pathToBams <- '/path/to/bam/'
bamFiles <- paste0(pathToBams, dir(pathToBams, "bam$"))
pathToBed <- '/path/to/bed/'
bedFile <- paste0(pathToBed, dir(pathToBed, "bed$"))
rpkmDir <- dataDir
sampleNames <- sapply(strsplit(dir(pathToBams, "bam$"), "[/\.]"), function(x){x[length(x)-1]})
calcRPKMsFromBAMs(bedFile,bamFiles,sampleNames,rpkmDir,4)
I tried running line by line the calcRPKMsFromBAMs() function from the source code and as soon as I reach to this particular line,the program terminates.
res <- featureCounts(files=file, annot.ext=df, allowMultiOverlap=TRUE, nthreads=3)
I am guessing something to do with the data frame for the bed file but I am not sure. Any thoughts?
Thanks
Do you have Rsubread package installed? Also could you print few lines from df object?
Yes I have it installed. As for my bed file,the first column for the geneID is as follows
469_358463_79501(OR4F5)_1 469_360600_729759(OR4F29)_1 469_358708_81399(OR4F16)_1 473_403203_148398(SAMD11)_3 473_403204_148398(SAMD11)_4 473_403205_148398(SAMD11)_5 473_403206_148398(SAMD11)_6 473_403207_148398(SAMD11)_7 473_403208_148398(SAMD11)_8 473_403209_148398(SAMD11)_9 473_403210_148398(SAMD11)_10 473_403211_148398(SAMD11)_11 473_403212_148398(SAMD11)_12 473_403213_148398(SAMD11)_13 473_403214_148398(SAMD11)_14 469_357140_26155(NOC2L)_19
Could this be the problem?
I think the first column should be chromosome followed by start, stop. gene name should be in fourth column
colnames(df) <- c("GeneID", "Chr", "Start", "End", "Strand")
This is as per the source code,but I think the order does not matter based on what I read from here https://www.rdocumentation.org/packages/Rsubread/versions/1.22.2/topics/featureCounts
My worry is with the format for the GeneID, can it take both character strings and integers or just one?
Yes, sorry you are right. Could you try to replace this column with something else. Eg
df$GeneID <- 1:nrow (df)
Ok, so I add this extra bit in my code ? I am sorry,I am a little lost here :(
Yes please add this line before executing featureCounts function
I tried,still getting the same error :( stack smashing detected : /opt/apps/r/3.3.3/lib/R/bin/exec/R terminated Aborted
So as you said it looks the problem is with featureCounts. Maybe you can try to check if other people had similar error, see eg
https://www.biostars.org/p/168962/
It looks that some people had problems with bam headers. Also I don't remember now but index files could be required so make sure they are in the same folder as bam files.
Alternatively you can try to compute rpkm files using other software.
Yes the index files are required and I have them in the same folder. I guess it's the bed file anyways I will check the link. Thanks for your time :)
Hello I've been trying to run hmzdelfinder on our data and when I try to generate rpkm files I get the following error:
caught segfault address (nil), cause 'unknown'
Traceback: 1: .C("R_readSummary_wrapper", as.integer(n), as.character(cmd), PACKAGE = "Rsubread") 2: featureCounts(files = file, annot.ext = df, allowMultiOverlap = TRUE, nthreads = 3) 3: FUN(X[[i]], ...) 4: lapply(X = S, FUN = FUN, ...) 5: doTryCatch(return(expr), name, parentenv, handler) 6: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 7: tryCatchList(expr, classes, parentenv, handlers) 8: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L msg <- conditionMessage(e) sm <- strsplit(msg, "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && identical(getOption("show.error.messages"), TRUE)) { cat(msg, file = stderr()) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))}) 9: try(lapply(X = S, FUN = FUN, ...), silent = TRUE) 10: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) 11: FUN(X[[i]], ...) 12: lapply(seq_len(cores), inner.do) 13: mclapply(1:length(bamFiles), function(i) { gc() gc() file <- bamFiles[i] prefix <- sampleNames[i] outputFile <- paste0(outputDir, prefix, "_rpkm2.txt") res <- featureCounts(files = file, annot.ext = df, allowMultiOverlap = TRUE, nthreads = 3) total <- sum(res$stat[2]) counts <- data.frame(res$counts) colnames(counts) <- "count" counts$RPKM <- as.numeric(1e+09 as.numeric(counts$count))/as.numeric(as.numeric(df$End - df$Start) as.numeric(total)) write.table(counts, file = outputFile, sep = "\t", col.names = T, row.names = F, quote = F)}, mc.cores = mc.cores) 14: calcRPKMsFromBAMs(bedFile, bamFiles, sampleNames, rpkmDir, 4)
Am I doing something wrong? How can I generate RPKMs?
It looks like the problem is in featureCounts function from Rsubread package. Could you please, provide first few lines of your bed file?
The bed file looks like this:
chr start stop name chr1 14361 14829 WASH7P chr1 14969 15038 WASH7P chr1 15795 15947 WASH7P chr1 16606 16765 WASH7P chr1 16857 17055 WASH7P chr1 17232 17368 WASH7P chr1 17605 17742 WASH7P chr1 17914 18061 WASH7P chr1 18267 18366 WASH7P
Please, remove header line and make sure the file is tab separated, and try to re-run the script.
Hi, Is it possible to provide a an example that runs fully on your exam data? I keep getting the same error: command: calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,4)
In fread(bedFile) : Bumped column 1 to type character on data row 213577, field contains 'X'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE. 2: In mclapply(1:length(bamFiles), function(i) { : all scheduled cores encountered errors in user code
An full example would be helpful. Kind regards and thank you.
OK, I will prepare full working example that works on small BAM files. I will get back to you soon.
Hello, I've removed the header and I still get the same error. Is there another way to generate the rpkm files? Can I use conifer to generate it, because I can't get to work.
Please, see minimal working example for calcRPKMsFromBAMs():
https://gist.github.com/tgambin/f43937ce98a0300b11a15647e24f2318
I used toy BAM data available in WES.1KG.WUGSC package. As a next step I will integrate this example with the rest of the pipeline and then I will move it to the main repository.
Regarding other method to calculate RPKMs. Yes, you can use Conifer script, however you may get some additional false positive calls in small exons (Conifer counts only reads that start within the target region, so may report zero reads for exons that are actually well covered).
Hi there,
I have been trying to run this algorithm for some time,unfortunately I keep on getting a lot of errors.
I was not able to calculate the RPKM as I kept on getting the error message as stack smashing detected : /opt/apps/r/3.3.3/lib/R/bin/exec/R terminated.
I guess my bam files were quite big because it worked when I tried with one chr from a single file.
So I opted to calculated the RPKM with Conifer, but when I use this RPKM with HMZDelFinder, I get a new error:
Error in rownames<-
(*tmp*
, value = c("edit-1", "edit-10", "edit-11", :
attempt to set 'rownames' on an object with no dimensions
I double checked the format of the rpkm files,it has 2 columns with count and rpkm as per the example. I am not able to locate the error, is it the files or the values calculated by Conifer?
Could you send me few lines of your file RPKM file ?
So Conifer calculated this PTgerald_HFLK7BBXX_1_AACAACCG-AACAACCG_recal_FINAL.bam.rpkm.txt
And this is the edited one which I use edit-1.rpkm2.txt
At which stage you got this error ? i.e. is it after "Reading RPKM files ..."
yes ... "Removing empty elements ..." "Creating matrix ..." and then the error appears
Ok I think the problem is that second column in your txt file should be RPKM (not rpkm). Could you please check and let me know if this helps ?
yes.... :)...thank you so much.. I really feel dumb right now cuz of that error... :P
no problem :) hope it helps. Let me know if you were able to obtain final results.
Yeah...I got my results but I did it without AOH analysis...just wanted to make sure the pipeline runs without any errors before complicating it ...thanks alot for your help :)
Thanks for the update. Some of these results maybe already useful, i.e. hemizygous deletions in males and homozygous deletions with low z-RPKM score, but it is worth trying to add AOH filtering step.
Let me know if you need any help in the future.
@Rt-2017
Hi there, This is regarding the calcRPKMsFromBAMs() function. I keep on getting the following error message : stack smashing detected : /opt/apps/r/3.3.3/lib/R/bin/exec/R terminated I have been struggling with this for days and I just can't seem to figure it out. I ran the code line by line and everything seems to be working fine before this part. The path to my bam files and bed file, I have checked everything. The required packages are loaded as well. I would really appreciate some help.
Just to say:
I already seen this error ; and once again, few days ago, when I tested CNV detection in a new pool of data. I investigated a lot (I think), and it not come from the bed file: all previously OK bed file I tested cause bug. I seen this bug only on some bam files, always the same. I don't know why.
I upgrade Rsubread from 1.26 to 1.28 (last release), and it work !
Thanks , I will try it. I tried with my data a couple of months ago and I had around 300 bam files so I used conifer to calculate rpkm.
@tgambin
Am trying to calculate AOH, and the code for reading vcf is not working. I have one compressed vcf which has all the samples . Do I need to have individual files?
Yes, the current code requires VCFs for single samples
Dear sir, madam, I keep running into the following problem. It is not to me clear what this means and how to resolve. I hope you can help me further.
command: calcRPKMsFromBAMs(bedFile,bamFiles , sampleNames, rpkmDir,4)
In fread(bedFile) : Bumped column 1 to type character on data row 213577, field contains 'X'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE. 2: In mclapply(1:length(bamFiles), function(i) { : all scheduled cores encountered errors in user code