Open lundlab opened 1 week ago
Hey, what is R and ORFik version?
R is 4.4.0 ORFik is 1.26.0
Try this:
remotes::install_github("Roleren/ORFik")
Then you must restart R to reload package set Then retry your code
If it does not work send me this: seqinfo(df) df is the ORFik experiment I suspect you have non matching gtf/gff to bam file
Thanks for the help, I'll send the seqinfo(df) later.
I installed your GitHub ORFik version and restarted as you suggested.
You were right about the GTF, my colleague had modified the standard Gencode mouse version and used that for alignment in STAR.
But with the same fasta & GTF used in STAR, I get a different error:
shiftFootprintsByExperiment(df[df$libtype == "RFP",]) Shifting reads in experiment: SNORD102_RPF Saving ofst files to: /Users/svd569/Desktop/STAR_outputs/Bam_files_genome/pshifted/ Saving wig files to: /Users/svd569/Desktop/STAR_outputs/Bam_files_genome/pshifted/ Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...) : makeTxDbFromGFF() has moved to the txdbmaker package. Please call txdbmaker::makeTxDbFromGFF() to get rid of this warning. Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. OK /Users/svd569/Desktop/STAR_outputs/Bam_files_genome/Empty2_mergedfile_w_umi_RPF_clean_RPF_star_alignedAligned.sortedByCoord.out.bam Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...) : makeTxDbFromGFF() has moved to the txdbmaker package. Please call txdbmaker::makeTxDbFromGFF() to get rid of this warning. Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. OK
/Users/svd569/Desktop/STAR_outputs/Bam_files_genome/Empty1_mergedfile_w_umi_RPF_clean_RPF_star_alignedAligned.sortedByCoord.out.bam Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...) : makeTxDbFromGFF() has moved to the txdbmaker package. Please call txdbmaker::makeTxDbFromGFF() to get rid of this warning. Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. OK
Stop worker failed with the error: wrong args for environment subassignment Error: BiocParallel errors 2 remote errors, element index: 1, 2 4 unevaluated and other errors first remote error: Error in fold(cvg, circle.length): 'circle.length' must be positive
Hm, fold is not an ORFik function as far as I remember. Can you do in R:
?fold
Then update that package, if it is already updated then find GitHub repo and install like you did with ORFik, if the bug is still there I can tell you how to install older package version (without the bug in it).
The bug is about coverage on circular chromosomes it looks like. There is a small chance something is still wrong with gtf, but let's test this first.
?fold gives:
S4Vectors internals {S4Vectors} R Documentation S4Vectors internals Description Objects, classes and methods defined in the S4Vectors package that are not intended to be used directly.
[Package S4Vectors version 0.44.0 Index]
I tried to remove the circular chromosome from the txdb and df with: txdb <- dropSeqlevels(txdb, "chrM", pruning.mode=c("tidy")) df <- dropSeqlevels(df, "chrM", pruning.mode=c("tidy"))
then update the template <- create.experiment, and try shiftFootprintsByExperiment again, but got another unspecified error:
shiftFootprintsByExperiment(df[df$libtype == "RFP",]) Shifting reads in experiment: SNORD102_RPF Saving ofst files to: /Users/svd569/Desktop/STAR_outputs/Bam_files_genome/pshifted/ Saving wig files to: /Users/svd569/Desktop/STAR_outputs/Bam_files_genome/pshifted/ Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...) : makeTxDbFromGFF() has moved to the txdbmaker package. Please call txdbmaker::makeTxDbFromGFF() to get rid of this warning. Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. OK /Users/svd569/Desktop/STAR_outputs/Bam_files_genome/SNORD102OE3_mergedfile_w_umi_RPF_clean_RPF_star_alignedAligned.sortedByCoord.out.bam Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...) : makeTxDbFromGFF() has moved to the txdbmaker package. Please call txdbmaker::makeTxDbFromGFF() to get rid of this warning. Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. OK
Stop worker failed with the error: wrong args for environment subassignment Error: BiocParallel errors 0 remote errors, element index: 6 unevaluated and other errors first remote error:
That said nothing, the error should be after last line "first remote error:"
So lets do the test outside the bplapply, do this:
# Load first ribo-seq lib
rfp_path <- filepath(df[1,], "ofst")
print(rfp_path)
rfp <- fimport(rfp_path)
print(seqinfo(rfp))
# Load txdb
txdb <- loadTxdb(df)
print(seqinfo(txdb))
print(seqinfo(df))
detectRibosomeShifts(rfp, txdb, verbose = TRUE) # <- This will fail I guess with a better error
Ok I did that (with the mitochondrial chromosome still included) and got this:
rfp_path <- filepath(df[1,], "ofst") print(rfp_path) [1] "/Users/svd569/Desktop/STAR_outputs/Bam_files_genome/Empty1_mergedfile_w_umi_RPF_clean_RPF_star_alignedAligned.sortedByCoord.out.bam"
rfp <- fimport(rfp_path) print(seqinfo(rfp)) Seqinfo object with 61 sequences from an unspecified genome: NA
Load txdb
txdb <- loadTxdb(df) Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. OK print(seqinfo(txdb)) Seqinfo object with 22 sequences (1 circular) from an unspecified genome; no seqlengths: NA
print(seqinfo(df)) Seqinfo object with 61 sequences from an unspecified genome: NA
detectRibosomeShifts(rfp, txdb, verbose = TRUE)
Number of CDSs used for p-site detection: 18032
Distribution of read lengths with reads count > min_reads:
Error in fold(cvg, circle.length) : 'circle.length' must be positive
Yes, so your txdb is malformed.
I think the df@txdb for you specifies a gtf path and not a txdb path ?
This is a bit sketchy if you have nonstandard STAR alignment. What is safest is always to use a predefined txdb like I do in the tutorial, so do this:
# Make a safe txdb, in ORFIk:::getGenomeAndAnnotation this is done for you:
# Given that your df@txdb is a .gtf path and not .gtf.db (txdb) path
gtf_path <- df@txdb
genome_fasta <- df@fafile
organism <- organism(df) # Mus musculus if not defined
makeTxdbFromGenome(gtf_path, genome_fasta, organism, optimize = TRUE)
#Now recreate your experiment, with txdb input:
create.experiment(..., txdb = paste0(gtf_path, ".db"), …)
df <- read.experiment(...) # Load new experiment
#Now run the detectRibosomeShifts call again like this:
rfp_path <- filepath(df[1,], "ofst") # Ignore reload if file is large bam and not ofst
rfp <- fimport(rfp_path)
# Load txdb
txdb <- loadTxdb(df)
stopifnot(identical(seqinfo(txdb), seqinfo(df)))
detectRibosomeShifts(rfp, txdb, verbose = TRUE) # <- This should now work
A note for speed, I see you have bam files. ORFik has many super optimized formats, so I always run: convert_bam_to_ofst(df) after STAR alignment, to get much faster version of bam file. This speeds up shifting later on etc.
Downstream you have other formats like covRle which are insanely fast for plotting and coverage calculations etc. If this is just a single run test, you can just use the file formats you have, as formats are mostly important when you reload the same file multiple times (for different analysis etc).
I'll try to make a safe txdb again, I did try that before, but got an error about NNNNNNN in the fasta file.
Actually detectRibosomeShifts seems to work now with the code you gave previously, as long as I take out the ChrM. Can't seem to generate the plots yet though with shiftPlots.
Change point analysis (start): ups (upstream window score), downs (downstream window score) Info / Read length: 26 Possible change points of the max frame with sliding windows: Info / Read length: 28 Possible change points of the max frame with sliding windows: Info / Read length: 29 Possible change points of the max frame with sliding windows: Info / Read length: 31 Possible change points of the max frame with sliding windows: Info / Read length: 32 Possible change points of the max frame with sliding windows: Info / Read length: 34 Possible change points of the max frame with sliding windows:
So you ran the shiftFootprintsByExperiment step and have the shifted files.
And now you loaded that and tried to run shiftPlots and it fails ?
No it doesn't seem the pshifted files were written. I just got a gtable with some grobs in it, which I was able to convert into a ggplot object with as.ggplot from ggplotify library.. Not sure if the plots are correct, I'll attach the file here so you can take a look. PeriodicityEmpty&_SNORD102OE.pdf
I'll try now to make the safe txdb and start agin from the beginning. Will let you know how it goes.
When you have pshifted, check this:
filepath(df, "pshifted")
It should link to ofst files in the ./pshifted/ folder relative to your bam files, if it does not it means it failed.
Also, if you need help with the genome (the NNNNNNN), send me the error. N in the genome fasta should not matter as far as I know. As most chromosome has NNNNNNN flanks.
Any updates ? :)
Hi Håkon,
I work in Anders Lund's lab at BRIC in Copenhagen. I’m trying to use your ORFik pipeline, mainly to do QC on a recent RIbo-seq experiment performed on mouse organoids. But maybe also fro some other stuff afterwards. Basically I first want to check that my RPFs have 3nt periodicity before I constrict and sequence the parallel RNA-seq libraries.
Everything seems to be fine until I get to the “shiftFootprintsByExperiment” step. Then I get some errors, for example: "Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand): Seqlengths of x contains NA values!”
Could you help me solve this?
I see someone else had a similar problem on Biostars, but the solution is not present in the posts. https://www.biostars.org/p/9552773/
Thanks, Martin Jansson