Closed ashakru closed 4 years ago
Hm, computeFeatures is made to be able to get both in the same run, so let me check that nothing strange is going on, else I need you to give me some more info on your data structure. Will get back to you in a few hours :)
First off: if you get this warning, then I know the answer why it is happening:
Warning:
All widths are 1, If ribo-seq is p-shifted, score or size meta column should contain widths of read, will continue using 1-widths
If not, here are some thoughts: Well, I checked through the function and there is no bugs it seems. So can you show me how the riboBamGR object looks like ?
There are also 2 problems here:
you use qwidth(riboBam), this is not totally safe, I made a function in ORFik that takes care of soft / hard clipping issues, so instead of qwidth() use readWidths(), double check that it gives the same answer.
Why do you do custom p-shifting, you did not use detectRibosomeShifts() to find p-shifts ? If you have the shifts, the function shiftFootprints() does this for you. shiftFootprints() does all your code in one line, so why not use it ?
And why are you shifting all your reads by 12 ? Do you only have one read length ?
Let me know what you think.
I can't see such warning. Addressing your comments:
GRanges object with 3795776 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 629923 + | 29
[2] chr1 629924 + | 30
[3] chr1 629924 + | 29
[4] chr1 629937 + | 30
[5] chr1 629937 + | 29
Update: I've managed to run computeFeatures
, using the following code. Basically, I've replaced offsets computed by ORFik with those I've got from plastid and it worked like a dream.
# Load files
rnabam <- readBam("riboseq.bam")
ribobam <- readBam("rnaseq.bam")
# Define offsets
plastidOut <- read_delim("plastid.out", delim = "\t", comment = "#") %>%
filter(length %in% 25:31)
offsets <- data.frame(fraction = plastidOut$length,
offsets_start = -1*plastidOut$p_offset,
stringsAsFactors = F)
# Input for computeFeatures
psites <- shiftFootprints(ribobam, offsets)
Now the input looks like this and FLOSS score is not 0 anymore.
GRanges object with 1110784 ranges and 1 metadata column:
seqnames ranges strand | size
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 138760 + | 29
[2] chr1 138760 + | 29
[3] chr1 629924 + | 31
[4] chr1 629926 + | 26
[5] chr1 629937 + | 30
It is interesting, because according to the manual read length information in score
should have worked as well.
Ok, thanks, I will test it today, looks like something is going on with how floss pick read length column.
BTW, which ORFik and R version are you running ? Just send me: sessionInfo()
I see the documentation is a bit unclear, will update it by tomorrow for devel version.
So conclusion, I have update documentation to make this function more clear. It should now work as you wanted in devel branch, but I recommend using a $size column instead of $score.
I will close the issue in 2 days if you don't have any more questions.
Thank you for creating such a wonderful tool. It already has helped me a lot! I try to use ORFik to score novel ORF identified from Ribo-Seq data and as reference I used
computeFeatures
function on annotated CDS regions. I was surprised to see that most of annotated CDS had negative ORF score. I modified Ribo-Seq alignment so that each read represents now estimated P-site location. Information about read length was moved to score (similar approach was shown in ORFik vignette, if I'm not mistaken):Then
riboBamGR
was used as input forcomputeFeatures
.Now, ORFscore has increased and I have 20 808 CDSs with positive ORFscore vs. 17 037 CDSs before P-site adjustment. However FLOSS score is now 0 for all of the CDSs. Do you think that P-site adjustment is essential for ORFik to compute a valid ORFscore? If so, what is in your opinion a good way of doing that to compute FLOSS score as well in one run?