Open iraiosub opened 2 months ago
oh, actually I do not have access to contribute, so here's a proposed quick fix - to be used inside of the psite
function:
tempoff <- function(v_dist){
ttable <- sort(table(v_dist), decreasing = T)
# Create right-shifted table with zero filling for non-existing positions
right_indices <- as.character(as.numeric(names(ttable)) + 1)
ttable_sr <- ttable[right_indices]
ttable_sr[is.na(ttable_sr)] <- 0 # Replace NA with 0
# Create left-shifted table with zero filling for non-existing positions
left_indices <- as.character(as.numeric(names(ttable)) - 1)
ttable_sl <- ttable[left_indices]
ttable_sl[is.na(ttable_sl)] <- 0 # Replace NA with 0
tsel <- rowSums(cbind(ttable > ttable_sr, ttable > ttable_sl), na.rm = T)
return(as.numeric(names(tsel[tsel == 2][1])))
}
Ho there, thank you for using riboWaltz. You are perfectly right, in some specific cases the tool doesn't work. Lately several issues related to low input data popped out, which makes sense since riboWaltz has been designed for standard RiboSeq with "enough" material.
Thank you so much for bothering to inspect the code and propose an alternative. I will implemented the solution you kindly provided as soon as I'm free from other, urgent duties.
Best Fabio
thank you very much Fabio! That's really good to know.
Cheers, Ira
hey! thanks for developing this fantastic tool! I've been using it with great results and included it in a number of pipelines, with plans to include it the the
nf-core/riboseq
pipeline as a P-site identification+diagnostics option. While working with some test data (smaller size than any typical ribo-seq dataset), I noticed in cases where there are missing read extremities at certain positions relative to the start codon, results in mis-identifying what should be the correct temporary offset. E.g. visually the temporary offset clearly should be at the -12 position as seen in the offset plot below (also cross-validated by other tools such as ribotish), but is assigned by thepsite
function as -17.28.pdf
I have tracked the issue and identified it originates in the
tempoff
function, which introduces NAs if there are no read ends at the adjacent positions, here -13 and -11 for the offset of -12 which is the most frequent in the data, causing the -12 to be dropped and no longer a valid offset option.This occurs with both versions
1.2.0
and2.0
. I am attaching the files needed to reproduce the issue.test_data.zip
I will shortly follow with a PR with a proposed solution, if you would like to review it and make adjustments as needed.
I appreciate this might not be an issue with most full-sized datasets, but it would be good to solve this and make a patch release here and on bioconda, as this is required for using it as part of reproducible workflows - the community would benefit greatly from it :)