Open Sfeng666 opened 10 months ago
call-1 works like call-2 when it comes to library type and strand information. The only difference is that an in silico sample is simulated from the existing read information by substituting all non-reference base calls with the reference base.
Hi Michael & fellow users,
I have a few questions about the strand information in JACUSA 2 output, and the proper usage of that information for RNA editing identification & frequency estimation:
In the variant detection scenario call-1, given an RNA sample sequenced on the first strand, using -P RF-FIRSTSTRAND allows some sites to have two lines of output, one per strand, as in this BED-like output:

Above is a simple case of a non-editing site, where the ref allele is simply inverted, but the allele counts (here equal to depth) for the inverted bases are similar rather than identical. How should this be interpreted? My guess is that some reads are mapped to both strands and contribute to the allele counts on both, while reads mapped specifically to one strand account for the difference in allele counts. Can you confirm this?
In a more complicated case, where RNA editing is detected on both strands, I feel uncertain about how to interpret this result. I wonder if it's ok to interpret this as true RNA editing events on both strands, or if we are more confident about editing called on one of the strands, based on certain statistics (e.g., Z score)?
If my guess about the first example is correct, the 3 reads covering G on the positive strand could be contributed by the same reads that also mapped to C on the negative strand (base order: A, C, G, T). If those reads unfortunately came from transcripts of the negative strand, then the RNA editing occurred at this genomic location on the negative strand.

The problem is that we won't know what proportion of the allele counts is actually contributed by the same reads, nor which strand's transcripts those reads came from.
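
To make the G-on-positive versus C-on-negative correspondence concrete, here is a minimal Python sketch. It is my own illustration, not JACUSA2 code: the function name `complement_counts` is hypothetical, and the only thing taken from the output is the base order A, C, G, T.

```python
# Base order of the per-site counts, as stated above: A, C, G, T.
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_counts(counts):
    """Re-express base counts reported on one strand in terms of the other.

    `counts` is a list of four integers in A, C, G, T order.
    The count for each base comes from its complement on the input strand.
    (Hypothetical helper for illustration only.)
    """
    by_base = dict(zip(BASES, counts))
    return [by_base[COMPLEMENT[base]] for base in BASES]


# 3 reads showing G on the positive strand look like 3 reads showing C
# when the same site is viewed from the negative strand.
plus_strand_counts = [0, 0, 3, 0]
minus_strand_view = complement_counts(plus_strand_counts)
```

If the same reads really were counted on both strands, the complemented counts from one line would show up inside the counts of the other line, but the output alone does not tell us how many reads are shared.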
I can think of the following options to determine which strand an editing event more likely occurred on when JACUSA2 detects candidate editing on both strands of a given site:
call-1).

Since the proportion of overlapping genes on opposite strands is generally low (an average of 3.3% of exons overlapping on opposite strands across human chromosomes), I think it is fine to either discard these sites, or pick one strand and give up the potential information on the other. What do you think?
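
For the "discard these sites" option, this is roughly what I have in mind, as a minimal Python sketch. The record layout (contig, start, end, strand) is a simplified assumption of mine rather than the full JACUSA2 output, and `split_dual_strand_sites` is a hypothetical helper name.

```python
from collections import defaultdict


def split_dual_strand_sites(records):
    """Separate sites reported on one strand from sites reported on both.

    `records` is an iterable of (contig, start, end, strand) tuples, a
    simplified, assumed layout and not the complete JACUSA2 output line.
    Returns (single_strand_sites, dual_strand_sites) as sorted lists of
    (contig, start, end) tuples.
    """
    strands_by_site = defaultdict(set)
    for contig, start, end, strand in records:
        strands_by_site[(contig, start, end)].add(strand)

    single = sorted(site for site, s in strands_by_site.items() if len(s) == 1)
    dual = sorted(site for site, s in strands_by_site.items() if len(s) > 1)
    return single, dual


# Made-up coordinates, for illustration only.
example_records = [
    ("chr1", 100, 101, "+"),
    ("chr1", 100, 101, "-"),
    ("chr1", 200, 201, "+"),
]
single_sites, dual_sites = split_dual_strand_sites(example_records)
```

Sites in `dual_sites` would then be either dropped or resolved by picking one strand, depending on which option turns out to be appropriate.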
I know this is a long discussion, but I would appreciate it a lot if you could answer my questions and/or offer your insights on how to solve the dual-strand problem.