CraigIDent / SpliSER

Bioinformatic tool for Splice site Strength Estimation using RNA-seq

Duplicates Site #4

Closed algaebrown closed 1 month ago

algaebrown commented 11 months ago

Hi @CraigIDent, Thanks for making such a wonderful tool.

I ran it on some stranded RNA-seq data and found duplicated acceptor SSE values: [image]

Could you advise why there would be duplicated values for a single position in the same gene?

I ran SpliSER_v0_1_8.py. Thanks

CraigIDent commented 7 months ago

Hi @algaebrown! Sorry I missed this one. Short answer: it shouldn't be doing that. I guess you might have moved on from this analysis, but if you could send me an extract of the BAM and BED files that you were using for this region, it would help me figure out what has gone wrong. It looks like SpliSER has taken one (very rare) junction and not recognised that it uses the same 5' site as the junctions above. I haven't seen this issue before. Best, Craig

CraigIDent commented 1 month ago

I've recently seen an example of Regtools (rarely) inappropriately splitting a single splice junction across two separate lines of the BED file; this could cause your issue above.

If it only affects about 1 in every 42,000 reads, as in the example above, I'd probably remove the duplicated sites from the SpliSER results on the basis of their alpha counts, or use a script to collapse the duplicate junctions in your BED file back together.
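
A minimal sketch of such a collapsing script, in case it is useful. It assumes the junctions file is the 12-column BED that Regtools writes (read count in the score column, the two anchors as blocks), and that "duplicate" means two lines with the same chromosome, strand and intron coordinates. The function names are made up for illustration and are not part of SpliSER or Regtools; rewriting thickStart/thickEnd to match the merged span is also an assumption.

```python
def intron_key(fields):
    """Return (chrom, intron_start, intron_end, strand) for one BED12 junction."""
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    # The intron lies between the left anchor block and the right anchor block.
    return (chrom, start + sizes[0], end - sizes[-1], strand)


def collapse_junctions(lines):
    """Merge BED12 lines that describe the same intron on the same strand.

    Scores (read counts) are summed and the anchors are widened to the
    largest overhang seen. The name and colour of the first line are kept.
    """
    merged = {}  # dicts keep insertion order, so output order is stable
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        key = intron_key(fields)
        if key not in merged:
            merged[key] = fields
            continue
        kept = merged[key]
        intron_start, intron_end = key[1], key[2]
        new_start = min(int(kept[1]), int(fields[1]))
        new_end = max(int(kept[2]), int(fields[2]))
        kept[4] = str(int(kept[4]) + int(fields[4]))  # sum read counts
        kept[1] = kept[6] = str(new_start)            # assumption: thickStart == chromStart
        kept[2] = kept[7] = str(new_end)              # assumption: thickEnd == chromEnd
        kept[10] = f"{intron_start - new_start},{new_end - intron_end}"
        kept[11] = f"0,{intron_end - new_start}"
    return ["\t".join(f) for f in merged.values()]


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python collapse.py junctions.bed > collapsed.bed
    with open(sys.argv[1]) as handle:
        for row in collapse_junctions(handle):
            print(row)
```

The collapsed file can then be passed to SpliSER in place of the original BED. It is worth checking a few merged lines by eye first, since this only handles the exact-duplicate case described above.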

Sometimes an apparently duplicated site will come from spurious antisense reads, which cause a mirroring of a splice junction (i.e. one junction on the plus strand and the same junction on the minus strand). This paper might shed some light: https://f1000research.com/articles/8-819. If it were occurring at many sites, it could be that the reads are not stranded to begin with, but since both of your duplicated sites are on the same strand, that doesn't seem to be the problem.
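
To check for the mirrored case, something like the sketch below could list introns that appear on both strands in the junctions BED. It assumes the same 12-column BED layout as Regtools output (score = read count, anchors as blocks); `find_mirrored_junctions` is a hypothetical helper name, not a SpliSER function.

```python
def find_mirrored_junctions(lines):
    """Return introns reported on both strands.

    Each result is (chrom, intron_start, intron_end, plus_reads, minus_reads).
    """
    by_intron = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        f = line.split("\t")
        chrom, start, end = f[0], int(f[1]), int(f[2])
        score, strand = int(f[4]), f[5]
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        # Intron coordinates, ignoring strand so mirrored copies share a key.
        key = (chrom, start + sizes[0], end - sizes[-1])
        counts = by_intron.setdefault(key, {"+": 0, "-": 0})
        if strand in counts:  # skip junctions with an undefined strand
            counts[strand] += score
    return [
        (chrom, s, e, c["+"], c["-"])
        for (chrom, s, e), c in sorted(by_intron.items())
        if c["+"] > 0 and c["-"] > 0
    ]
```

A strongly lopsided read count between the two strands would be consistent with spurious antisense reads, whereas many sites with comparable counts on both strands would point more towards unstranded data.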