LosicLab / starchip

Detection of Circular RNA and Fusions from RNA-Seq
http://starchimp.readthedocs.io/en/latest/
MIT License

parameters for fusion detection in amplicon based panel #30

Open kerrypeck opened 5 years ago

kerrypeck commented 5 years ago

Thank you very much for the nice tool. I was wondering how suitable starchip is for amplicon-based fusion detection (targeted amplicon-based RNA-seq using reverse-transcribed cDNA as input). In such workflows the total read count can be very high while the number of unique reads is very low, sometimes just 1. Duplicate reads are therefore usually not removed, and the total read count, duplicates included, is used directly for quality filtering. Could you please recommend a suitable parameter set to handle this, or does starchip itself need to be modified? Thanks.

kippakers commented 5 years ago

Hi kerrypeck,

Interesting use case! I've never encountered the type of data you're describing, so you'll have to rely on your own expertise. But based on your description, definitely set uniqueReads=1. Start with splitReads=auto, but you may want to set it by hand after seeing what the results look like (starchip prints the value it selected for your data to stdout). If you're getting lots of false positives, raise it.
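
As a rough sketch, those two settings might look like this in the fusion parameters file. The `key = value` layout is an assumption here, so check it against the example parameters file that ships with starchip; only the two parameter names and values come from this thread.

```
# Assumed layout: one "key = value" pair per line
uniqueReads = 1
splitReads = auto
```

If `auto` lets through too many false positives, the `splitReads` line would be replaced with a hand-picked integer above the value starchip reported on stdout.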

Everything else shouldn't need much special modification. Good luck!

kerrypeck commented 5 years ago

Thank you very much! Your suggestion is much appreciated. I have a related question: does starchip remove duplicate reads implicitly? In my test runs, the SpanningReads and SplitReads values in the output "summary" files are all very small (frequently 0 and 1 respectively, even for known fusions), so they are unlikely to be raw counts. Would it be possible to add an option to disable duplicate read removal in starchip? This would help with the special use case discussed in this thread. Thanks again!
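
To illustrate the concern, here is a minimal Python sketch of how position-based duplicate collapsing affects amplicon data. This is not starchip's code, and whether starchip counts reads this way is exactly the open question; the coordinates, read count, and the function name `raw_and_unique_counts` are all made up for illustration.

```python
from collections import Counter

# Hypothetical split reads supporting one fusion junction, given as
# (read_start, read_end) alignment coordinates. In amplicon data, reads
# from the same primer pair start and end at the same positions.
amplicon_reads = [(1000, 1150)] * 500


def raw_and_unique_counts(reads):
    """Return (total reads, reads left after collapsing identical positions)."""
    position_counts = Counter(reads)
    total = sum(position_counts.values())
    unique = len(position_counts)
    return total, unique


total, unique = raw_and_unique_counts(amplicon_reads)
print(total, unique)
```

Under these assumptions, a junction with many supporting reads collapses to a single unique read, which would be consistent with the very small counts seen in the summary files.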