LosicLab / starchip

Detection of Circular RNA and Fusions from RNA-Seq
http://starchimp.readthedocs.io/en/latest/
MIT License

Unable to detect known fusions in PacBio data #26

Closed by wolfgangrumpf 5 years ago

wolfgangrumpf commented 5 years ago

I have some data that I know contains fusions, but I'm unable to detect them with STARChip. The sequencing was performed on a PacBio machine. Are there any particular settings I should use for that? I ran STARlong to create the BAM file.

wolfgangrumpf commented 5 years ago

Here's the output, including parameters - how can I change the sensitivity? We know that at least one of the 24 potential fusion products is real:

Using the following variables:
Paired-End: FALSE
Split Reads Cutoff: 2
Unique Support Values Min: 1
Spanning Reads Cutoff: 0
Location Wiggle Room (spanning reads): 50 bp
Location Wiggle Room (split reads) : 5 bp
Min-distance : 0 bp
Read Distribution upper limit: 1000000000 X
Read Distribution lower limit: 0 X

Now catologuing all chimeric reads
Read length appears to be 424
Finished catologing fusion reads, now processing over 24 potential fusion sites
Total fusions passing read thresholds found: 0
These fusions will now be filtered based on annotations
/gpfs0/home/gdlauberlab/rwr002/STARCHIP/starchip/scripts/fusions/annotate-fusions.pl Fusions Chimeric.out.junction hg38.params.txt

No fusions discovered. Consider lowering read requirements to increase sensitivity.

kippakers commented 5 years ago

Hi Wolfgang,

Those parameters are already fairly low. Normally I wouldn't advise going down to 1 split read, but because this is PacBio data, it probably won't take too long to process. Another option is to increase the split reads wiggle room (i.e., the distance between separate fusions that can be merged), but I doubt that's where you're losing reads. Failing that, we can try inspecting the intermediate files (i.e., the 24 potential fusions) to see why they're getting filtered.

Hope that helps! Kipp
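As a rough way to inspect the candidate junctions by hand, the sketch below tallies how many chimeric reads support each exact donor/acceptor pair in a STAR `Chimeric.out.junction` file. It is not STARChip's own code: the function names and the `min_reads` parameter are made up for illustration, it assumes the standard tab-separated layout (donor chromosome, position, strand in columns 1-3; acceptor chromosome, position, strand in columns 4-6), and it does not merge nearby breakpoints the way the wiggle-room settings do.

```python
from collections import Counter


def count_junction_support(lines):
    """Tally chimeric reads per exact (donor, acceptor) breakpoint pair.

    Assumes STAR's Chimeric.out.junction column order. Header, comment,
    and malformed lines are skipped because their position fields are
    not plain integers.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        if not (fields[1].isdigit() and fields[4].isdigit()):
            continue
        donor = (fields[0], int(fields[1]), fields[2])
        acceptor = (fields[3], int(fields[4]), fields[5])
        counts[(donor, acceptor)] += 1
    return counts


def junctions_meeting_cutoff(counts, min_reads=2):
    """Keep only junctions with at least min_reads supporting reads."""
    return {junction: n for junction, n in counts.items() if n >= min_reads}
```

Passing an open file handle for `Chimeric.out.junction` to `count_junction_support` and comparing the result against `junctions_meeting_cutoff(counts, min_reads=2)` gives a quick view of how many candidates have more than one read at the same breakpoint, which is worth knowing before lowering any cutoff.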

wolfgangrumpf commented 5 years ago

I tried increasing the split reads wiggle room, but there was no change. Would it be helpful if I sent you the Chimeric.out.junction file?

(Sent by email in reply to Nicholas Kipp Akers's comment of Apr 16, 2019, 11:53 AM.)

kippakers commented 5 years ago

Sure, I can take a look. If it's small enough you can send it to my gmail.

wolfgangrumpf commented 5 years ago

In hindsight, the problem may be that STARChip can't detect what I'm looking for! I'm looking for intragenic (rather than intergenic) fusions, e.g. isoforms where the exon order is actually scrambled. I've gone through my input file and it doesn't include any of those, only intergenic events. So I'm exploring other methods.
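To check whether a junction file contains any scrambled-order candidates at all, something like the sketch below could be used. It is an illustration only: `classify_junction` and its labels are invented here, it assumes STAR's `Chimeric.out.junction` column order and that the reported strands reflect the transcript strand, and it knows nothing about gene boundaries, so deciding whether an event is truly intragenic would still need an annotation lookup.

```python
def classify_junction(fields):
    """Label one Chimeric.out.junction row by donor/acceptor geometry.

    fields: the tab-split columns of one row (donor chr, pos, strand,
    then acceptor chr, pos, strand). The labels are ad hoc.
    """
    chr_d, pos_d, strand_d = fields[0], int(fields[1]), fields[2]
    chr_a, pos_a, strand_a = fields[3], int(fields[4]), fields[5]

    if chr_d != chr_a:
        return "interchromosomal"
    if strand_d != strand_a:
        return "strand_switch"

    # In a collinear transcript the donor precedes the acceptor in
    # transcript order: lower coordinate on "+", higher on "-".
    # The reverse order is what a scrambled exon arrangement looks like.
    if strand_d == "+":
        scrambled = pos_d > pos_a
    else:
        scrambled = pos_d < pos_a
    return "scrambled_order" if scrambled else "collinear"
```

Rows labelled `scrambled_order` are only candidates; the same geometry also arises from circular RNA back-splices and tandem duplications, so read-level inspection is still needed.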

(Sent by email in reply to Nicholas Kipp Akers's comment of Apr 16, 2019, 4:22 PM.)

kippakers commented 5 years ago

With PacBio data, maybe your best bet is de novo assembly? Tough problem to crack, good luck!