VCCRI / Ularcirc

An R-shiny app that provides backsplice and canonical splicing analysis for both circular RNA (circRNA) and parental transcripts
GNU General Public License v3.0

Genomic Information Not Retrieved for Any Junction #11

Open DarioS opened 4 years ago

DarioS commented 4 years ago

No matter which junction I click on in Gene_View, when I click on the Junction_View tab I get the pop-up error "Cannot retrieve genomic information for this gene". Below the load database button in Setup tab, I see Hsapiens.UCSC.hg38, so it seems the annotations have successfully been loaded.

davhum commented 4 years ago

Hi Dario,

Can you confirm this is also the case for the TwoSzabo data set? If the TwoSzabo data set works, can you let me know whether you are analysing STAR/CIRI/CircExplorer output? If you are NOT using STAR, can you copy and paste an example of the gene name, as it might be a lookup issue?

Thanks, D

DarioS commented 4 years ago

Ah, I figured it out now. It happens when Annotate With Parental Gene is not checked. Perhaps the pop-up error should state this as a possible cause. Currently, it only advises checking that a database was loaded in the Setup tab, which it was.

I notice a couple of other issues. There's always a red Shiny warning message at the bottom of output:

image

Also, one gene - MUC7 - has a circular RNA of length infinity. I attach input files to reproduce. test.zip

davhum commented 4 years ago

Good suggestions.

I have started to look at the test data set. The MUC7 example doesn't look like a BSJ, but rather a forward splice junction. I need to work out why it enters the chimeric output. I will have a solution implemented in the next day or so.

davhum commented 4 years ago

Any chance you could attach a couple of sequences from the FASTQ file? This would be a useful sanity check. For example, the following read IDs would be useful:

A00121:71:HFFY2DSXX:2:1106:11776:28745
A00121:71:HFFY2DSXX:2:1124:22236:36495
A00121:71:HFFY2DSXX:2:1146:19090:8844

DarioS commented 4 years ago

The pair of FASTQ files for testing is in the archive testReads.zip. The reads were trimmed using cutadapt.

davhum commented 4 years ago

When I BLAT each of those reads, it shows that they are indeed canonical junctions (see image below). The CIGAR strings in the chimeric junction output also indicate that they are not chimeric, which suggests some sort of leaky reporting of non-chimeric reads in the chimeric output of the STAR aligner.

There are a couple of things I would now like to do: 1) Modify Ularcirc to filter out these candidates. 2) Chase up why the STAR aligner reports these reads as chimeric. On this note, are you OK if I pass on the testReads.zip you generated as part of an issue on the STAR aligner?

MUC7_blat_fastq

In the above image I have labelled the forward reads as F1, F2 and F3, and their paired-end mates as R1, R2 and R3.
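As a rough illustration of the kind of filter meant in point 1, here is a minimal Python sketch. It is not Ularcirc's actual code (Ularcirc is written in R), and the names `ChimericJunction`, `parse_junction` and `is_backsplice_candidate` are hypothetical. It assumes a tab-separated chimeric junction record whose first six columns are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position and acceptor strand, which is how the STAR chimeric junction file is understood to be laid out. The rule itself is also an assumption: a backsplice has the donor downstream of the acceptor in transcript orientation, so a record with the donor upstream (like the MUC7 case) looks like a forward splice and could be dropped.

```python
from typing import NamedTuple


class ChimericJunction(NamedTuple):
    """Donor/acceptor fields of one chimeric junction record (assumed layout)."""
    donor_chrom: str
    donor_pos: int
    donor_strand: str
    acceptor_chrom: str
    acceptor_pos: int
    acceptor_strand: str


def parse_junction(line: str) -> ChimericJunction:
    """Parse the first six tab-separated columns; later columns are ignored."""
    f = line.rstrip("\n").split("\t")
    return ChimericJunction(f[0], int(f[1]), f[2], f[3], int(f[4]), f[5])


def is_backsplice_candidate(j: ChimericJunction) -> bool:
    """Return True only when donor and acceptor are in backsplice order."""
    # Both ends must lie on the same chromosome and the same strand.
    if j.donor_chrom != j.acceptor_chrom or j.donor_strand != j.acceptor_strand:
        return False
    if j.donor_strand == "+":
        # Plus strand: the donor has the larger genomic coordinate.
        return j.donor_pos > j.acceptor_pos
    if j.donor_strand == "-":
        # Minus strand: transcript orientation is reversed.
        return j.donor_pos < j.acceptor_pos
    return False


# Hypothetical coordinates, for illustration only.
forward = parse_junction("chr4\t1000\t+\tchr4\t2000\t+\t1\t0\t0\tread1")
backsplice = parse_junction("chr4\t2000\t+\tchr4\t1000\t+\t1\t0\t0\tread2")
print(is_backsplice_candidate(forward), is_backsplice_candidate(backsplice))
```

A filter like this only checks junction order. It would not explain why such reads reach the chimeric output in the first place, which is the question for STAR in point 2.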

DarioS commented 4 years ago

Yes, please create a new issue for the STAR aligner using this test data. I am happy for you to avoid adding extra filtering to Ularcirc and just get it fixed at the origin. I plan to re-map the data set soon anyway, because the next version of STAR will support writing multi-mapping chimeras within the BAM file, which is useful for arriba and for rearrangements involving immunoglobulin genes, so I am happy to wait a while.

davhum commented 4 years ago

Submitted issue to STAR.