VCCRI / Ularcirc

An R-shiny app that provides backsplice and canonical splicing analysis for both circular RNA (circRNA) and parental transcripts
GNU General Public License v3.0

Result Table Column Explanations #14

Open DarioS opened 4 years ago

DarioS commented 4 years ago

The output table in Gene View tab has 22 columns. Could an explanation for each be in the vignette? For example, I see BSJ_vs_FSJ is always 0 and I wonder what that means. The last few columns are a mystery.

davhum commented 4 years ago

Hi Dario,

There are a number of tables that can be built through the Gene View tab. I am assuming you are referring to the "Grouped" table output using STAR chimeric outputs. To have 22 columns you would have a data set of 6 samples (4 generic columns plus 3 columns per sample). Note that each sample is automatically assigned to a "Group", which can be visualised under the Projects tab (i.e. if you select items under "List of all groups", the associated sample ID is shown in the main panel).

Each sample (group) contributes 3 columns to the grouped table under Gene View: (i) a BSJ count, (ii) a RAD score (group__II_III) and (iii) an FSJ score (group_FSJ). There are also 4 generic columns built into the table: BSjuncName (unique identifier), strandDonor, Gene, and juncType. juncType is copied from column 7 of the STAR chimeric output (possible values: -1 = encompassing junction (between the mates), 1 = GT/AG, 2 = CT/AC).
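As a rough illustration of the column arithmetic above, here is a small Python sketch (not Ularcirc code). The group names `Group1` to `Group6` and the name used for the BSJ count column are assumptions made for the example; only the four generic column names, the `__II_III` and `_FSJ` suffixes, and the juncType values come from the description above.

```python
# Generic columns named in the reply above.
GENERIC_COLUMNS = ["BSjuncName", "strandDonor", "Gene", "juncType"]

# juncType values as described for column 7 of the STAR chimeric output.
JUNC_TYPE_LABELS = {
    -1: "encompassing junction (between the mates)",
    1: "GT/AG",
    2: "CT/AC",
}


def grouped_table_columns(groups):
    """Return the expected column names for a grouped table.

    Assumption: the BSJ count column is named after the group itself;
    the real column name in Ularcirc may differ.
    """
    columns = list(GENERIC_COLUMNS)
    for group in groups:
        columns += [group, f"{group}__II_III", f"{group}_FSJ"]
    return columns


# Hypothetical group names for a data set of 6 samples.
groups = [f"Group{i}" for i in range(1, 7)]
columns = grouped_table_columns(groups)
print(len(columns))  # 4 generic + 3 * 6 per-sample columns = 22
```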

The RAD score is the ratio of type II / type III alignments and is labelled "Group__II_III". A value close to 0.5 indicates a good balance between type II and type III alignments. A value of 0 or 1 means a strong bias towards one alignment type, which suggests the circRNA might be a false positive.
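To make the 0 / 0.5 / 1 interpretation concrete, here is a minimal Python sketch. It assumes the score is the proportion of type II alignments among all type II and type III alignments, since that is what gives values between 0 and 1; the exact formula Ularcirc uses is not stated in this thread, and the function name `rad_score` is made up for the example.

```python
def rad_score(type_ii, type_iii):
    """Proportion of type II alignments among type II + type III alignments.

    Assumption: this proportion is what the "__II_III" column reports.
    Returns None when there are no alignments of either type.
    """
    total = type_ii + type_iii
    if total == 0:
        return None
    return type_ii / total


balanced = rad_score(5, 5)    # equal support from both alignment types
biased_iii = rad_score(0, 7)  # only type III alignments
biased_ii = rad_score(7, 0)   # only type II alignments
```

Under this assumption a balanced junction scores 0.5, while a junction supported by only one alignment type scores 0 or 1 and would be treated with more suspicion.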

The forward splice junction (FSJ) score identifies the presence of reads that support canonical splice junctions using the splice donor/acceptor of the BSJ coordinates. If both the splice donor and the splice acceptor of a BSJ are used in the parental transcript, a score of 2 is given. If only one is used, a score of 1 is given. This score is most useful for BSJs that do not align with current gene models.
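The scoring rule can be sketched in Python as follows. This is only an illustration of the 0 / 1 / 2 logic described above, not the Ularcirc implementation: the function name, the `(chromosome, position)` representation of splice sites, and the example coordinates are all assumptions.

```python
def fsj_score(bsj_donor, bsj_acceptor, canonical_junctions):
    """Count how many of the BSJ splice sites are also used canonically.

    bsj_donor, bsj_acceptor: (chromosome, position) tuples for the BSJ.
    canonical_junctions: iterable of (donor, acceptor) pairs observed in
    forward-spliced reads, each site again a (chromosome, position) tuple.
    Returns 0, 1 or 2.
    """
    donors = {donor for donor, _ in canonical_junctions}
    acceptors = {acceptor for _, acceptor in canonical_junctions}
    return int(bsj_donor in donors) + int(bsj_acceptor in acceptors)


# Hypothetical canonical junctions observed in a sample.
observed = [
    (("chr1", 200), ("chr1", 300)),
    (("chr1", 50), ("chr1", 100)),
]

both_used = fsj_score(("chr1", 200), ("chr1", 100), observed)
one_used = fsj_score(("chr1", 200), ("chr1", 999), observed)
none_used = fsj_score(("chr2", 200), ("chr2", 100), observed)
```

Here `both_used` is 2, `one_used` is 1 and `none_used` is 0, matching the three cases described above.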

Note that for some data sets (e.g. RNase R-treated samples) you would expect an FSJ score of 0.

Regards, D