Xinglab / rMATS-long


differential genes/transcripts files #25

Closed: trista1115 closed this 1 month ago

trista1115 commented 2 months ago

Hi @EricKutschera

When I ran ESPRESSO, the abundance file reported around 13,000 genes, but rMATS-long produced a differential_genes.tsv file with only 9,800 genes. I'm a bit confused about how the differential_genes.tsv and differential_transcripts.tsv files were generated. Could you clarify the process?

Thanks.

EricKutschera commented 2 months ago

rMATS-long runs DRIMSeq to produce differential_genes.tsv and differential_transcripts.tsv. DRIMSeq tests for changes in transcript proportions within each gene. Before DRIMSeq runs, a filter decides which genes and transcripts are processed: each transcript must be expressed in at least as many samples as there are in the smaller comparison group, and then only genes with at least two remaining transcripts are kept: https://github.com/Xinglab/rMATS-long/blob/ee0186d80d312f3394f73663147103d84275b3fb/scripts/detect_differential_isoforms.R#L176
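The filter described above can be sketched in Python. This is a simplified illustration, not rMATS-long's actual R code (which calls DRIMSeq); the input format, the function name `filter_for_drimseq`, and the `min_count` threshold for calling a transcript "expressed" are hypothetical assumptions.

```python
from collections import defaultdict


def filter_for_drimseq(counts, groups, min_count=1):
    """Hypothetical sketch of the pre-DRIMSeq filter.

    counts: dict mapping (gene_id, transcript_id) -> list of per-sample counts
    groups: list of group labels, one per sample, in the same order as the counts
    min_count: ASSUMED threshold for calling a transcript "expressed" in a sample
    """
    # Size of the smaller comparison group (e.g. 2 when comparing 2 vs 3 samples).
    group_sizes = defaultdict(int)
    for label in groups:
        group_sizes[label] += 1
    min_samples = min(group_sizes.values())

    # Step 1: keep transcripts expressed in at least min_samples samples.
    kept = defaultdict(list)
    for (gene, tx), sample_counts in counts.items():
        n_expressed = sum(1 for c in sample_counts if c >= min_count)
        if n_expressed >= min_samples:
            kept[gene].append(tx)

    # Step 2: keep only genes with at least two remaining transcripts,
    # since proportions within a single-transcript gene cannot change.
    return {gene: txs for gene, txs in kept.items() if len(txs) >= 2}


# Made-up example data: 2 samples in group A, 3 in group B.
counts = {
    ("geneA", "tx1"): [10, 12, 0, 8, 9],
    ("geneA", "tx2"): [5, 0, 7, 6, 4],
    ("geneB", "tx3"): [20, 18, 25, 22, 30],  # only one transcript: gene dropped
    ("geneC", "tx4"): [3, 0, 0, 0, 1],       # passes (expressed in 2 samples)
    ("geneC", "tx5"): [0, 0, 0, 0, 2],       # fails, so geneC is left with one
}
groups = ["A", "A", "B", "B", "B"]
print(filter_for_drimseq(counts, groups))  # {'geneA': ['tx1', 'tx2']}
```

Because a gene is dropped whenever fewer than two of its transcripts pass, single-isoform and lowly expressed genes drop out, which can account for a gene count lower than the one in the ESPRESSO abundance file.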

trista1115 commented 2 months ago

Hi @EricKutschera

How are the red and blue colors assigned to isoforms in the structure and abundance plots? What is the logic behind the color assignment in these plots?

1. Some structure plots lack colors, while the corresponding abundance plots include them.

2. What do the red and blue colors represent in the abundance plot? In some plots the red isoform is predominantly in group A, while in others it is predominantly in group B.
3. rMATS-long identified approximately 60 significant isoforms in my data, but some of the "_isoform_diff" files did not report any events. I then checked the start and end coordinates of the isoforms, shown below:

    chr9 annotated_isoform transcript 64214038 64248539 . - . transcript_id "ENSMUST00000168844.9"
    chr9 annotated_isoform transcript 64214039 64248570 . - . transcript_id "ENSMUST00000068367.14"

Their start coordinates differ by only 1 base, which in my opinion is not a real difference (their end coordinates differ by 31 bases). How can I avoid identifying such isoforms as significant? Could these two isoforms be an A3SS event because of the difference in their end coordinates? I'm not entirely sure.
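A small sketch for comparing the two transcript records above, assuming whitespace-separated GTF fields (real GTF files are tab-separated, which `str.split()` also handles). The helper names `parse_transcript` and `endpoint_differences` are hypothetical. Because both transcripts are on the minus strand, GTF column 5 (`end`) is each transcript's 5' end, so the 31-base difference is at the 5' end and the 1-base difference is at the 3' end.

```python
import re


def parse_transcript(line):
    """Parse one GTF 'transcript' record into a dict (hypothetical helper)."""
    # GTF has 9 columns; the 9th (attributes) may itself contain spaces,
    # so split at most 8 times.
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = line.split(None, 8)
    match = re.search(r'transcript_id "([^"]+)"', attrs)
    return {
        "chrom": chrom,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "transcript_id": match.group(1) if match else None,
    }


def endpoint_differences(t1, t2):
    """Absolute differences at the 5' and 3' ends, taking strand into account."""
    if t1["chrom"] != t2["chrom"] or t1["strand"] != t2["strand"]:
        raise ValueError("transcripts must share chromosome and strand")
    start_diff = abs(t1["start"] - t2["start"])
    end_diff = abs(t1["end"] - t2["end"])
    if t1["strand"] == "-":
        # On the minus strand, the GTF "end" coordinate is the transcript's 5' end.
        return {"five_prime": end_diff, "three_prime": start_diff}
    return {"five_prime": start_diff, "three_prime": end_diff}


a = parse_transcript('chr9 annotated_isoform transcript 64214038 64248539 . - . transcript_id "ENSMUST00000168844.9"')
b = parse_transcript('chr9 annotated_isoform transcript 64214039 64248570 . - . transcript_id "ENSMUST00000068367.14"')
print(endpoint_differences(a, b))  # {'five_prime': 31, 'three_prime': 1}
```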

EricKutschera commented 2 months ago

The red isoform is the most significant one. The blue isoform is the next most significant isoform whose delta proportion is in the opposite direction. All other isoforms are grey. For a given gene, each isoform has the same color in the abundance plot and the structure plot.
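The color rule described above can be sketched as follows. This assumes significance is ranked by adjusted p-value and that each isoform carries a signed delta proportion between the two groups; the tuple input format and the function name `assign_colors` are hypothetical, not rMATS-long's plotting code. Under this rule, whether the red isoform is higher in group A or group B depends only on the sign of its own delta proportion, which would explain why it varies from gene to gene.

```python
def assign_colors(isoforms):
    """Hypothetical sketch of the red/blue/grey rule.

    isoforms: list of (isoform_id, adj_pvalue, delta_proportion) tuples
    """
    # Every isoform starts out grey.
    colors = {iso_id: "grey" for iso_id, _, _ in isoforms}
    # Most significant first (smallest adjusted p-value).
    ranked = sorted(isoforms, key=lambda rec: rec[1])
    if not ranked:
        return colors
    red_id, _, red_delta = ranked[0]
    colors[red_id] = "red"
    # Blue: the next most significant isoform whose delta has the opposite sign.
    for iso_id, _, delta in ranked[1:]:
        if delta * red_delta < 0:
            colors[iso_id] = "blue"
            break
    return colors


# Made-up example: tx3 is skipped for blue because it shifts in the same direction as tx2.
example_isoforms = [
    ("tx1", 0.20, 0.05),
    ("tx2", 0.001, 0.30),
    ("tx3", 0.01, 0.10),
    ("tx4", 0.03, -0.25),
]
print(assign_colors(example_isoforms))
```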

If the isoform diff file is empty, then the isoforms differ only by their endpoints. ESPRESSO won't detect a novel isoform that differs from a known one only by an endpoint, but it will distinguish two annotated transcripts that differ only by their endpoints: https://github.com/Xinglab/rMATS-long/issues/22#issuecomment-2221235290
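One simple way to check whether two isoforms differ only by their endpoints is to compare their intron chains (the ordered list of introns between consecutive exons). This is a heuristic sketch, not how rMATS-long or ESPRESSO decide it; it assumes 1-based inclusive GTF exon coordinates for full-length isoforms, the function names are hypothetical, and the exon coordinates in the example are made up for illustration.

```python
def intron_chain(exons):
    """Introns between consecutive exons; exons are 1-based inclusive (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def differs_only_by_endpoints(exons_a, exons_b):
    """Heuristic: identical intron chains mean any difference lies in the endpoints."""
    return intron_chain(exons_a) == intron_chain(exons_b)


# Made-up exon structures for illustration.
iso_a = [(100, 200), (300, 400), (500, 650)]
iso_b = [(101, 200), (300, 400), (500, 680)]  # same introns, shifted outer ends
iso_c = [(100, 200), (320, 400), (500, 650)]  # different splice site in exon 2

print(differs_only_by_endpoints(iso_a, iso_b))  # True
print(differs_only_by_endpoints(iso_a, iso_c))  # False
```

A pair like `iso_a` and `iso_b` would have nothing to report in a per-event comparison, since no splice junction differs.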