comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing

some questions #120

Closed Jian288 closed 2 years ago

Jian288 commented 3 years ago

Dear EduEyras, I analyzed my own RNA-seq data with SUPPA2 and ran into some questions. With the test data you provided, SUPPA2 runs without any errors. But when I used SUPPA2 to analyze my own RNA-seq data, there were many errors after 'psiPerEvent', such as: ERROR:psiCalculator:transcript ENST00000545291 not found in the "expression file". ERROR:psiCalculator:PSI not calculated for event ENSG00000240303;SE:chr3:132280061-132294616:132294770-132295788:-. I still got the final result (the .psi file), and not all transcripts have a null PSI value. I know this may be because not all transcripts in the .ioe file can be found in the expression file. But is this situation common, and does it affect the assessment of differential splicing between treatment groups?

Another question: after running the quantification, I got six iso_tpm.txt files (because I have six samples). How can I combine them into one iso_tpm.txt file? Do you have a script for this?

EduEyras commented 3 years ago

Hi,

that message simply means that some of your transcripts in the ioe file are not expressed. That's quite normal from the point of view of the biology: you don't expect all genes to be expressed all the time, and you may expect some transcript isoforms not to be expressed in one or more conditions.

This situation is common when your experiments do not have enough depth, so many transcripts are not seen in the RNA-seq, or when using data for highly differentiated tissues, which tend to have a more restrictive pattern of gene/isoform expression.

The PSI calculation can work with zeros for some of the transcript isoforms, so you could set all of those missing transcripts to zero in the expression file to make sure you don't miss any events.
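One way to set the missing transcripts to zero is sketched below. It is a minimal sketch, not part of SUPPA: the function names are hypothetical, and it assumes that the .ioe file is tab-separated with a header row and comma-separated transcript IDs in its 4th and 5th columns, and that the expression file has a header of sample names followed by rows of transcript ID plus one value per sample. Please check both assumptions against your own files.

```python
import csv


def transcripts_in_ioe(ioe_path):
    """Collect every transcript ID listed in an .ioe file.

    Assumption: tab-separated, one header row, and comma-separated
    transcript IDs in the 4th and 5th columns.
    """
    ids = set()
    with open(ioe_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 5:
                continue  # ignore malformed or empty lines
            for column in (row[3], row[4]):
                ids.update(t for t in column.split(",") if t)
    return ids


def zero_fill(expression_path, ioe_path, output_path):
    """Write a copy of the expression file with all-zero rows appended
    for transcripts that appear in the .ioe file but not in the table.

    Returns the sorted list of transcript IDs that were added.
    """
    wanted = transcripts_in_ioe(ioe_path)
    with open(expression_path) as src:
        lines = src.read().splitlines()
    header = lines[0]
    body = [line for line in lines[1:] if line.strip()]

    # Number of value columns: taken from the first data row when there is
    # one, otherwise from the header (assumed to hold only sample names).
    if body:
        n_samples = len(body[0].split("\t")) - 1
    else:
        n_samples = len(header.split("\t"))

    present = {line.split("\t", 1)[0] for line in body}
    missing = sorted(wanted - present)

    with open(output_path, "w") as out:
        out.write(header + "\n")
        for line in body:
            out.write(line + "\n")
        for transcript in missing:
            out.write(transcript + "\t" + "\t".join(["0"] * n_samples) + "\n")
    return missing
```

The returned list is worth inspecting before trusting the zero-filled table: a long list of well-known protein-coding transcripts would suggest an annotation or ID mismatch rather than genuinely unexpressed isoforms.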

When a transcript is missing and the PSI cannot be calculated, the PSI is set to NA.
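To illustrate the difference between a zero and a missing transcript, here is a toy calculation. It is a sketch of the idea only, not SUPPA's own code: the function `psi` and its arguments are hypothetical, PSI is taken as the summed expression of the transcripts that contain the event divided by the summed expression of all transcripts that define the event, and `None` stands in for NA.

```python
def psi(alternative, total, tpm):
    """Toy PSI for one event in one sample.

    alternative: transcript IDs that contain the event
    total:       all transcript IDs that define the event
    tpm:         dict mapping transcript ID to expression in this sample

    Returns None (standing in for NA) when the value cannot be computed.
    """
    # A transcript that is absent from the expression table makes the
    # event incalculable, which corresponds to the NA case.
    if any(t not in tpm for t in set(alternative) | set(total)):
        return None
    denominator = sum(tpm[t] for t in total)
    # If nothing is expressed, the ratio is undefined as well.
    if denominator == 0:
        return None
    return sum(tpm[t] for t in alternative) / denominator


sample = {"T1": 3.0, "T2": 0.0, "T3": 1.0}
print(psi(["T1"], ["T1", "T2", "T3"], sample))  # 0.75: a zero for T2 is fine
print(psi(["T2"], ["T1", "T2", "T3"], sample))  # 0.0: the event is simply not included
print(psi(["T1"], ["T1", "T4"], sample))        # None: T4 is missing from the table
```

A transcript with zero expression still yields a defined PSI as long as some other transcript of the event is expressed, whereas a transcript that is absent from the table leaves the event without a value.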

I hope this helps

Eduardo

-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

Jian288 commented 3 years ago

Thanks for your help. After running the quantification, I got about 170,000 transcripts, and about 18,000 of them have no PSI, which is roughly 10%. Is this proportion high?

EduEyras commented 3 years ago

Hi,

I would say it is ok. But it also depends. Are these protein-coding transcripts in standard chromosomes? Or perhaps unknown genes in unplaced contigs? Or perhaps pseudogenes?

It should not be a problem unless you find a gene in there that should be expressed in your conditions.

But the number of zeros also depends on the depth of your sequencing datasets, and the method to perform the transcript quantification.
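To check whether a gene you expect to be expressed is among the events without a PSI, you could tally those events per gene. The sketch below is only an illustration: the function name is hypothetical, and it assumes the .psi file is tab-separated with one header row, the event ID in the first column (gene ID before the first `;`, as in `ENSG00000240303;SE:...`), and missing values written as `NA` or `nan`.

```python
from collections import Counter

# Strings treated as "no PSI"; this set is an assumption about the file.
MISSING = {"na", "nan", ""}


def genes_with_missing_psi(psi_path):
    """Count, per gene, the events with no PSI in at least one sample."""
    counts = Counter()
    with open(psi_path) as handle:
        next(handle, None)  # skip the header row of sample names
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # ignore empty or malformed lines
            event_id, values = fields[0], fields[1:]
            if any(v.strip().lower() in MISSING for v in values):
                gene_id = event_id.split(";", 1)[0]
                counts[gene_id] += 1
    return counts
```

Calling `genes_with_missing_psi("my_events.psi").most_common(20)` (the file name is a placeholder) lists the genes with the most affected events, which you can then compare against the genes you expect to see in your conditions.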

I hope this helps

E.

Jian288 commented 3 years ago

Another question: after running the quantification, I got six iso_tpm.txt files (because I have six samples). How can I combine them into one iso_tpm.txt file? Do you have a script for this?

EduEyras commented 2 years ago

Please take a look at the wiki page. It provides some examples of how to do that. Best, E
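For readers who want a starting point alongside the wiki examples, here is a minimal sketch of the merge in plain Python. It is not the script from the wiki: the function name is hypothetical, and it assumes each per-sample file is tab-separated with one header row, the transcript ID in the first column and the TPM value in the second. The output has a header of sample names followed by one row per transcript, and a transcript absent from a sample is written as 0.

```python
def merge_tpm_files(sample_to_path, output_path):
    """Merge per-sample TPM files into one transcript-by-sample table.

    sample_to_path: dict mapping a sample name to its file path; the
                    column order follows the order of this dict.
    Returns the number of transcripts written.
    """
    samples = list(sample_to_path)
    table = {}  # transcript ID -> {sample name: TPM as text}
    for sample, path in sample_to_path.items():
        with open(path) as handle:
            next(handle, None)  # skip the header row
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 2:
                    continue  # ignore empty or malformed lines
                table.setdefault(fields[0], {})[sample] = fields[1]

    with open(output_path, "w") as out:
        out.write("\t".join(samples) + "\n")
        for transcript in sorted(table):
            # Transcripts not reported for a sample are written as 0.
            row = [table[transcript].get(s, "0") for s in samples]
            out.write(transcript + "\t" + "\t".join(row) + "\n")
    return len(table)
```

With six samples this would be called with a dict of six entries, for example `{"ctrl_1": "ctrl_1/iso_tpm.txt", ...}`, where the sample names and paths are placeholders for your own.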