comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

SUPPA not working on quantification #160

Open Oliverfeudj opened 1 year ago

Oliverfeudj commented 1 year ago

Hello, I am trying to run SUPPA to assess alternative splicing events. I have done everything as described in the documentation regarding the expression file, but it still does not work. Here is the command line I am using:

suppa.py psiPerEvent --ioe-file mm10_SE_strict.ioe --expression-file spc.tsv --output-file ./spc_SE

Thank you for helping me out.

Best

EduEyras commented 1 year ago

Hi Olivier,

I managed to reproduce your problem.

I looked at the .ioe and expression files, and saw that your expression file has a total of 56980 different transcript IDs and your ioe file has 17822 transcript IDs.

Many of the IDs in the ioe file are not in the expression file. What happens is that, since each event is defined by multiple transcripts, if some do not appear in the expression, the event may be left as undefined.

Not being in the expression file is not considered to be the same as having zero expression, so this may be the cause of having so many NA's.

One thing that you could try is including the zeroes in the expression file and see whether the event PSIs can be calculated.

I hope this helps

best

Eduardo
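A minimal sketch of the behaviour described above, assuming PSI is the summed expression of the inclusion transcripts divided by the summed expression of all transcripts in the event. This illustrates the reasoning only and is not SUPPA's actual code; the function name `psi_for_event` and the IDs `T1`, `T2`, `T3` are hypothetical.

```python
from typing import Dict, List, Optional


def psi_for_event(alt_ids: List[str],
                  total_ids: List[str],
                  tpm: Dict[str, float]) -> Optional[float]:
    """Return the PSI of one event, or None (NA) if it cannot be computed."""
    # A transcript that is absent from the expression table is undefined,
    # which is not the same as having zero expression.
    if any(t not in tpm for t in alt_ids + total_ids):
        return None
    total = sum(tpm[t] for t in total_ids)
    if total == 0:
        # No expression at all for this event: PSI is undefined.
        return None
    return sum(tpm[t] for t in alt_ids) / total


# T3 is missing from the table, so the event is NA.
without_t3 = psi_for_event(["T1"], ["T1", "T2", "T3"], {"T1": 2.0, "T2": 6.0})

# With T3 listed explicitly as zero, the PSI becomes computable.
with_t3 = psi_for_event(["T1"], ["T1", "T2", "T3"],
                        {"T1": 2.0, "T2": 6.0, "T3": 0.0})
```

Under these assumptions the first call yields NA and the second yields 2.0 / 8.0, which is why adding explicit zero rows could change the result.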

Oliverfeudj commented 1 year ago

Hello @EduEyras, thank you for your answer. All of the transcripts in my expression file have a value of at least zero. So, if I understand correctly, you suggest extracting all the transcript IDs from the ioe file and assigning zero to the ones that do not have a value in the expression file, to see whether it then works, right?

I would expect a PSI to be calculated at least for the transcripts that are in the expression file, but this is not the case: I get only NAs, regardless of whether the transcript ID is present in the expression file or not.

Best.

EduEyras commented 1 year ago

Could you check whether there are any events for which all the transcript IDs involved in the event are in the expression file? My suspicion is that there aren't any, and that's why the PSI is NA for all events.

E

Oliverfeudj commented 1 year ago

Thank you for your reply. I am afraid I do not understand what you mean or how to check that. Could you please elaborate on your question?

Best

EduEyras commented 1 year ago

Hi,

The ioe file defines each event by listing the transcripts that include the alternative exon (fourth column) and all the transcripts that either include or exclude the exon (fifth column), e.g.:

1 ENSMUSG00000026312 ENSMUSG00000026312;SE:1:110036685-110039327:110039381-110065592:+ ENSMUST00000134301 ENSMUST00000027542,ENSMUST00000134301,ENSMUST00000112701,ENSMUST00000172005,ENSMUST00000131464

So I'm wondering whether the events are NA because some of the transcript IDs in the last column (the transcripts that include or exclude the exon) have missing values.

E.
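The check requested above can be sketched in Python as follows. This assumes the ioe file is tab-separated with five columns, the last two being comma-separated transcript lists; the field names, `parse_ioe_line` and `missing_transcripts` are illustrative and not part of SUPPA.

```python
from typing import List, NamedTuple, Set


class IoeEvent(NamedTuple):
    seqname: str
    gene_id: str
    event_id: str
    alternative_transcripts: List[str]  # transcripts that include the exon
    total_transcripts: List[str]        # transcripts that include or exclude it


def parse_ioe_line(line: str) -> IoeEvent:
    """Split one tab-separated ioe line into its five fields."""
    seqname, gene_id, event_id, alt, total = line.rstrip("\n").split("\t")
    return IoeEvent(seqname, gene_id, event_id, alt.split(","), total.split(","))


def missing_transcripts(event: IoeEvent, expressed_ids: Set[str]) -> List[str]:
    """Return the event's transcript IDs that have no row in the expression file."""
    wanted = set(event.alternative_transcripts) | set(event.total_transcripts)
    return sorted(wanted - expressed_ids)


# The example event quoted above, rebuilt with explicit tabs.
example = "\t".join([
    "1",
    "ENSMUSG00000026312",
    "ENSMUSG00000026312;SE:1:110036685-110039327:110039381-110065592:+",
    "ENSMUST00000134301",
    "ENSMUST00000027542,ENSMUST00000134301,ENSMUST00000112701,"
    "ENSMUST00000172005,ENSMUST00000131464",
])
event = parse_ioe_line(example)

# Hypothetical expression table that only contains two of the five transcripts.
expressed = {"ENSMUST00000134301", "ENSMUST00000027542"}
missing = missing_transcripts(event, expressed)
```

An event can only be quantified when `missing_transcripts` returns an empty list for it.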

Oliverfeudj commented 1 year ago

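I have tried grepping a few IDs from the expression file, and they exist in the ioe file too.

Can I please send you both files for you to check? Here they are: mm10_SE_strict.zip (https://github.com/comprna/SUPPA/files/11091460/mm10_SE_strict.zip) and expression files.zip (https://github.com/comprna/SUPPA/files/11091461/expression.files.zip).

I would be very grateful if you could have a look at them.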

EduEyras commented 1 year ago

Hi,

I made a quick Perl script that parses the expression and ioe files and checks whether any of the transcript IDs in each event are missing from the expression file. Every event has one or more transcript IDs missing in the expression file. I paste the script below. It works as follows:

perl check_exp_ioe_files.pl spd1.tsv mm10_SE_strict.ioe

    #!/usr/bin/perl -w

    use strict;

    my ($exp_file, $ioe_file) = @ARGV;

    unless($exp_file && $ioe_file){
        print STDERR "Usage: $0 exp_file ioe_file\n";
        print STDERR "Script to check that the ioe file has IDs with expression values\n";
        exit(0);
    }

    # Collect the transcript IDs (first column) found in the expression file
    open (INPUT, $exp_file) or die("cannot open $exp_file, $!");
    my %ids_exp;
    while(my $line=<INPUT>){
        chomp $line;
        # skip the header and any line without an Ensembl ID
        next unless($line=~/ENS/);
        my @line_array = split(/\t/,$line);
        $ids_exp{$line_array[0]}++;
    }
    close(INPUT) or die("cannot close $exp_file");

    # For every event in the ioe file, report the transcript IDs without expression
    open (INPUT, $ioe_file) or die("cannot open $ioe_file, $!");
    while(my $line=<INPUT>){
        chomp $line;
        next unless($line=~/ENS/);
        # 1       ENSMUSG00000039748      ENSMUSG00000039748;SE:1:175733555-175734172:175734226-175735996:+ ENSMUST00000194636 ENSMUST00000039725,ENSMUST00000193610,ENSMUST00000194636
        my ($chr, $gene, $event, $t1, $t_line) = split(/\t/,$line);
        my %missing;
        # both transcript columns can hold a comma-separated list of IDs
        my @t_list = (split(",", $t1), split(",", $t_line));
        foreach my $t (@t_list){
            $missing{$t}++ unless $ids_exp{$t};
        }
        my @missing_ids = keys %missing;
        if (@missing_ids){
            my $s = join "\t", ($chr, $gene, $event, $t1, $t_line, "missing", @missing_ids);
            print $s."\n";
        }
        else{
            my $s = join "\t", ($chr, $gene, $event, $t1, $t_line, "correct");
            print $s."\n";
        }
    }
    close(INPUT) or die("cannot close $ioe_file");

Oliverfeudj commented 1 year ago

Hello @EduEyras,

Thank you very much for this script. What could be the solution? How can I solve this problem, please? I am also very confused, because the ioe files were generated from the same GTF file that was used to generate the expression files, so the transcript IDs are supposed to be the same.

Best

EduEyras commented 1 year ago

Hi,

Yes, the ioe is fine. I am wondering whether adding the IDs with zeroes to the expression file might help.

E

Oliverfeudj commented 1 year ago

I will try to extract those transcript IDs from the ioe file and assign zeroes to them in the expression file, to see whether it changes the outcome.

Thank you!
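A possible way to do this padding, as a rough sketch only. It assumes the expression file is tab-separated, with one header line followed by rows of the form `transcript_id<TAB>value<TAB>value...`; `pad_expression_with_zeros` and the sample and transcript names are hypothetical, and whether the padded file actually resolves the NAs is untested here.

```python
from typing import Iterable, List


def pad_expression_with_zeros(expr_lines: List[str],
                              ioe_ids: Iterable[str]) -> List[str]:
    """Append an all-zero row for every ioe transcript ID that has no row yet.

    expr_lines holds the expression file's lines without trailing newlines;
    the first line is assumed to be the header.
    """
    header, rows = expr_lines[0], expr_lines[1:]
    # Number of value columns: taken from the first data row when there is one,
    # otherwise from the header (assumed to list sample names only).
    if rows:
        n_samples = len(rows[0].split("\t")) - 1
    else:
        n_samples = len(header.split("\t"))
    present = {row.split("\t")[0] for row in rows}
    zeros = "\t".join(["0"] * n_samples)
    extra = [f"{tid}\t{zeros}" for tid in sorted(set(ioe_ids) - present)]
    return [header] + rows + extra


# Tiny made-up table with two samples and one transcript.
table = ["sample1\tsample2", "T1\t1.0\t2.0"]
padded = pad_expression_with_zeros(table, ["T1", "T2"])
```

The IDs passed as `ioe_ids` would be the union of the fourth and fifth ioe columns across all events; existing rows are left untouched.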