comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

psiperEvent not working #162

Open SignorLab opened 1 year ago

SignorLab commented 1 year ago

I am able to run psiPerIsoform without issues and get a differential splicing result. However, with psiPerEvent I get the following error when I try to create the psi file:

INFO:lib.tools:File quants/FR_ETOH_expression_TPM.sf opened in reading mode.
INFO:psiCalculator:Buffering transcript expression levels.
ERROR:lib.tools:1, in line 2. Skipping line...
ERROR:lib.tools:2, in line 3. Skipping line...
ERROR:lib.tools:3, in line 4. Skipping line...
ERROR:lib.tools:4, in line 5. Skipping line...
...
ERROR:lib.tools:32958, in line 32959. Skipping line...
ERROR:lib.tools:32959, in line 32960. Skipping line...
ERROR:lib.tools:32960, in line 32961. Skipping line...
ERROR:lib.tools:32961, in line 32962. Skipping line...
ERROR:lib.tools:32962, in line 32963. Skipping line...
ERROR:lib.tools:32963, in line 32964. Skipping line...
ERROR:lib.tools:32964, in line 32965. Skipping line...
ERROR:lib.tools:32965, in line 32966. Skipping line...
INFO:lib.tools:File quants/FR_ETOH_expression_TPM.sf closed.
ERROR:psiCalculator:No expression values have been buffered.
ERROR:psiCalculator:Unknown error: 1

The code is as follows:

suppa.py psiPerEvent -e expression_TPM.sf -i dmel -o psi_suppa.ioe

The ioe file was made with the following code:

suppa.py generateEvents -i Drosophila_melanogaster.BDGP6.32.109.gtf -o dmel -f ioi --pool-genes 

And an example of what it looks like is here:

seqname gene_id event_id        alternative_transcripts total_transcripts
3R      FBgn0011290     FBgn0011290;SE:3R:11679184-11679354:11679398-11679567:- FBtr0335213     FBtr0335213,FBtr0290205
3R      FBgn0038725     FBgn0038725;SE:3R:19655672-19656205:19656372-19657582:- FBtr0345271     FBtr0345270,FBtr0345271
3R      FBgn0037837     FBgn0037837;SE:3R:10812213-10812314:10812664-10812798:- FBtr0339140,FBtr0336710 FBtr0300078,FBtr0339140,FBtr0082301,FBtr0336710
3R      FBgn0259139     FBgn0259139;SE:3R:11816757-11817194:11817338-11818006:- FBtr0089917,FBtr0089919 FBtr0089919,FBtr0089918,FBtr0089917
3R      FBgn0265276     FBgn0265276;SE:3R:11771300-11777474:11778621-11778982:- FBtr0110876     FBtr0307086,FBtr0110876
3R      FBgn0265276     FBgn0265276;SE:3R:11771300-11777474:11778621-11778858:- FBtr0305186     FBtr0305186,FBtr0307083
3R      FBgn0003165     FBgn0003165;SE:3R:9081022-9141320:9141331-9157118:-     FBtr0305197     FBtr0333668,FBtr0305197
3R      FBgn0003165     FBgn0003165;SE:3R:9081022-9098696:9098737-9157118:-     FBtr0081994     FBtr0333668,FBtr0081994
3R      FBgn0027492     FBgn0027492;SE:3R:27568591-27580157:27580325-27582002:- FBtr0085204     FBtr0479866,FBtr0085204,FBtr0085203
3R      FBgn0027492     FBgn0027492;SE:3R:27568591-27580157:27580325-27581144:- FBtr0085209     FBtr0085208,FBtr0085209

My expression TPM file looks like what follows:

Names     sample1     sample2     sample3     sample4     sample5     sample6     sample7     sample8     sample9
FBtr0308931     0.134402        0.233132        0.194543        0.000000        0.148624        0.479496        0.487405        56.666367       1.282022
FBtr0308932     0.000000        0.000000        0.000000        0.937857        0.000000        0.000000        0.000000        0.668767        0.000000
FBtr0070634     232.840518      396.445007      159.553419      140.755644      263.318480      236.227050      394.708294      210.230086      187.888669
FBtr0070635     317.499025      345.006916      375.252327      329.685911      385.136797      355.924015      620.179425      334.390078      392.112750
FBtr0337066     0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        0.000000
FBtr0337067     42.777204       48.561146       57.471017       52.749531       56.579878       50.621997       61.651118       26.885212       50.110896
FBtr0345714     8.680178        14.161588       12.604554       3.276385        5.098316        4.650082        9.928081        4.824662        8.217879
FBtr0344479     0.857818        0.471674        0.368677        0.469706        0.503530        0.810262        0.540480        0.197929        0.379062
FBtr0344480     0.086538        0.000000        0.037693        0.079004        0.120589        0.044486        0.048852        0.035409        0.146661
FBtr0445669     0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        4.205787        0.305157
FBtr0308924     0.000000        0.047815        0.020123        0.000000        0.000000        0.043046        0.000000        0.039629        0.000000
FBtr0340166     0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        0.044757        3.826418        0.000000
FBtr0332036     0.000000        0.105939        0.000000        0.000000        0.000000        0.000000        0.000000        0.000000        0.067182

Most of what I can find about this error suggests that the transcript names don't match between the files, but I was able to run a per-isoform analysis successfully with the same files, and I can grep some of the transcript names in each file.

If you have any ideas, it would be most appreciated.

EduEyras commented 1 year ago

Dear Sarah,

Did you get an output for some events? It could be possible that this happens only for events that do not have expression for the associated transcripts. You could try modifying the number of NA cases that are allowed per event.

I hope this helps. Eduardo

SignorLab commented 1 year ago

No, it creates an empty file.

EduEyras commented 1 year ago

Have you checked that there is no difference in the transcript IDs? E.g. hidden characters, additional spaces, etc... or missing headers where needed? Alternatively, it would be the missing values, but you seem to still get the results for the isoform events. I would need to run it on a portion of the input files to check if there is anything wrong. E.
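
The ID check suggested above could be sketched roughly as follows. This is only an illustration, not part of SUPPA: the function names are hypothetical, and it assumes both files are tab-separated, that the .ioe file lists transcript IDs as comma-separated values in its fourth and fifth columns (as in the example posted in this thread), and that the expression file has the transcript ID in its first column.

```python
def ids_from_ioe(lines, sep="\t"):
    """Collect transcript IDs from columns 4 and 5 of .ioe lines (header skipped)."""
    ids = set()
    for line in lines[1:]:
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) < 5:
            continue  # malformed or empty line, ignore it
        for column in fields[3:5]:
            ids.update(column.split(","))
    return ids


def ids_from_expression(lines, sep="\t"):
    """Collect the first field of every non-empty data line (header skipped)."""
    return {
        line.rstrip("\r\n").split(sep)[0]
        for line in lines[1:]
        if line.strip()
    }


def suspicious_ids(ids):
    """Return IDs with surrounding whitespace or non-printable characters."""
    return sorted(i for i in ids if i != i.strip() or not i.isprintable())


def missing_from_expression(ioe_ids, expression_ids):
    """Return transcript IDs used by events but absent from the expression file."""
    return sorted(ioe_ids - expression_ids)
```

Reading each file with `open(path).readlines()` and passing the result to these helpers should show quickly whether any event transcript is missing from the expression file or whether an ID carries stray whitespace.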

Max-Tomlinson commented 1 year ago

I was having the same issue but after a few hours of troubleshooting I've identified the problem...

The psiPerIsoform command will work with an expression matrix with or without an ID column, in your case 'Names', but for some reason psiPerEvent doesn't like this, so I moved my ID column to the row names and everything worked fine.

Hope this helps!
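
For anyone who wants to script that change, here is a minimal sketch of one way to do it. It is not part of SUPPA and the function names are hypothetical. It assumes the expression file is tab-separated and that the only problem is an extra label (such as 'Names') in the header, so that the header has as many fields as each data row instead of one fewer.

```python
def drop_id_column_name(lines, sep="\t"):
    """Remove the ID label from the header when the header is as wide as the data.

    Returns a new list of lines. Data rows keep their transcript ID in the
    first field; only the header loses its first field.
    """
    if len(lines) < 2:
        return list(lines)
    header = lines[0].rstrip("\r\n").split(sep)
    first_row = lines[1].rstrip("\r\n").split(sep)
    if len(header) == len(first_row):
        # The header carries a label for the ID column: drop it.
        header = header[1:]
    body = [line.rstrip("\r\n") + "\n" for line in lines[1:]]
    return [sep.join(header) + "\n"] + body


def rewrite_expression_file(in_path, out_path, sep="\t"):
    """Read in_path, fix the header if needed, and write the result to out_path."""
    with open(in_path) as handle:
        fixed = drop_id_column_name(handle.readlines(), sep=sep)
    with open(out_path, "w") as handle:
        handle.writelines(fixed)
```

Calling `rewrite_expression_file("expression_TPM.sf", "expression_TPM.fixed.sf")` (the output name is just an example) writes a copy whose header lists only the sample names. A file that already has the shorter header is left unchanged.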

TimvanderWiel1 commented 1 year ago

I have the same issue. I generated a TPM file with FeatureCounts in Galaxy, and the .ioe files with SUPPA generateEvents. The gene IDs in the .ioe and TPM files are the same, but I keep getting this error:

ERROR:lib.tools:13116, in line 13117. Skipping line...
ERROR:lib.tools:13117, in line 13118. Skipping line...
ERROR:lib.tools:13118, in line 13119. Skipping line...
ERROR:lib.tools:13119, in line 13120. Skipping line...
ERROR:lib.tools:13120, in line 13121. Skipping line...
INFO:lib.tools:File /mnt/studentfiles/2023/2023MBI_05/analysis/full_suppa_analysis/TPM_tabs/Col_ant_rep1_tpm_values_tabs.tpm closed.
ERROR:psiCalculator:No expression values have been buffered.
ERROR:psiCalculator:Unknown error: 1

I use this command:

suppa.py psiPerEvent -i "/mnt/studentfiles/2023/2023MBI_05/analysis/full_suppa_analysis/suppa_generate_events/col_ant_rep1_merged.ioe" -e "/mnt/studentfiles/2023/2023MBI_05/analysis/full_suppa_analysis/TPM_tabs/Col_ant_rep1_tpm_values_tabs.tpm" -o "/mnt/studentfiles/2023/2023MBI_05/analysis/full_suppa_analysis/suppa_test/"

What can I adjust in the files or the script to avoid this error?

EduEyras commented 1 year ago

What Python version are you using? E.
