comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

diffsplice index error #196

Open Amhaslam opened 2 months ago

Amhaslam commented 2 months ago

Hi, I am getting the following error while running diffSplice. Please help me resolve it.

Calculating differential analysis between conditions: Suppa_salmon_RB_psi_values and Suppa_salmon_Control_psi_values
ERROR:__main__:Unknown error: (<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x790a335260c0>)

EduEyras commented 1 month ago

Could you please send the command line?

Also, please use a copy of the code from GitHub (either a clone or a download). It contains some bug fixes that might not be available in the conda version.

E.

MarekGierlinski commented 1 month ago

Hi,

I have encountered the same error, using the most recent version from GitHub. Here is my command line:

> python3 SUPPA-2.4/suppa.py diffSplice -m empirical -i suppa/events/event_SE_strict.ioe -p suppa/psi/DMSO_4h_SE.psi suppa/psi/ActD_4h_SE.psi -e suppa/tpm/DMSO_4h.txt suppa/tpm/ActD_4h.txt --lower-bound 0.05 -gc -o suppa/diff/ActD_4h_SE.txt

Calculating differential analysis between conditions: DMSO_4h_SE and ActD_4h_SE
ERROR:__main__:Unknown error: (<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x2b02ecc8f040>)

The input files are like these:

> head suppa/events/event_SE_strict.ioe
seqname gene_id event_id    alternative_transcripts total_transcripts
1   ENSG00000142611 ENSG00000142611;SE:1:3431108-3431966:3432140-3433677:+  ENST00000270722,ENST00000512462,ENST00000509860 ENST00000378389,ENST00000511072,ENST00000509860,ENST00000512462,ENST00000514189,ENST00000270722
1   ENSG00000142655 ENSG00000142655;SE:1:10495321-10517028:10517131-10536213:+  ENST00000472851 ENST00000356607,ENST00000472851,ENST00000491661
1   ENSG00000232596 ENSG00000232596;SE:1:4571629-4572470:4572592-4593052:+  ENST00000667354,ENST00000659059 ENST00000666685,ENST00000659059,ENST00000667354,ENST00000634256
1   ENSG00000232596 ENSG00000232596;SE:1:4571629-4575322:4575450-4593052:+  ENST00000659083 ENST00000666685,ENST00000659083,ENST00000634256
1   ENSG00000232596 ENSG00000232596;SE:1:4572592-4582277:4582574-4583083:+  ENST00000420522 ENST00000420522,ENST00000661628
1   ENSG00000235054 ENSG00000235054;SE:1:4416548-4423054:4423187-4423348:+  ENST00000635312 ENST00000635002,ENST00000668086,ENST00000635312
1   ENSG00000235054 ENSG00000235054;SE:1:4423187-4423348:4423448-4423994:+  ENST00000635312,ENST00000669931,ENST00000423197,ENST00000659315,ENST00000635642 ENST00000659315,ENST00000423197,ENST00000669931,ENST00000635642,ENST00000635312,ENST00000667352
1   ENSG00000235054 ENSG00000235054;SE:1:4416548-4422760:4423187-4423348:+  ENST00000669931,ENST00000423197 ENST00000423197,ENST00000669931,ENST00000635002,ENST00000668086
1   ENSG00000235054 ENSG00000235054;SE:1:4416548-4422876:4423187-4423348:+  ENST00000659315,ENST00000635642 ENST00000659315,ENST00000635002,ENST00000668086,ENST00000635642

head suppa/psi/DMSO_4h_SE.psi
DMSO_4h_1   DMSO_4h_2   DMSO_4h_3   DMSO_4h_4
ENSG00000000003;SE:X:100630866-100632485:100632568-100633405:-  0.9984909907641735  0.9960692215706672  1.0 0.9993863994390673
ENSG00000000419;SE:20:50936262-50940865:50940933-50942031:- 0.9629173478563579  0.9427937649710495  0.956188066744992   0.9425552007213499
ENSG00000000419;SE:20:50936262-50940865:50940955-50942031:- 0.47221963951960194 0.2933847562014635  0.3657149073452484  0.3038975229653856
ENSG00000000419;SE:20:50940933-50941105:50941209-50942031:- 0.028363373150652894    0.02799640449354926 0.030312514565798154    0.014823359763832214
ENSG00000000419;SE:20:50940933-50941129:50941209-50942031:- 0.07818130233291508 0.08780259000559165 0.054971267248545404    0.07336325358106109
ENSG00000000419;SE:20:50940955-50941105:50941209-50942031:- 0.0 0.035003223060908244    0.01448838433493186 0.10207471483060135
ENSG00000000419;SE:20:50941209-50942031:50942126-50945737:- 0.9832119546137829  1.0000000000000002  0.9618810813223434  0.9769950594008092
ENSG00000000419;SE:20:50942126-50945737:50945762-50945847:- 0.9876598940318572  0.9743969827341252  0.9833589331044317  0.9658016711881765
ENSG00000000457;SE:1:169854964-169855796:169855957-169859041:-  0.4856115420019294  0.5194001770392952  0.533045114862394   0.533167156249122

head suppa/psi/ActD_4h_SE.psi
ActD_4h_1   ActD_4h_2   ActD_4h_3   ActD_4h_4
ENSG00000000003;SE:X:100630866-100632485:100632568-100633405:-  0.9993465716877533  0.9946488099599791  0.9983193877500409  1.0
ENSG00000000419;SE:20:50936262-50940865:50940933-50942031:- 0.936674329975077   0.928984849513219   0.9239805417948307  0.9351968473956108
ENSG00000000419;SE:20:50936262-50940865:50940955-50942031:- 0.3400886395468895  0.22525994489482515 0.34906576990789634 0.2483124442329311
ENSG00000000419;SE:20:50940933-50941105:50941209-50942031:- 0.01568429976434919 0.0169688806521051  0.0232128041190583  0.01772789595641008
ENSG00000000419;SE:20:50940933-50941129:50941209-50942031:- 0.05337329595858719 0.07219034981681514 0.055939850533341266    0.050026735080484164
ENSG00000000419;SE:20:50940955-50941105:50941209-50942031:- 0.012418429736731382    0.01504706810234816 0.0 0.017565223622536257
ENSG00000000419;SE:20:50941209-50942031:50942126-50945737:- 0.9668314225176855  0.9623899468702947  0.9897780452396181  1.0
ENSG00000000419;SE:20:50942126-50945737:50945762-50945847:- 0.9639761546102018  0.962519374887582   0.9574930457839392  0.9560360966187568
ENSG00000000457;SE:1:169854964-169855796:169855957-169859041:-  0.5338348074956553  0.5334716768349987  0.48508218276071796 0.5062566816919817

head suppa/tpm/DMSO_4h.txt
DMSO_4h_1   DMSO_4h_2   DMSO_4h_3   DMSO_4h_4
ENST00000415118 0   0   0   0
ENST00000448914 0   0   0   0
ENST00000434970 0   0   0   0
ENST00000631435 0   0   0   0
ENST00000710614 0   0   0   0
ENST00000605284 0   0   0   0
ENST00000604642 0   0   0   0
ENST00000603077 0   0   0   0
ENST00000603693 0   0   0   0

head suppa/tpm/ActD_4h.txt
ActD_4h_1   ActD_4h_2   ActD_4h_3   ActD_4h_4
ENST00000415118 0   0   0   0
ENST00000448914 0   0   0   0
ENST00000434970 0   0   0   0
ENST00000631435 0   0   0   0
ENST00000710614 0   0   0   0
ENST00000605284 0   0   0   0
ENST00000604642 0   0   0   0
ENST00000603077 0   0   0   0
ENST00000603693 0   0   0   0

Any ideas?
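One thing that can be ruled out quickly is a structural problem in the inputs. The sketch below is not part of SUPPA and does not identify a confirmed cause of the IndexError; it only checks that every row has one value per sample and that the PSI and TPM files list the same samples. It assumes tab-separated files whose header line holds only the sample names. The function names are made up for illustration.

```python
def read_table(lines):
    """Split a SUPPA-style table into (header, body rows).

    Assumption: tab-separated; the header has sample names only,
    each body row is an ID followed by one value per sample.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    return rows[0], rows[1:]


def find_ragged_rows(lines):
    """Return the IDs of rows whose field count is not 1 + number of samples."""
    header, body = read_table(lines)
    expected = len(header) + 1
    return [row[0] for row in body if len(row) != expected]


def same_samples(psi_lines, tpm_lines):
    """True if both files list the same sample names in the same order."""
    return read_table(psi_lines)[0] == read_table(tpm_lines)[0]
```

An open file handle can be passed directly, for example `find_ragged_rows(open("suppa/psi/DMSO_4h_SE.psi"))`. An empty list for every input file, and `True` from `same_samples`, would suggest the problem lies elsewhere.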

MarekGierlinski commented 1 month ago

Just to add to my previous post: when I change -m empirical to -m classical, the code runs with no issues. However, the created files suggest that something is amiss:

l suppa/diff
total 12545
drwxr-s--- 2 mgierlinski barton    4096 Oct 16 12:38 .
drwxr-s--- 6 mgierlinski barton    4096 Oct 16 11:47 ..
-rw-r----- 1 mgierlinski barton 4738471 Oct 16 12:38 ActD_4h_SE.dpsi.temp.0
-rw-r----- 1 mgierlinski barton 8098918 Oct 16 12:38 ActD_4h_SE.psivec

The file ActD_4h_SE.dpsi.temp.0 looks like this:

head suppa/diff/ActD_4h_SE.dpsi.temp.0
Event_id    DMSO_4h_SE-ActD_4h_SE_dPSI  DMSO_4h_SE-ActD_4h_SE_p-val
ENSG00000000003;SE:X:100630866-100632485:100632568-100633405:-  -0.0004079606   0.7715034091
ENSG00000000419;SE:20:50936262-50940865:50940933-50942031:- -0.0199044529   0.1000000000
ENSG00000000419;SE:20:50936262-50940865:50940955-50942031:- -0.0681225069   0.4800000000
ENSG00000000419;SE:20:50940933-50941105:50941209-50942031:- -0.0069754429   0.4800000000
ENSG00000000419;SE:20:50940933-50941129:50941209-50942031:- -0.0156970454   0.2666666667
ENSG00000000419;SE:20:50940955-50941105:50941209-50942031:- -0.0266339002   0.6549237453
ENSG00000000419;SE:20:50941209-50942031:50942126-50945737:- -0.0007721702   1.0000000000
ENSG00000000419;SE:20:50942126-50945737:50945762-50945847:- -0.0177982023   0.1000000000
ENSG00000000457;SE:1:169854964-169855796:169855957-169859041:-  -0.0031446603   0.8857142857

which is probably fine, but I'm suspicious about the name.

EduEyras commented 3 weeks ago

Thanks. The temp.0 suffix makes me think that the process is not running to completion. Can you check whether there is a problem with one of the entries?

MarekGierlinski commented 3 weeks ago

I cannot see any obvious problems with either the input or the output files. The output file I showed the head of (ActD_4h_SE.txt.dpsi.temp.0) contains 57342 lines and there is no visible corruption. The input files also look clean.

The only unusual thing I noticed is that PSI files contain rows with NAs and the corresponding rows in "psivec" and "dpsi" files contain NaNs. Here is an example:

> grep "ENSG00000293597;SE:1:169018002-169018130:169018303-169020943:-" psi/ActD_4h_SE.psi
ENSG00000293597;SE:1:169018002-169018130:169018303-169020943:-  NA      NA      NA      NA

 > grep "ENSG00000293597;SE:1:169018002-169018130:169018303-169020943:-" diff/ActD_4h_SE.txt.dpsi.temp.0 
ENSG00000293597;SE:1:169018002-169018130:169018303-169020943:-  nan     1.0000000000

 > grep "ENSG00000293597;SE:1:169018002-169018130:169018303-169020943:-" diff/ActD_4h_SE.txt.psivec 
ENSG00000293597;SE:1:169018002-169018130:169018303-169020943:-  nan     nan     nan     nan     nan     nan     nan     nan

In the GTF file I found that the exon starting at 1:169018130 and ending at 1:169018303 belongs to two transcripts, ENST00000715538 and ENST00000715540. Neither of them is present in the Salmon output files. I'm guessing that the lack of Salmon input causes SUPPA to create NAs in the PSI file and, consequently, NaNs in the differential output. I don't know if this can cause any problems.
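For anyone who wants to repeat this check across all events rather than one at a time, here is a rough Python sketch. It is not SUPPA code. It assumes tab-separated files laid out as in the heads shown earlier in this thread (the .ioe file with event_id in the third column and total_transcripts in the fifth), and the function names are hypothetical.

```python
def expression_ids(tpm_lines):
    """Collect transcript IDs from a TPM table (header line is skipped)."""
    lines = iter(tpm_lines)
    next(lines)  # header: sample names only
    return {line.rstrip("\n").split("\t")[0] for line in lines if line.strip()}


def events_with_missing_transcripts(ioe_lines, known_ids):
    """Map event_id -> transcripts in total_transcripts absent from known_ids."""
    lines = iter(ioe_lines)
    next(lines)  # header: seqname, gene_id, event_id, ...
    missing = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        event_id, total = fields[2], fields[4]
        absent = [t for t in total.split(",") if t not in known_ids]
        if absent:
            missing[event_id] = absent
    return missing


def all_na_events(psi_lines):
    """Return event IDs whose PSI is NA (or nan) in every sample."""
    lines = iter(psi_lines)
    next(lines)  # header: sample names only
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        values = fields[1:]
        if values and all(v in ("NA", "nan") for v in values):
            result.append(fields[0])
    return result
```

Comparing the keys of `events_with_missing_transcripts` with the output of `all_na_events` should show whether the all-NA events are the ones whose transcripts are absent from the Salmon output.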

EduEyras commented 4 days ago

Hi, yes, if transcripts do not have a value, it is not possible to calculate a PSI, so an NA is produced.

E.
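To illustrate the point, here is a minimal sketch of a transcript-based PSI for one event in one sample: the summed TPM of the alternative transcripts divided by the summed TPM of the total transcripts. This is an illustration, not the SUPPA source. The `psi` function and its use of `None` for NA are assumptions, and the exact conditions under which SUPPA emits NA may differ.

```python
def psi(alternative, total, tpm):
    """PSI for one event in one sample, or None (standing in for NA).

    alternative, total: transcript ID lists from the .ioe line.
    tpm: dict mapping transcript ID -> TPM in this sample.
    Assumption: any transcript without a value, or a zero total
    expression, makes the PSI undefined.
    """
    if any(t not in tpm for t in total):
        return None  # a transcript has no expression value
    denominator = sum(tpm[t] for t in total)
    if denominator == 0:
        return None  # nothing expressed, ratio undefined
    return sum(tpm[t] for t in alternative) / denominator
```

With this definition, an event whose transcripts are all missing from the expression file gets NA in every sample, which matches the all-NA row reported above.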