comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Issues reading transcript expression files #190

Open ryanpe13002 opened 2 weeks ago

ryanpe13002 commented 2 weeks ago

Hello,

I am trying to run psiPerEvent, but the program isn't able to read the expression file. The full expression file and error files are attached in Archive.zip (https://github.com/comprna/SUPPA/files/15142480/Archive.zip). I ran the following command (note that I used cat to concatenate all the *.ioe files before running this step):

python3 ~/SUPPA-2.3/suppa.py psiPerEvent\
 -e JAX_LRseq.TumorVsNormal.TPM.txt\
 -i LUAD_v2.strict.ioe\
 -o LUAD_v2.psi

And the error file looks like this, with the same pattern continuing for each input line in the expression file:

INFO:lib.tools:File JAX_LRseq.TumorVsNormal.TPM.txt opened in reading mode.
INFO:psiCalculator:Buffering transcript expression levels.
ERROR:lib.tools:1, in line 2. Skipping line...
ERROR:lib.tools:2, in line 3. Skipping line...
ERROR:lib.tools:3, in line 4. Skipping line...
ERROR:lib.tools:4, in line 5. Skipping line...
ERROR:lib.tools:5, in line 6. Skipping line...
ERROR:lib.tools:6, in line 7. Skipping line...
ERROR:lib.tools:7, in line 8. Skipping line...
ERROR:lib.tools:8, in line 9. Skipping line...
ERROR:lib.tools:9, in line 10. Skipping line...

This is quite vexing, as I have ensured the expression file is tab-delimited and formatted appropriately, with no hidden characters as far as I can tell. What am I doing wrong?

Thanks so much, Ryan Englander

GS4, Jackson Laboratory for Genomic Medicine Anczukow Lab
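
A note on the cat step mentioned above: if each per-event-type .ioe file starts with its own header line, plain cat repeats that header once per input file in the merged output. The sketch below is one way to merge while keeping only the first header. It assumes every input file begins with exactly one header line; the function name merge_ioe and the file names in the usage comment are hypothetical, not part of SUPPA.

```python
from typing import Iterable


def merge_ioe(paths: Iterable[str], out_path: str) -> int:
    """Concatenate .ioe files, keeping only the first file's header line.

    Assumption: every input file starts with exactly one header line.
    Returns the number of event lines written (header not counted).
    """
    n_events = 0
    header_written = False
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                header = handle.readline()
                if header and not header_written:
                    # Write the header once, making sure it ends in a newline.
                    out.write(header if header.endswith("\n") else header + "\n")
                    header_written = True
                for line in handle:
                    if not line.strip():
                        continue  # skip blank lines
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_events += 1
    return n_events


# Usage (hypothetical file names):
# merge_ioe(["events_SE_strict.ioe", "events_RI_strict.ioe"], "merged.strict.ioe")
```

Whether repeated header lines actually affect psiPerEvent is not established in this thread; the sketch only removes them as a possible source of confusion.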

EduEyras commented 2 weeks ago

Hi Ryan,

thanks for the message and I'm sorry that you're getting this error.

Some lines have IDs like "PB.1.16" instead of ENST... Is that correct?

The .zip did not include the .ioe file so I could not check, but I guess you checked that the transcript IDs in the ioe are the same (including versions)?

Did you get the code from conda or from GitHub? There might be some bugfixes on GitHub that we have not yet pushed to the conda version. The GitHub version should work.

Other than that, I cannot think of anything. Please send me the .ioe and I can run it here to check.

cheers

Eduardo
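
The ID check suggested above (that the transcript IDs in the .ioe match those in the expression file, including versions) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the expression file has a header row of sample names followed by rows whose first field is the transcript ID, and the .ioe file has a header row plus five tab-separated columns with a comma-separated transcript list in the fifth. All function names are hypothetical helpers, not SUPPA functions.

```python
def expression_ids(path: str) -> set:
    """Collect transcript IDs from the first column of an expression file."""
    ids = set()
    with open(path) as handle:
        handle.readline()  # assumption: first line is the sample-name header
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if fields and fields[0]:
                ids.add(fields[0])
    return ids


def ioe_ids(path: str) -> set:
    """Collect transcript IDs listed in the fifth column of an .ioe file."""
    ids = set()
    with open(path) as handle:
        handle.readline()  # assumption: first line is the .ioe header
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed or blank lines
            ids.update(t for t in fields[4].split(",") if t)
    return ids


def missing_from_expression(ioe_path: str, expr_path: str) -> list:
    """Return .ioe transcript IDs that never appear in the expression file."""
    return sorted(ioe_ids(ioe_path) - expression_ids(expr_path))


# Usage with the file names from this thread:
# missing_from_expression("LUAD_v2.strict.ioe", "JAX_LRseq.TumorVsNormal.TPM.txt")
```

An empty result means every transcript referenced by an event has an expression row; any IDs returned are exact-string mismatches, which would include version-suffix differences.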

ryanpe13002 commented 2 weeks ago

Thanks so much Eduardo! The GTF I am using here is a hybrid long read-derived GTF (hence PB for PacBio) which is concatenated with GENCODE v44.

They are all arranged in my GTF in the same format. I used the same GTF to get the expression matrix as I did to get the ioe file, so there shouldn't be any possibility of version mismatch or anything of that nature.

I got the code from GitHub using the following link: https://github.com/comprna/SUPPA/releases/tag/v2.3

The compressed IOE file is here: LUAD_v2.strict.ioe.zip

Thanks so much for your help, I really appreciate it!

Kindest regards, Ryan

ryanpe13002 commented 2 weeks ago

Hey, just a heads up, I think I figured it out: I had a trailing tab in the header row, and removing it fixed the problem. Thanks so much!!!

Kindest regards, Ryan
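
For anyone who hits the same symptom, the trailing-tab problem described above is easy to miss by eye. The sketch below reports which lines of a tab-delimited file end in a tab and writes a copy with the header row cleaned. The function names are hypothetical helpers, not part of SUPPA, and the sketch assumes the header is the first line of the file.

```python
def find_trailing_tabs(path: str) -> list:
    """Return the 1-based numbers of lines that end in a tab character."""
    hits = []
    # newline="" keeps the original line endings so "\r\n" files work too.
    with open(path, newline="") as handle:
        for number, line in enumerate(handle, start=1):
            if line.rstrip("\r\n").endswith("\t"):
                hits.append(number)
    return hits


def strip_header_trailing_tabs(src: str, dst: str) -> None:
    """Copy src to dst, removing trailing tabs from the first line only."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        header = fin.readline()
        body = header.rstrip("\r\n")
        ending = header[len(body):]  # preserve the original line ending
        fout.write(body.rstrip("\t") + ending)
        for line in fin:
            fout.write(line)  # data rows are copied unchanged


# Usage with the expression file name from this thread (output name is hypothetical):
# print(find_trailing_tabs("JAX_LRseq.TumorVsNormal.TPM.txt"))
# strip_header_trailing_tabs("JAX_LRseq.TumorVsNormal.TPM.txt", "TPM.fixed.txt")
```

Only the header is rewritten because a trailing tab on a data row could be a genuinely empty last field, which should be inspected rather than silently dropped.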