comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

psiPerIsoform resulting in empty .psi file #161

Open TinyTasy opened 1 year ago

TinyTasy commented 1 year ago

Dear SUPPA team,

Thank you so much for your amazing tool. It is really helpful for differential isoform analysis.

I am trying to use SUPPA on PacBio single-cell Iso-Seq data. I aligned my data with pbmm2 and used pigeon (SQANTI-based) to obtain a .gff file. Using gffread, I converted my .gff file into a .gtf file. My GTF file looks like this:

Screenshot from 2023-04-12 11-54-45

So the PB gene and transcript IDs are in the 9th column of the GTF file.

My expression file is a tab-separated (.tsv) file covering 268 samples (pseudobulks) and looks like this:

Screenshot from 2023-04-12 11-56-32

If I now execute this command:

python3.4 /vol/projects/agrinko/TREM2_7_03_2022/SUPPA-2.3/suppa.py psiPerIsoform \
  -g /vol/projects/agrinko/TREM2_7_03_2022/data/Trem2_Longread/pacbio_TREM2.gtf \
  -e /vol/projects/agrinko/TREM2_7_03_2022/data/Trem2_Longread/pseudobulk_without_rownames.tsv \
  -o /vol/projects/agrinko/TREM2_7_03_2022/data/Trem2_Longread/psiPerIsoform_output

I get this warning for each transcript:

INFO:psiPerGene:Reading GTF data.
INFO:psiPerGene:Reading Expression data.
INFO:psiPerGene:Calculating inclusion and generating output.
INFO:lib.tools:Expression for transcript "PB.104659.2" not found. Ignoring it in calculation.
INFO:lib.tools:Expression for transcript "PB.104659.16" not found. Ignoring it in calculation.
INFO:lib.tools:Expression for transcript "PB.98879.2" not found. Ignoring it in calculation.
INFO:lib.tools:Expression for transcript "PB.98879.3" not found. Ignoring it in calculation.
. . .

And my .psi output file is empty; only the sample names remain.

I have already tried several things, such as testing tab-separated .txt files and .tsv files. I have also tried using the transcripts as row names.

Do you have any idea what could be the issue? Any help is greatly appreciated.

Sincerely, Tasy

EduEyras commented 1 year ago

Hi Tasy,

Thanks for your email.

Perhaps the transcript IDs in your GTF and in your expression file are different? They look different in your screen captures. SUPPA would not be able to match them.

The expression file should have the transcript IDs without the surrounding double quotes (" ").

Also, the GTF format uses double quotes (" ") around transcript and gene IDs: see e.g. https://asia.ensembl.org/info/website/upload/gff.html

Please let me know if that fixes it.

Best, Eduardo
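For anyone hitting the same "Expression for transcript ... not found" message, the mismatch described above can be checked before running SUPPA. The following is a minimal Python sketch and not part of SUPPA: the function names are hypothetical, and it assumes the GTF attributes column contains `transcript_id "..."` and that the expression file has one header line followed by rows whose first tab-separated field is the transcript ID. The expression file is deliberately read as raw text so that any literal quote characters stay visible.

```python
import re

# Hypothetical helpers for illustration only; they are not SUPPA functions.

def gtf_transcript_ids(gtf_lines):
    """Collect transcript_id values from the 9th (attributes) column of GTF lines."""
    pattern = re.compile(r'transcript_id "([^"]+)"')
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = pattern.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def expression_row_ids(tsv_lines):
    """Collect the raw first field of each data row (header skipped, quotes kept)."""
    ids = set()
    rows = iter(tsv_lines)
    next(rows, None)  # assumption: the first line is the sample-name header
    for line in rows:
        line = line.rstrip("\n")
        if line:
            ids.add(line.split("\t")[0])
    return ids

def report_mismatch(gtf_ids, expr_ids):
    """Return (IDs missing from the expression file, those present only in quoted form)."""
    missing = gtf_ids - expr_ids
    quoted = {t for t in missing if '"' + t + '"' in expr_ids}
    return missing, quoted

# Small in-memory example with made-up records.
gtf = ['chr1\tPacBio\ttranscript\t1\t100\t.\t+\t.\ttranscript_id "PB.1.1"; gene_id "PB.1";\n']
expr = ['s1\ts2\n', '"PB.1.1"\t1.0\t2.0\n']
missing, quoted = report_mismatch(gtf_transcript_ids(gtf), expression_row_ids(expr))
print(sorted(missing), sorted(quoted))
```

If the second set is non-empty, the IDs exist in the expression file but carry literal double quotes, which matches the situation in this thread. To run it on real files, pass open file handles instead of the lists.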


TinyTasy commented 1 year ago

Hello Eduardo!

Thank you for your quick reply. It's almost embarrassing, but yes, the error lay in the expression file: I only had to remove the double quotes (" ").

I am grateful for your help, thank you!

Sincerely, Tasy
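The fix described above (removing the double quotes from the expression file) could be scripted roughly as follows. This is a sketch under assumptions rather than the exact procedure used in the thread: `strip_field_quotes` is a hypothetical helper, it assumes a plain tab-separated file in which no field legitimately contains a double quote, and the file names in the commented usage are placeholders.

```python
def strip_field_quotes(lines):
    """Yield each line with surrounding double quotes removed from every tab-separated field."""
    for line in lines:
        newline = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        # str.strip('"') removes leading and trailing quote characters only.
        yield "\t".join(field.strip('"') for field in fields) + newline

# In-memory example with made-up values.
quoted_table = ['"s1"\t"s2"\n', '"PB.1.1"\t1.0\t2.0\n']
cleaned = list(strip_field_quotes(quoted_table))
print(cleaned)

# Usage on files (placeholder names, not the paths from this thread):
# with open("expression_quoted.tsv") as src, open("expression_clean.tsv", "w") as dst:
#     dst.writelines(strip_field_quotes(src))
```

Writing to a new file rather than overwriting the input keeps the original table available for comparison.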