comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Calculate differential splicing events #165

Closed SSaleem94 closed 1 year ago

SSaleem94 commented 1 year ago

Hi,

I am having trouble calculating differential splicing. I have tried several times but keep getting the same error messages, even though all the required files are uploaded.

The error output is below:

    Traceback (most recent call last):
      File "/mnt/storage/nobackup/b7070855/april2023/SUPPA/suppa.py", line 10, in <module>
        import psiPerGene as psiPerIsoform
      File "/mnt/storage/nobackup/b7070855/april2023/SUPPA/psiPerGene.py", line 13, in <module>
        from lib.tools import
    ModuleNotFoundError: No module named 'lib.tools'
    [1] "Parsing samples..."
    [1] "Loading ./results/iso_tpm_formatted.txt..."
    Error: first_condition %in% colnames(input_file) are not all TRUE
    Execution halted
    [1] "Parsing samples..."
    [1] "Loading ./results/events.psi..."
    Error in file(file, "rt") : cannot open the connection
    Calls: read.table -> file
    In addition: Warning message:
    In file(file, "rt") :
      cannot open file './results/events.psi': No such file or directory
    Execution halted
    Traceback (most recent call last):
      File "/mnt/storage/nobackup/b7070855/april2023/SUPPA/suppa.py", line 10, in <module>
        import psiPerGene as psiPerIsoform
      File "/mnt/storage/nobackup/b7070855/april2023/SUPPA/psiPerGene.py", line 13, in <module>
        from lib.tools import
    ModuleNotFoundError: No module named 'lib.tools'
    CalculateDifferentialSplicingEvents.sh: line 24: -e: command not found

Could you please help me with this? Thank you in advance.

EduEyras commented 1 year ago

Hi,

thanks for your email.

Could you please send me your command line?

The error may be related to suppa.py not finding the libraries. Are they installed and visible to the command?
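
One quick way to check whether the module named in the traceback is present is to look for its file under the SUPPA directory. This is a minimal sketch, assuming `lib.tools` maps to `lib/tools.py` (or a `lib/tools/` package) next to `suppa.py`. The helper name `module_file_exists` is hypothetical and not part of SUPPA.

```python
from pathlib import Path


def module_file_exists(base_dir, dotted_name):
    """Return True if a dotted module name (e.g. "lib.tools") has a file under base_dir.

    Hypothetical helper for illustration; it only inspects the file system
    and does not import anything.
    """
    target = Path(base_dir).joinpath(*dotted_name.split("."))
    # A module is either <name>.py or a package directory with __init__.py.
    as_file = target.parent / (target.name + ".py")
    as_package = target / "__init__.py"
    return as_file.is_file() or as_package.is_file()
```

For example, `module_file_exists("/path/to/SUPPA", "lib.tools")` returning `False` would be consistent with the `ModuleNotFoundError` shown in the traceback.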

Thanks

E.

SSaleem94 commented 1 year ago

Hi, thank you for your response. This is the script used for this step:

    #!/bin/bash

    # Create folder for results
    outd="./results"
    if [ ! -e $outd ] ; then
        mkdir -p $outd
    fi

    # sample lists for diff splicing
    ct_samples="1,5,9"
    R1881_samples="2,6,10"

    # Create TPM file
    python multipleFieldSelection.py -i ./*/quant.sf -k 1 -f 4 -o $outd/iso_tpm.txt

    # Edited version
    Rscript ./format_Ensembl_ids.R $outd/iso_tpm.txt

    # have already generated ioe files
    ioe_file="./genomedata/ensembl_hg38.109.events.ioe"

    # calculate psi for all samples
    python suppa.py psiPerEvent -i $ioe_file -e $outd/iso_tpm_formatted.txt -o $outd/events

    # split tpm and psi tables for samples
    Rscript ./split_file.R $outd/iso_tpm_formatted.txt $ct_samples $R1881_samples $outd/ct_iso.tpm $outd/R1881_iso.tpm -i
    Rscript ./split_file.R $outd/events.psi $ct_samples $R1881_samples $outd/ct_events.psi $outd/R1881_events.psi -e

    # diff splicing analysis
    python suppa.py diffSplice -m empirical -gc -i $ioe_file \
        -p $outd/R1881_events.psi $outd/ct_events.psi \
        -e $outd/R1881_iso.tpm $outd/ct_iso.tpm -o $outd/diffSplice-events
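
The R error `first_condition %in% colnames(input_file) are not all TRUE` suggests that some of the requested sample names are not column names in the table being split. Below is a rough sketch of a similar check in Python, assuming the table starts with a tab-separated header line of sample names. The function names are hypothetical and not part of SUPPA or `split_file.R`.

```python
def missing_samples(header_line, wanted):
    """Return the names in `wanted` that do not appear in a tab-separated header line."""
    columns = {name.strip() for name in header_line.rstrip("\n").split("\t")}
    return [name for name in wanted if name not in columns]


def check_table(path, sample_list):
    """Check a comma-separated sample list such as "1,5,9" against the header of `path`.

    Assumption: the first line of the file is the header; only that line is read.
    """
    with open(path) as handle:
        header = handle.readline()
    return missing_samples(header, sample_list.split(","))
```

Calling `check_table("./results/iso_tpm_formatted.txt", "1,5,9")` would return the sample names that are absent from the header, and an empty list if all of them are present.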

All the required files are in the same directory where I ran the scripts. Here is a list of the files uploaded initially and those produced by the previous steps:

eventGenerator.py
gencode_v43_index
Homo_sapiens.GRCh38.109.gtf
psiPerGene.py
significanceCalculator.py
fileMerger.py
gencode.v43.transcripts.fa
multipleFieldSelection.py
__pycache__
split_file.R
genomedata
package-list
suppa_analyses.R
eventClusterer.py
format_Ensembl_ids.R
genomedata_ioi
psiCalculator.py
suppa.py
1 2 3 4 5 6 7 8 9 10 11 12
AnnotationForSuppaEvent.sh
AnnotationSuppaIsoform.sh
CalculateDifferentialSplicingEvents.sh
quantifypairedend.sh

The numbers refer to the sample folders.

Many thanks

EduEyras commented 1 year ago

Thanks

Do you have a copy of the lib/ directory as well?

E.

SSaleem94 commented 1 year ago

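[image: https://user-images.githubusercontent.com/132380335/249798376-d23999e2-6162-47aa-a638-13d138914fc8.png]

Please open the attachment for the lib/ directory.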

Many thanks

EduEyras commented 1 year ago

Hi,

There should be a lib directory as part of the SUPPA code, containing some function definitions. One of the error lines you sent seems to indicate that it was not found, so perhaps that directory was not copied?

Did you clone the repository, download the code, or do something else?

Thanks

E.

SSaleem94 commented 1 year ago

Thanks for your response.

I did not fully understand what you mean by "make a clone of the repository or downloaded the code". Could you please explain further?

Thanks

zjanna commented 1 year ago

Dear developers, first of all, thank you for the SUPPA tool, which will be a great help in our research.

I would be very grateful if you could answer my question regarding the appropriate setting of conditions (Cond1 and Cond2) when analysing RNA-seq data.

I understand that dPSI = PSI_Cond2 - PSI_Cond1, but I am not sure what to set as Cond1 and Cond2 when dealing with the experiment where I have Case (cells overexpressing the protein under investigation) and Control (WT, wild type cells without overexpression).

Please, I would be really grateful if you could clear up our doubts.

EduEyras commented 1 year ago

Hi,

It does not matter which order you choose for the analysis; the calculation is entirely symmetric. The order is only relevant for the interpretation.

For instance, let's assume there is an excess of SE events with deltaPSI > 0. This means their PSI value is higher in cond2. If cond2 is the overexpression condition, the overexpression is associated with higher inclusion. If cond2 is the WT, and hence cond1 is the overexpression, the overexpression is associated with reduced inclusion of those exons.

So you can choose whichever order makes it easier for you to interpret the results.

However, people often use the "modified condition" as cond2, in this case the overexpression, so that changes are described relative to the "non-modified condition", or WT.
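
The symmetry can be illustrated with a small sketch. It assumes dPSI is the mean PSI in cond2 minus the mean PSI in cond1, following the definition quoted in the question. The PSI values are made up for illustration, and `delta_psi` is a hypothetical helper, not a SUPPA function.

```python
def delta_psi(psi_cond1, psi_cond2):
    """Mean PSI of condition 2 minus mean PSI of condition 1 (illustrative only)."""
    mean1 = sum(psi_cond1) / len(psi_cond1)
    mean2 = sum(psi_cond2) / len(psi_cond2)
    return mean2 - mean1


# Made-up PSI values for one exon across three replicates per condition.
wt_psi = [0.20, 0.30, 0.25]
overexpression_psi = [0.60, 0.70, 0.65]

# cond1 = WT, cond2 = overexpression: positive, i.e. higher inclusion with overexpression.
forward = delta_psi(wt_psi, overexpression_psi)

# cond1 = overexpression, cond2 = WT: same magnitude, opposite sign.
reverse = delta_psi(overexpression_psi, wt_psi)
```

Swapping the two conditions only flips the sign of the result, so the biological conclusion is the same either way.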

I hope it helps.

Best

Eduardo

AtefehB2 commented 1 year ago
[image: Screenshot 2023-08-22 at 2 26 49 PM] https://user-images.githubusercontent.com/142933188/262448176-4d809a3b-9602-46ef-b6be-97b5d422414c.png

I have run diffSplice and got the results. I am curious about the numbers in column one (ID). What are they? I know the first two are genomic coordinates that likely represent the region where the alternative splicing event takes place, but what about the highlighted one?

EduEyras commented 1 year ago

Please have a look at Fig. 3 at https://github.com/comprna/SUPPA, where an explanation of these coordinates is given.

Each event type is defined by a different set of coordinates, i.e. start and end positions of exons involved in the event, so all those coordinates are described in that figure.
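
As a rough illustration, an event ID can be split into its parts as sketched below. This assumes the layout `<gene_id>;<event_type>:<seqname>:<coordinates>:<strand>`, which is my reading of the SUPPA README; the meaning of each coordinate field depends on the event type, as described in the figure. The example ID is hypothetical, and `parse_event_id` is not part of SUPPA.

```python
from typing import List, NamedTuple


class EventId(NamedTuple):
    gene_id: str
    event_type: str
    seqname: str
    coordinates: List[str]  # meaning depends on the event type
    strand: str


def parse_event_id(event_id):
    """Split an event ID into gene, event type, seqname, coordinate fields and strand.

    Assumed layout: <gene_id>;<event_type>:<seqname>:<coordinates>:<strand>
    """
    if ";" not in event_id:
        raise ValueError("expected '<gene_id>;<event>' in %r" % event_id)
    gene_id, event = event_id.split(";", 1)
    fields = event.split(":")
    if len(fields) < 4:
        raise ValueError("too few ':'-separated fields in %r" % event_id)
    # First two fields are type and seqname, the last is the strand,
    # everything in between is a coordinate field.
    return EventId(gene_id, fields[0], fields[1], fields[2:-1], fields[-1])


# Hypothetical skipping-exon (SE) event ID with made-up coordinates.
example = parse_event_id("GENE1;SE:chr1:100-200:300-400:+")
```

Here `example.coordinates` holds the two coordinate fields of the hypothetical SE event, which can then be compared against the figure for that event type.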

Please let me know if that clarifies it

Best

Eduardo
