Closed marypiper closed 1 year ago
Hi Mary,
Thanks for your email. You probably used the "classical" test method? It's certainly unusual to see p-value < 0.05 for zero deltaPSI, but it could be that you have too much power in your comparison? It must be something that the Wilcoxon test is able to produce for some reason.
If you used the empirical method, it must be a bug, because it is impossible to get an empirical p-value < 1 with no change in PSI.
Performing multiple test correction will likely remove these cases.
In any case, we always advocate using |deltaPSI| and p-value, not p-value alone, to determine the relevant changes. I would not consider zero or small deltaPSI changes to be relevant, despite the small p-value.
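As a rough illustration of the |deltaPSI| plus p-value rule, the sketch below filters a diffSplice-style table in plain Python. The column layout (event ID, dPSI, p-value, tab-separated, with a header row) and the 0.1 and 0.05 cutoffs are assumptions made for the example, not SUPPA defaults, so check your own .dpsi header before reusing it.

```python
import csv
import io
import math


def relevant_events(dpsi_text, min_abs_dpsi=0.1, max_pval=0.05):
    """Return (event_id, dpsi, pval) tuples that pass BOTH cutoffs.

    Assumed layout (hypothetical): tab-separated, one header row, then
    rows of event ID, dPSI, p-value. Cutoffs are illustrative only.
    """
    reader = csv.reader(io.StringIO(dpsi_text), delimiter="\t")
    next(reader, None)  # skip the header row
    kept = []
    for row in reader:
        if len(row) < 3:
            continue  # malformed or empty row
        try:
            dpsi, pval = float(row[1]), float(row[2])
        except ValueError:
            continue  # non-numeric entry such as "NA"
        if math.isnan(dpsi) or math.isnan(pval):
            continue  # missing values cannot be judged
        if abs(dpsi) >= min_abs_dpsi and pval <= max_pval:
            kept.append((row[0], dpsi, pval))
    return kept


example = "event\tdPSI\tp-val\nev1\t0.0\t0.01\nev2\t0.35\t0.002\nev3\t-0.2\t0.2\n"
print(relevant_events(example))  # [('ev2', 0.35, 0.002)]
```

Here ev1 is dropped despite its small p-value because its dPSI is zero, and ev3 is dropped because its p-value is too large.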
I hope this helps
Eduardo
On Tue, 3 May 2022 at 01:58, marypiper @.***> wrote:
Hi, I would like to thank you for your continued support and effort in the development of suppa2. I recently performed an analysis between conditions (10 replicates per condition), and I had a question regarding the output. The significant output (p.val < 0.05) of some events from the diffSplice command exhibited a dpsi of 0. I realize that I could use a lower threshold for dpsi with the option --lower-bound, but it concerns me a bit that without the option, I am detecting significant events that have dpsi of 0 (see attached image). Could you comment on this behavior - is there any situation that you would expect a 0 dpsi to be significant? Sorry if I have missed this in the documentation.
Thank you for your time and help, Mary
diffSplice command: suppa.py diffSplice -m empirical -gc -i M19_all_events.ioe -p control_events.psi treatment_events.psi -e control_iso.tpm treatment_iso.tpm -o diffSplice_local_results
[image: Screen Shot 2022-05-02 at 11 20 48 AM] https://user-images.githubusercontent.com/7414912/166264204-0767f711-9fd2-4db9-a92e-6d064481f3c2.png
-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ
Hi Eduardo,
Thanks for your quick response! I used the empirical method with multiple test correction (using the -gc option). The diffSplice command I used is:
suppa.py diffSplice -m empirical -gc -i M19_all_events.ioe -p control_events.psi treatment_events.psi -e control_iso.tpm treatment_iso.tpm -o diffSplice_local_results
Best, Mary
Thanks Mary,
it is surprising that it can give a p-value < 1 with zero delta PSI.
The empirical method works by testing your observed delta PSI against the distribution of expected delta PSI values (built from the variability across your replicates) at a similar overall expression level (obtained from the TPM file).
So it must be either that it has an empty control set and does not complain about it, or that the -gc flag causes some artifact. Did you try running without -gc to see whether they still appear?
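To make the idea concrete, here is a deliberately simplified sketch of an empirical p-value. It is not SUPPA's actual implementation: the function name, the way the null set is passed in, and the choice to return None for an empty null set are all assumptions for illustration. It only shows why, under this kind of definition, an observed dPSI of exactly zero would be expected to give a p-value of 1.

```python
def empirical_pvalue(observed_dpsi, null_dpsis):
    """Fraction of null |dPSI| values at least as large as the observed |dPSI|.

    null_dpsis stands in (hypothetically) for between-replicate dPSI values
    of events with similar expression. Returns None if that set is empty,
    since no comparison is possible.
    """
    null_abs = [abs(x) for x in null_dpsis]
    if not null_abs:
        return None
    obs = abs(observed_dpsi)
    hits = sum(1 for x in null_abs if x >= obs)
    return hits / len(null_abs)


null_set = [0.01, -0.03, 0.0, 0.2]
print(empirical_pvalue(0.0, null_set))  # 1.0: every |null| value is >= 0
print(empirical_pvalue(0.1, null_set))  # 0.25: only 0.2 is >= 0.1
print(empirical_pvalue(0.0, []))        # None: empty null set
```

With a non-empty null set, a zero dPSI can never score below 1 here, which is why an empty or otherwise degenerate null set is a natural suspect.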
In any case, it makes sense to always require a minimum deltaPSI to consider the events relevant.
I hope this helps
E.
Hi Eduardo,
Thank you for your response! I spent time looking into the datasets again. Using the -gc flag still exhibited the same behavior. In addition, I found similar results whether I looked at local events or isoform usage. I have 10 replicates per condition, and each of the iso_tpm and psi files has the expected number of columns and data.
The only other issue I had was that the GTF file did not exactly match the GTF used for mapping, because a core facility performed the mapping and their downloadable GTF was not compatible with SUPPA2. I downloaded the equivalent GTF patch (GRCm38.p6) from Ensembl, and it worked, but some transcripts were not present in my expression file, which generated the following errors during the calculation of PSI:
ERROR:psiCalculator:PSI not calculated for event ENSMUSG00000024942;AF:19:6014468-6015106:6015206:6014468-6015736:6015897:-.
ERROR:psiCalculator:transcript ENSMUST00000236798 not found in the "expression file".
However, it still performed the calculation, and output PSI values were determined for ~78,500 events and 139,585 isoforms. The expression file had 142,434 isoforms present, so not a huge loss of isoforms with conversion between the GTF files.
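One way to see how many transcripts are affected before running psiPerEvent is to compare the transcript IDs referenced by the events with those in the expression file. The sketch below assumes an .ioe layout of five tab-separated columns with comma-separated transcript lists in the fourth and fifth columns, and an expression file whose data rows start with the transcript ID; both layouts are assumptions to verify against your own files, and the IDs other than ENSMUST00000236798 are made up.

```python
def transcripts_missing_from_expression(ioe_lines, expression_lines):
    """Return sorted transcript IDs used by events but absent from the expression file.

    Assumed layouts (verify against your files): both inputs have a header
    row; ioe rows carry comma-separated transcript lists in columns 4 and 5;
    expression rows start with the transcript ID.
    """
    expressed = set()
    for i, line in enumerate(expression_lines):
        if i == 0 or not line.strip():
            continue  # skip header and blank lines
        expressed.add(line.rstrip("\n").split("\t")[0])

    needed = set()
    for i, line in enumerate(ioe_lines):
        if i == 0 or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete event row
        for column in (fields[3], fields[4]):
            needed.update(t for t in column.split(",") if t)
    return sorted(needed - expressed)


ioe = [
    "seqname\tgene_id\tevent_id\talternative_transcripts\ttotal_transcripts\n",
    "19\tgeneA\teventA\ttx1\ttx1,ENSMUST00000236798\n",
]
expression = ["sample1\tsample2\n", "tx1\t1.5\t2.0\n"]
print(transcripts_missing_from_expression(ioe, expression))
# ['ENSMUST00000236798']
```

In practice the inputs would be open file handles rather than lists; an empty result would mean every event transcript has an expression value.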
Finally, I also tried running classical, but the code errored out with the following error:
ERROR:main:Unknown error: (<class 'TypeError'>, TypeError("calculate_delta_psi() missing 1 required positional argument: 'nan_th'",). I saw other issues where people have had this error too.
I did run with the lower threshold designated and that completed, but I would prefer to understand the problem behind the significant dPSI of 0 before moving forward with the workflow. The transcripts that I looked at with dPSI zero tended to have low expression levels (0.001-1 tpm, generally).
I hope providing this information might help - please let me know if there is anything else I could provide that would be useful to you. TPM and PSI for the dPSI of zero transcripts/events?
Best wishes, Mary
Thanks a lot for your feedback
Looking into this is still pending on our side.
The calculation is sensitive to missing values, so using -nan may help with some of the problems.
These cases, like the dPSI = 0 events, are related to low expression, so they are mostly unreliable.
We'll look into how to control for these cases and add better error handling.
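Until this is handled inside the tool, a user-side check can flag low-expression transcripts so that events built on them are treated with caution. The sketch below is only an illustration: the function name, the input shape (transcript ID mapped to per-replicate TPM values), and the 1 TPM cutoff are assumptions, the cutoff being chosen simply because the affected transcripts in this thread were reported at roughly 0.001-1 TPM.

```python
from statistics import mean


def low_expression_ids(tpm_by_id, min_mean_tpm=1.0):
    """Return sorted IDs whose mean TPM across replicates is below the cutoff.

    tpm_by_id maps a transcript ID to its list of per-replicate TPM values.
    IDs with no values at all are also reported as low expression.
    The 1.0 TPM default is an illustrative assumption, not a SUPPA setting.
    """
    return sorted(
        tid
        for tid, tpms in tpm_by_id.items()
        if not tpms or mean(tpms) < min_mean_tpm
    )


tpms = {"txA": [0.001, 0.5], "txB": [10.0, 12.0], "txC": [1.0, 1.0]}
print(low_expression_ids(tpms))  # ['txA']
```

Significant events whose transcripts all appear in this list are the ones most likely to show the dPSI = 0 behaviour described in this thread.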
Thanks a lot
E.
The screenshot is an example, but quite a few of the events had a dPSI of zero (16 of the 68 significant results).