Ppadjust is always 1

gremame commented 3 years ago

Hello everyone, I have been familiarizing myself with FRASER for a week now. Read the article and the vignette, great tool in general. Congrats! The design I have been trying to use for your tool is to have a number of "control" samples with no apparent events detected and a "study" sample to be contrasted. For this initial test, this sample has a confirmed aberrant splicing event that truncates the transcript. I don't know if this design fits in the FRASER use cases. Does it? The FraserDataSet looks like this:

   sampleID   group bamFile                                   pairedEnd
   <fct>      <dbl> <chr>                                     <lgl>
 1 control-1      1 control-1.Aligned.out.sorted.q11_q21.bam  TRUE
 2 control-2      2 control-2.Aligned.out.sorted.q11_q21.bam  TRUE
 3 control-3      3 control-3.Aligned.out.sorted.q11_q21.bam  TRUE
 4 control-4      4 control-4.Aligned.out.sorted.q11_q21.bam  TRUE
 5 control-5      5 control-5.Aligned.out.sorted.q11_q21.bam  TRUE
 6 control-6      6 control-6.Aligned.out.sorted.q11_q21.bam  TRUE
 7 control-7      7 control-7.Aligned.out.sorted.q11_q21.bam  TRUE
 8 control-8      8 control-8.Aligned.out.sorted.q11_q21.bam  TRUE
 9 control-9      9 control-9.Aligned.out.sorted.q11_q21.bam  TRUE
10 control-10    10 control-10.Aligned.out.sorted.q11_q21.bam TRUE
11 study         11 study.Aligned.out.sorted.q11_q21.bam      TRUE

----------------------- Settings -----------------------
Analysis name:               Data Analysis 
Analysis is strand specific: reverse 
Working directory:           './temp/study' 

-------------------- BAM parameters --------------------
class: ScanBamParam
bamFlag (NA unless specified):
bamSimpleCigar: FALSE
bamReverseComplement: FALSE
bamTag:  
bamTagFilter:
bamWhich: 0 ranges
bamWhat:
bamMapqFilter: 20

I would like to know how to tell FRASER how to group samples. In the vignette, sample grouping appears to be based on the group and condition columns of the input sample table. However, although I have tried both column names and several grouping designs (for example, all control samples as group 1 and the study sample as group 2), I have not seen any difference in the results.

I have been able to extract the aberrant event from the study sample by filtering on abs(zScore) > 2.5 and psiValue > 0.6. However, filtering by padjust returns an empty table, and that is my major concern: I have not been able to obtain a padjust value other than 1. This happens for all events, regardless of the sample. Is it related to the grouping? Do I need more samples in the analysis? Is there a parameter that could relax the statistical stringency? What would you suggest? Thanks in advance! David

c-mertes commented 2 years ago

Dear David, I'm sorry that I did not respond to your questions. Not sure if it is still helpful for you, but at least for future users, this could be helpful.

So it is fine to have control samples along with a single case, as long as the protocol and tissue are the same. The problem I see here is rather the sample size: with 11 samples it is very hard to get anything statistically significant. Hence, increasing the number of samples would be the best thing to do here. You could also try using GTEx or other similar samples as an additional source. We suggest having at least 30 samples to get significant hits.
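To illustrate why a padjust of (nearly) 1 is plausible here, below is a minimal sketch of generic Benjamini-Hochberg correction in plain Python. It is not FRASER's code: FRASER's p-value model and correction method are not described in this thread, and the 10,000 junctions, the evenly spread null p-values, and the two raw p-values are hypothetical numbers chosen for illustration. The point is only that a moderately small raw p-value does not survive correction across many sample-junction tests, and small cohorts rarely produce raw p-values that are extreme enough.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical setting: 11 samples x 10,000 junctions = 110,000 tests.
n_tests = 11 * 10_000
# Evenly spread "null" p-values stand in for junctions with no aberrant event.
null_pvalues = [(i + 0.5) / (n_tests - 1) for i in range(n_tests - 1)]

# Adjusted value of one candidate event at two different raw p-values.
weak_hit = benjamini_hochberg([1e-4] + null_pvalues)[0]
strong_hit = benjamini_hochberg([1e-9] + null_pvalues)[0]

print(f"raw p = 1e-4 -> adjusted {weak_hit:.3g}")
print(f"raw p = 1e-9 -> adjusted {strong_hit:.3g}")
```

In this made-up setting the raw p-value of 1e-4 stays far above a 0.05 cutoff after adjustment, while 1e-9 falls well below it.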

The group and condition columns currently do not matter for the analysis. Sorry for the confusion. We have to take them out of the vignette.

c-mertes commented 2 years ago

Please reopen it if you have more questions on this.

canankolakoglu commented 2 years ago

Hello Christian, thank you for your informative answer. I also have a question about grouping. If the group and condition columns do not matter for the analysis and for the calculation of the psi and theta metrics, then the sample table needs to contain more control data than case data. Initially I thought that FRASER calculates the psi and theta metrics for controls and cases separately and compares them. But if it does not use the grouping information, it will calculate the metrics using all data. And if we have more case data (which may share the same abnormal splice site) than control data, then FRASER will not be able to prioritize this abnormal splice site because of its high prevalence in the data. Is my inference correct, or can I use more case data than control data in the analysis? Thank you!

c-mertes commented 2 years ago

Dear @canankolakoglu, in your comment, you raise a very important point and it boils down to your final research questions and assumptions.

FRASER attempts to find aberrant events within the data regardless of control or case status. So if you assume that each case has a different event, FRASER is the right tool. From experience, if 5% of the samples have the same aberrant event, FRASER can still pick it up within a cohort of 50 or more samples, even if all of them are cases. Of course, each additional sample with the same event will reduce the statistical power, resulting in higher P values for that event, as you correctly state in your comment.
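The effect of a shared event on outlier detection can be sketched with a toy calculation. This is not FRASER's model: a plain z-score stands in for its statistics, and the cohort size of 50, the background psi of 0.1, and the aberrant psi of 0.9 are hypothetical values chosen for illustration.

```python
from statistics import mean, pstdev


def carrier_z_score(n_samples, n_carriers, background=0.1, aberrant=0.9):
    """Z-score of one carrier's psi value against the whole cohort.

    All carriers get the same aberrant psi and all other samples the same
    background psi, so requires 0 < n_carriers < n_samples.
    """
    values = [aberrant] * n_carriers + [background] * (n_samples - n_carriers)
    return (aberrant - mean(values)) / pstdev(values)


# Hypothetical cohort of 50 samples with a growing number of carriers.
for n_carriers in (1, 3, 5, 10, 25):
    z = carrier_z_score(50, n_carriers)
    print(f"{n_carriers:2d} of 50 samples share the event -> z = {z:.2f}")
```

The carrier's z-score drops as more samples share the event, because the shared event pulls the cohort mean toward the aberrant value and widens the spread it is compared against.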

But if you instead assume that all cases have the same event and you want to do a case vs. control comparison (i.e., a differential splicing analysis), then you should use LeafCutter or similar tools instead of FRASER.

Hope this gives you a better understanding of the tool and the way you can apply FRASER. This slide deck might help you to understand the differences. Keep in mind that an outlier event is not required to be in only one sample.

gremame commented 2 years ago

Thanks @c-mertes, your comments are definitely useful. I'll try FRASER with a larger number of samples.

gagneurlab / FRASER

Ppadjust is always 1 #26