KuechlerO opened this issue 2 years ago (status: open)
If I recall correctly, the confounder effect is estimated by fitting the parameters of the Dirichlet-multinomial model, i.e. the confounder term in the linear predictor `x_i * beta_j + confounder * beta_confounder + u_j` shown below. It is quite possible that the fit is poor on your data, partly because you don't have many samples. I would trust your instinct here. If you think the confounder effect is large, run the two datasets separately and then do a meta-analysis of the two separate tests. If you think it is small, combine them and drop the confounders.
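To make the formula above concrete, here is a minimal sketch of a per-cluster Dirichlet-multinomial likelihood in which the confounder enters each sample's linear predictor additively, just like the group covariate. This is an illustration under assumptions, not leafcutter's own implementation. The names `u`, `beta`, `gamma` (the per-intron confounder effect), `conc` (a shared concentration), `cluster_loglik` and `lr_test` are hypothetical. The sketch also assumes the confounder appears in both the null and the full model, so that only the group effect `beta` is tested, with df = (number of introns − 1). The fitting step, which maximizes `cluster_loglik` under each model, is omitted.

```python
import math


def softmax(eta):
    """Map per-intron linear predictors to usage proportions (numerically stable)."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def dm_loglik(counts, alpha):
    """Dirichlet-multinomial log-probability of one sample's intron counts."""
    n = sum(counts)
    a0 = sum(alpha)
    ll = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    ll += math.lgamma(a0) - math.lgamma(n + a0)
    ll += sum(math.lgamma(y + a) - math.lgamma(a) for y, a in zip(counts, alpha))
    return ll


def cluster_loglik(Y, x, c, u, beta, gamma, conc):
    """Log-likelihood of one intron cluster across samples (hypothetical parametrization).

    Y: per-sample intron counts; x: 0/1 group (e.g. mutation); c: 0/1 confounder
    (e.g. SeqData2). u, beta, gamma hold one value per intron.
    """
    total = 0.0
    for y_i, x_i, c_i in zip(Y, x, c):
        # Linear predictor per intron j: x_i*beta_j + c_i*gamma_j + u_j
        eta = [x_i * b + c_i * g + u_j for b, g, u_j in zip(beta, gamma, u)]
        alpha = [conc * p for p in softmax(eta)]
        total += dm_loglik(y_i, alpha)
    return total


def chi2_sf(stat, df):
    """Chi-square upper tail for integer df, via the incomplete-gamma recurrence."""
    if stat <= 0:
        return 1.0
    h = stat / 2.0
    if df % 2 == 0:
        sf, a = math.exp(-h), 1.0  # Q(1, h)
    else:
        sf, a = math.erfc(math.sqrt(h)), 0.5  # Q(1/2, h)
    while a < df / 2.0:
        # Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
        sf += math.exp(a * math.log(h) - h - math.lgamma(a + 1))
        a += 1.0
    return sf


def lr_test(ll_full, ll_null, n_introns):
    """LR test of the group effect; the confounder is in both models (assumption)."""
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    return chi2_sf(stat, n_introns - 1)
```

Adding a confounder gives every intron an extra free parameter (`gamma_j`) in both models. With only 11 samples, this can make the fit less stable, which matches Yang's remark.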
Yang
[image: image.png]
On Wed, Mar 16, 2022 at 10:24 AM Oliver Küchler @.***> wrote:
Hey guys,
First of all, huge thanks for your work! It's really cool and fun to use!
However, I stumbled upon a case where including an additional confounder column in the Leafcutter groups file makes a huge difference. I'm using a positive control. Without the confounder column, the results look reasonable. With confounders, the output differs substantially from the first case and no longer includes my positive control.
Concrete results:
- Case 1, without confounders: cluster_x has q-value 0.000147; total number of significant clusters (FDR < 0.1): 35
- Case 2, with confounders: cluster_x has q-value 0.85; total number of significant clusters (FDR < 0.1): 255
Here are my groups files. Case 1, without confounders:
```
76660.bam control
76661.bam control
76662.bam control
AW.bam control
BF.bam control
UK.bam control
76663.bam mutation
76664.bam mutation
76665.bam mutation
18-5085.bam mutation
18-5148.bam mutation
```
and Case 2, with confounders:
```
76660.bam control SeqData1
76661.bam control SeqData1
76662.bam control SeqData1
AW.bam control SeqData2
BF.bam control SeqData2
UK.bam control SeqData2
76663.bam mutation SeqData1
76664.bam mutation SeqData1
76665.bam mutation SeqData1
18-5085.bam mutation SeqData2
18-5148.bam mutation SeqData2
```
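A quick way to see how the confounder column relates to the condition is to cross-tabulate the two columns. The sketch below parses the three-column groups file above (whitespace-separated sample, group, confounder) and counts samples per combination. It also shows a simple 0/1 coding matching the `x_i` and confounder terms in Yang's formula. The reference-level coding in `design_rows` is a hypothetical choice for illustration, and leafcutter's internal encoding may differ.

```python
from collections import Counter

# The three-column groups file from this issue: sample, group, confounder.
GROUPS_WITH_CONFOUNDER = """\
76660.bam control SeqData1
76661.bam control SeqData1
76662.bam control SeqData1
AW.bam control SeqData2
BF.bam control SeqData2
UK.bam control SeqData2
76663.bam mutation SeqData1
76664.bam mutation SeqData1
76665.bam mutation SeqData1
18-5085.bam mutation SeqData2
18-5148.bam mutation SeqData2
"""


def parse_groups(text):
    """Split each non-empty line into whitespace-separated fields."""
    return [line.split() for line in text.splitlines() if line.strip()]


def design_rows(rows, case="mutation", ref_level="SeqData1"):
    """Hypothetical 0/1 coding per sample: (intercept, group x_i, confounder c_i)."""
    return [(1, int(group == case), int(conf != ref_level)) for _, group, conf in rows]


rows = parse_groups(GROUPS_WITH_CONFOUNDER)
crosstab = Counter((group, conf) for _, group, conf in rows)
for (group, conf), n in sorted(crosstab.items()):
    print(group, conf, n)
# control SeqData1 3
# control SeqData2 3
# mutation SeqData1 3
# mutation SeqData2 2
```

Every group/dataset combination has 2 or 3 samples. So condition and dataset are not fully confounded, and a separate confounder effect is in principle estimable, although it is estimated from very few samples per cell.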
These two files give completely different results.
P.S. Splitting the run by the confounder (analyzing SeqData1 and SeqData2 separately) also supports Case 1:
- SeqData1: cluster_x has q-value 0.071; total number of significant clusters (FDR < 0.1): 67
- SeqData2: cluster_x has q-value 0.0434; total number of significant clusters (FDR < 0.1): 465
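Yang suggests running the two datasets separately and then doing a meta-analysis. One possible sketch of that step applies Fisher's method to each cluster's raw p-values from the two runs and then re-applies Benjamini-Hochberg across clusters. Fisher's method is an assumed choice here, since Yang does not name a specific method. The sketch also assumes that both runs share the same cluster IDs (e.g. both used the same intron clustering). It combines raw p-values, not the BH-adjusted q-values quoted above. `fisher_combine`, `bh_adjust` and `meta_analyze` are hypothetical helpers, not part of leafcutter.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    h = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # X / 2; clip p = 0
    # The chi-square survival function with even df = 2k has a closed form.
    return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(k))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def meta_analyze(runs):
    """Combine per-cluster raw p-values across runs, then BH-adjust across clusters.

    runs: list of dicts mapping cluster ID -> raw p-value, one dict per run.
    Only clusters present in every run are kept.
    """
    shared = sorted(set.intersection(*(set(r) for r in runs)))
    combined = [fisher_combine([r[c] for r in runs]) for c in shared]
    return dict(zip(shared, bh_adjust(combined)))
```

Fisher's method ignores the direction of the splicing change, so a cluster shifting in opposite directions in the two datasets can still combine to a small p-value. Checking effect sizes in each run separately guards against that.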
Could you elaborate on how the confounders are included in the model and how they affect Leafcutter's final results?