davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0
200 stars 111 forks

Very different result with included confounder.column #208

Open KuechlerO opened 2 years ago

KuechlerO commented 2 years ago

Hey guys,

First of all, huge thanks for your work! It's really cool and fun to use!

However, I ran into a case where adding a confounder column to the Leafcutter group file changes the results dramatically. I'm using a positive control: without the confounder column the results look reasonable, while with the confounder column the output differs widely from the first case and no longer includes my positive control.

Concrete results:

Case 1, without confounders: cluster_x has q-value 0.000147. Total number of significant clusters (FDR < 0.1): 35

Case 2, with confounders: cluster_x has q-value 0.85. Total number of significant clusters (FDR < 0.1): 255

Here are my group files. Case 1, without confounders:

76660.bam   control
76661.bam   control
76662.bam   control
AW.bam  control
BF.bam  control
UK.bam  control
76663.bam   mutation
76664.bam   mutation
76665.bam   mutation
18-5085.bam mutation
18-5148.bam mutation

and Case 2, with confounders:

76660.bam   control SeqData1
76661.bam   control SeqData1
76662.bam   control SeqData1
AW.bam  control SeqData2
BF.bam  control SeqData2
UK.bam  control SeqData2
76663.bam   mutation    SeqData1
76664.bam   mutation    SeqData1
76665.bam   mutation    SeqData1
18-5085.bam mutation    SeqData2
18-5148.bam mutation    SeqData2

result in completely different outcomes.
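A quick way to see how the group and confounder columns overlap in the three-column group file above is to cross-tabulate them. The following is a minimal Python sketch, not part of Leafcutter; `cross_tabulate` is a hypothetical helper, and the file content is copied from the listing above with whitespace as the separator.

```python
from collections import Counter

# Three-column group file from the post: sample, group, confounder.
GROUP_FILE = """\
76660.bam control SeqData1
76661.bam control SeqData1
76662.bam control SeqData1
AW.bam control SeqData2
BF.bam control SeqData2
UK.bam control SeqData2
76663.bam mutation SeqData1
76664.bam mutation SeqData1
76665.bam mutation SeqData1
18-5085.bam mutation SeqData2
18-5148.bam mutation SeqData2
"""


def cross_tabulate(text):
    """Count samples for each (group, confounder) pair."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and rows without a confounder
        _sample, group, confounder = fields[:3]
        counts[(group, confounder)] += 1
    return counts


counts = cross_tabulate(GROUP_FILE)
for (group, confounder), n in sorted(counts.items()):
    print(f"{group:<10}{confounder:<10}{n}")
```

For this file the table has 3 samples in every cell except mutation/SeqData2, which has 2, so each group is represented in both SeqData batches, but with only two or three samples per cell.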

P.S. Splitting the run by confounder value also supports Case 1:

SeqData1: cluster_x has q-value 0.071. Total number of significant clusters (FDR < 0.1): 67

SeqData2: cluster_x has q-value 0.0434. Total number of significant clusters (FDR < 0.1): 465

Could you maybe elaborate on how the confounders are included and how they affect Leafcutter's final results?

goldenflaw commented 1 year ago

If I recall correctly, the confounder effect is estimated as part of fitting the parameters of the Dirichlet-multinomial, e.g. the term x_i beta_j + confounder beta_confounder + u_j below. It is quite possible that the fit is not great on your data, partly because you don't have a lot of data. I would trust your instinct: if you think the confounder effect is big, run the two datasets separately and then do a meta-analysis based on the two separate tests; if you think the confounder effect is small, combine them without confounders.
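The reply does not say which meta-analysis method to use. One common choice for combining two independent tests of the same cluster is Fisher's method, sketched below in plain Python. This is an assumption on my part, not something Leafcutter provides: `fisher_combined_pvalue` is a hypothetical helper, it expects per-cluster p-values (not the q-values quoted in the post), it assumes the two runs use disjoint samples, and it ignores the direction of the effect. Multiple-testing correction would need to be redone on the combined p-values.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null, -2 * sum(log p) follows a chi-square distribution
    with 2k degrees of freedom (k = number of p-values). For even
    degrees of freedom the survival function has a closed form, so no
    external library is needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic / 2
    # P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Made-up p-values for one cluster from two separate runs (illustration only).
combined = fisher_combined_pvalue([0.01, 0.04])
print(combined)
```

With a single p-value the function returns that p-value unchanged, which is a convenient sanity check.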

Yang

[image: image.png]
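The attached image is not preserved in this thread. Going only by the expression typed in the reply, the linear term can be written as below. The symbol names are my reading, not confirmed by the thread: $i$ indexes samples, $j$ indexes introns within a cluster, $x_i$ is the group indicator, $c_i$ is the confounder value, and $u_j$ is an intron-specific intercept. How this term maps onto the Dirichlet-multinomial parameters is not stated here.

```latex
\eta_{ij} = x_i \, \beta_j + c_i \, \beta_{\text{confounder}} + u_j
```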

On Wed, Mar 16, 2022 at 10:24 AM Oliver Küchler @.***> wrote:

