FelixHeinrich / MIDESP

Mutual information based detection of epistatic SNP pairs
GNU General Public License v3.0

Adjusting for covariates #3

Open raqueldias opened 1 month ago

raqueldias commented 1 month ago

I was wondering whether MIDESP could support covariate adjustment, since covariates such as age and sex can strongly skew the results. I ran MIDESP on UK Biobank data, and the Q-Q plots show that the p-values are inflated, which could be due to bias caused by age, sex, or ethnicity. If MIDESP supported MI calculations that adjust for covariates, that should address the inflation. Conditional mutual information seems to be what we need in this case: https://stackoverflow.com/questions/55402338/finding-conditional-mutual-information-from-3-discrete-variable

Thanks for considering this request, and thank you so much for the most recent update!
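For reference, the standard definition of conditional mutual information for discrete variables can be written as below. The symbol assignment is an assumption made here for illustration: $X$ is the SNP (or SNP-pair) genotype, $Y$ the phenotype, and $Z$ the covariate.

```latex
I(X;Y \mid Z) \;=\; \sum_{x,\,y,\,z} p(x,y,z)\,
  \log \frac{p(z)\,p(x,y,z)}{p(x,z)\,p(y,z)}
```

The quantity is zero when $X$ and $Y$ are independent within every stratum of $Z$, which is the sense in which it "adjusts" for the covariate.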

FelixHeinrich commented 1 month ago

Hi, including covariates should be relatively simple as long as they are categorical variables. I can work on that when I have some time. Continuous covariates will be more difficult because I'm not as familiar with the corresponding math.

Just to make sure that I understand the problem correctly: we want to remove any association that can also be explained by the covariate. For example:

Phenotype  Sex  SNP genotype
1          M    AA
1          M    AA
1          M    AA
2          F    CC
2          F    CC
2          F    CC

If we include the covariate Sex, the SNP would have no association with the phenotype. If we don't consider the covariate, the SNP would have a perfect association with the phenotype.

Is that correct?
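The toy table above can be checked numerically. The following is a minimal sketch of a plug-in (frequency-count) estimator for categorical variables; the class and method names are hypothetical and are not part of MIDESP. On the six rows from the table it should give about 1 bit for the unadjusted mutual information between phenotype and genotype, and 0 bits once Sex is conditioned on.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from MIDESP.
final class DiscreteCmiSketch {

    // Separator assumed not to occur in any label, so joint keys stay unambiguous.
    private static final String SEP = "\u0001";

    /** Plug-in estimate of I(X;Y|Z) in bits from three equally long label arrays. */
    static double conditionalMutualInformation(String[] x, String[] y, String[] z) {
        int n = x.length;
        Map<String, Integer> nXYZ = new HashMap<>();
        Map<String, Integer> nXZ = new HashMap<>();
        Map<String, Integer> nYZ = new HashMap<>();
        Map<String, Integer> nZ = new HashMap<>();
        // Count joint and marginal occurrences.
        for (int i = 0; i < n; i++) {
            nXYZ.merge(x[i] + SEP + y[i] + SEP + z[i], 1, Integer::sum);
            nXZ.merge(x[i] + SEP + z[i], 1, Integer::sum);
            nYZ.merge(y[i] + SEP + z[i], 1, Integer::sum);
            nZ.merge(z[i], 1, Integer::sum);
        }
        // Averaging the log-ratio over samples equals the sum over cells weighted by p(x,y,z).
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double cXYZ = nXYZ.get(x[i] + SEP + y[i] + SEP + z[i]);
            double cXZ = nXZ.get(x[i] + SEP + z[i]);
            double cYZ = nYZ.get(y[i] + SEP + z[i]);
            double cZ = nZ.get(z[i]);
            sum += Math.log((cZ * cXYZ) / (cXZ * cYZ));
        }
        return sum / (n * Math.log(2.0));
    }

    /** Unconditional I(X;Y) in bits: condition on a constant variable. */
    static double mutualInformation(String[] x, String[] y) {
        String[] constant = new String[x.length];
        Arrays.fill(constant, "all");
        return conditionalMutualInformation(x, y, constant);
    }

    public static void main(String[] args) {
        // The six rows of the toy table.
        String[] phenotype = {"1", "1", "1", "2", "2", "2"};
        String[] sex = {"M", "M", "M", "F", "F", "F"};
        String[] genotype = {"AA", "AA", "AA", "CC", "CC", "CC"};

        System.out.println("I(phenotype; genotype)       = "
                + mutualInformation(phenotype, genotype));
        System.out.println("I(phenotype; genotype | sex) = "
                + conditionalMutualInformation(phenotype, genotype, sex));
    }
}
```

Treating the unconditional case as conditioning on a constant keeps a single code path. With real data the plug-in estimate is biased upward for sparse strata, so any significance testing would still need to account for that.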

raqueldias commented 1 month ago

Thank you so much for your quick response. Yes, that is exactly what I meant.

raqueldias commented 1 month ago

If I may help on the continuous side of the adjustment: the infodynamics package provides a k-nearest-neighbor based estimator of conditional mutual information. I drafted a code example to help (not yet tested):

import infodynamics.measures.continuous.kraskov.ConditionalMutualInfoCalculatorMultiVariateKraskov;
import infodynamics.measures.continuous.kraskov.ConditionalMutualInfoCalculatorMultiVariateKraskov1;
import infodynamics.utils.MatrixUtils;

public class CMICalculation {
    // The infodynamics calculator methods declare checked exceptions.
    public static void main(String[] args) throws Exception {
        // Example data: replace with your actual data.
        // One row per sample; the columns are x, y, and the conditioning variable z.
        // These three rows are placeholders only: the estimator needs more
        // samples than the number of nearest neighbors k set below.
        double[][] data = {
            {1.0, 1.0, 17.0},
            {2.0, 1.0, 38.0},
            {2.0, 0.0, 19.0}
        };

        // Split the columns into the three variables
        double[][] x = MatrixUtils.selectColumns(data, new int[]{0});
        double[][] y = MatrixUtils.selectColumns(data, new int[]{1});
        double[][] z = MatrixUtils.selectColumns(data, new int[]{2});

        // Initialize the CMI calculator. The base Kraskov class is abstract,
        // so a concrete variant (algorithm 1 here) is instantiated.
        ConditionalMutualInfoCalculatorMultiVariateKraskov cmiCalc =
            new ConditionalMutualInfoCalculatorMultiVariateKraskov1();
        cmiCalc.setProperty("k", "3"); // Number of nearest neighbors
        cmiCalc.initialise(1, 1, 1); // Dimensions of x, y, z

        // Set the observations
        cmiCalc.setObservations(x, y, z);

        // Compute CMI
        double cmiValue = cmiCalc.computeAverageLocalOfObservations();
        System.out.println("Conditional Mutual Information: " + cmiValue);
    }
}
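
As a simpler alternative for the continuous covariates discussed above, a continuous covariate such as age could be discretized into equal-frequency bins and then handled like any other categorical covariate. This is a swapped-in technique, not the k-nearest-neighbor estimator and not something MIDESP is known to do; the class name, method name, and the number of bins are assumptions for illustration, and the last three ages in the usage example are made up.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of MIDESP or the infodynamics package.
final class CovariateBinningSketch {

    /** Assigns each value to one of numBins equal-frequency bins (0 .. numBins - 1). */
    static int[] quantileBins(double[] values, int numBins) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        // Lower boundaries of bins 1 .. numBins - 1, taken from the sorted values.
        double[] cuts = new double[numBins - 1];
        for (int b = 1; b < numBins; b++) {
            cuts[b - 1] = sorted[(int) ((long) b * sorted.length / numBins)];
        }

        // A value's bin is the number of boundaries it reaches, so ties share a bin.
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = 0;
            while (bin < cuts.length && values[i] >= cuts[bin]) {
                bin++;
            }
            bins[i] = bin;
        }
        return bins;
    }

    public static void main(String[] args) {
        // Ages: the first three come from the draft above, the rest are invented.
        double[] age = {17.0, 38.0, 19.0, 52.0, 45.0, 23.0};
        System.out.println(Arrays.toString(quantileBins(age, 3)));
    }
}
```

Binning loses information within each bin, so the choice of bin count trades residual confounding against sparsity of the strata.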