Open raqueldias opened 1 month ago
Hi, including covariates should be relatively simple as long as they are categorical variables. I can work on that when I have some time. Continuous covariates will be more difficult because I'm less familiar with the corresponding math. Just to make sure that I understand the problem correctly: we want to remove any association that can also be explained by the covariate. For example:
Phenotype | Sex | SNP genotype |
---|---|---|
1 | M | AA |
1 | M | AA |
1 | M | AA |
2 | F | CC |
2 | F | CC |
2 | F | CC |
If we include the covariate Sex, the SNP would have no association with the phenotype. If we don't consider the covariate, the SNP would have a perfect association with the phenotype.
Is that correct?
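To make the toy table concrete, here is a minimal sketch that computes both quantities with plain frequency counts (a plug-in estimator). It is not MIDESP code; the class and method names (`DiscreteCmiSketch`, `entropy`, `mutualInformation`, `conditionalMutualInformation`) are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Plug-in (frequency-count) entropy estimates for small categorical data sets. */
final class DiscreteCmiSketch {

    /** Shannon entropy, in bits, of the joint distribution of the selected columns. */
    static double entropy(String[][] rows, int... cols) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            StringBuilder key = new StringBuilder();
            for (int c : cols) {
                key.append(row[c]).append('|'); // separator keeps joint keys unambiguous
            }
            counts.merge(key.toString(), 1, Integer::sum);
        }
        double h = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / rows.length;
            h -= p * Math.log(p) / Math.log(2.0);
        }
        return h;
    }

    /** I(X;Y) = H(X) + H(Y) - H(X,Y). */
    static double mutualInformation(String[][] rows, int x, int y) {
        return entropy(rows, x) + entropy(rows, y) - entropy(rows, x, y);
    }

    /** I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). */
    static double conditionalMutualInformation(String[][] rows, int x, int y, int z) {
        return entropy(rows, x, z) + entropy(rows, y, z)
                - entropy(rows, x, y, z) - entropy(rows, z);
    }

    public static void main(String[] args) {
        // Columns: 0 = phenotype, 1 = sex, 2 = SNP genotype (the table above).
        String[][] rows = {
            {"1", "M", "AA"}, {"1", "M", "AA"}, {"1", "M", "AA"},
            {"2", "F", "CC"}, {"2", "F", "CC"}, {"2", "F", "CC"}
        };
        System.out.println("I(SNP; Phenotype)       = " + mutualInformation(rows, 2, 0));
        System.out.println("I(SNP; Phenotype | Sex) = " + conditionalMutualInformation(rows, 2, 0, 1));
    }
}
```

For this table the unconditional mutual information is 1 bit, while conditioning on Sex brings it down to 0 bits, because within each sex both the phenotype and the genotype are constant.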
Thank you so much for your quick response. Yes, that is exactly what I meant.
If I may help with the continuous side of the adjustment: the infodynamics package (JIDT) provides a k-nearest-neighbors (Kraskov) estimator of conditional mutual information. Here is a draft code example (not yet tested):
import infodynamics.measures.continuous.kraskov.ConditionalMutualInfoCalculatorMultiVariateKraskov1;
import infodynamics.utils.MatrixUtils;

public class CMICalculation {
    // The JIDT calculator methods declare checked exceptions, so main propagates them.
    public static void main(String[] args) throws Exception {
        // Example data: replace with your actual data (rows = samples; columns = x, y, z).
        // The estimator needs more samples than k, so these three rows are only a placeholder.
        double[][] data = {
            {1.0, 2.0, 2.0},
            {1.0, 1.0, 0.0},
            {17.0, 38.0, 19.0}
        };
        // Split the columns into x, y, and the conditioning variable z
        double[][] x = MatrixUtils.selectColumns(data, new int[]{0});
        double[][] y = MatrixUtils.selectColumns(data, new int[]{1});
        double[][] z = MatrixUtils.selectColumns(data, new int[]{2});
        // Initialize the CMI calculator. The Kraskov1 variant is used on the assumption
        // that the un-suffixed Kraskov class is abstract in JIDT; Kraskov2 is the alternative.
        ConditionalMutualInfoCalculatorMultiVariateKraskov1 cmiCalc =
                new ConditionalMutualInfoCalculatorMultiVariateKraskov1();
        cmiCalc.setProperty("k", "3"); // Number of nearest neighbors
        cmiCalc.initialise(1, 1, 1); // Dimensions of x, y, z
        // Set the observations
        cmiCalc.setObservations(x, y, z);
        // Compute CMI
        double cmiValue = cmiCalc.computeAverageLocalOfObservations();
        System.out.println("Conditional Mutual Information: " + cmiValue);
    }
}
I was wondering whether MIDESP could support covariate adjustment, since covariates like age and sex can substantially skew the results. I ran MIDESP on UK Biobank data and the Q-Q plots show that the p-values are all inflated, which could be due to bias caused by age, sex, or ethnicity. If MIDESP supported MI calculations that adjust for covariates, that should address the inflated p-values. It seems like conditional mutual information is what we would need in this case: https://stackoverflow.com/questions/55402338/finding-conditional-mutual-information-from-3-discrete-variable Thanks for considering this request, and thank you so much for the most recent update!
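For reference, the standard definition of conditional mutual information for discrete variables is given here. The symbols are assumptions made for illustration: $X$ is the SNP genotype, $Y$ the phenotype, and $Z$ the covariate (for several categorical covariates, $Z$ could be taken as their joint value).

```latex
I(X;Y \mid Z)
  = \sum_{x,y,z} p(x,y,z)\,\log\frac{p(z)\,p(x,y,z)}{p(x,z)\,p(y,z)}
  = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
```

The quantity is zero exactly when $X$ and $Y$ are independent given $Z$, which is the sense in which it removes association that the covariate already explains.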