fastlmm / FaST-LMM

Python version of Factored Spectrally Transformed Linear Mixed Models
https://fastlmm.github.io/
Apache License 2.0

Epistatic interaction coefficient and error #16

Closed edinburgh-biomedical-ai closed 3 months ago

edinburgh-biomedical-ai commented 3 years ago

I would appreciate your advice on the following issue.

I am using FaST-LMM to estimate epistatic interactions using the GitHub code's own demo SNPs. However, unlike the table presented in https://nbviewer.jupyter.org/github/fastlmm/FaST-LMM/blob/master/doc/ipynb/FaST-LMM.ipynb which states the p-value based on the likelihood ratio test, I would like to output the weight of the interaction coefficient, together with its variance, similar to the case of a single SNP effect size.

I would expect the optimisation step to be almost identical, as the interaction coefficient would only correspond to an extra column in the design matrix. Could you please tell me how to output the coefficient of epistasis and its error?

Many thanks,

Ava
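
To make the "extra column in the design matrix" idea concrete, here is a minimal sketch using plain ordinary least squares on simulated genotypes. It is not FaST-LMM code and it ignores the random effect entirely; the function name `ols_with_interaction` and the simulated effect sizes are made up for illustration. It only shows what "coefficient and its error" means for the interaction column.

```python
import numpy as np

def ols_with_interaction(snp_a, snp_b, y):
    """Fit y ~ 1 + a + b + a*b by ordinary least squares.

    Returns the coefficient vector and its standard errors; the last
    entry of each belongs to the interaction column.
    """
    snp_a = np.asarray(snp_a, dtype=float)
    snp_b = np.asarray(snp_b, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones_like(snp_a), snp_a, snp_b, snp_a * snp_b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof               # unbiased residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov_beta))

# Simulated example (assumed effect sizes, not real data).
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 3, size=n)
b = rng.integers(0, 3, size=n)
y = 0.5 * a - 0.2 * b + 0.3 * a * b + rng.normal(size=n)
beta, se = ols_with_interaction(a, b, y)
print(f"interaction beta = {beta[3]:.3f} +/- {se[3]:.3f}")
```

In a mixed model the same design matrix is used, but the residual covariance is no longer the identity, so both the estimate and its standard error change.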

CarlKCarlK commented 3 years ago

Ava,

Thanks for your note and thanks for looking at FaST-LMM.

I'm consulting with other folks on the team. While I do that, can you let me know the approximate size of your data: number of individuals, number of SNPs (variants), any covariates, etc.? Also, is the main data in PLINK Bed format or some other format?

Yours, Carl

Carl Kadie, Ph.D., FaST-LMM & PySnpTools Team, https://fastlmm.github.io/ (Microsoft Research, retired), https://www.linkedin.com/in/carlk/

edinburgh-biomedical-ai commented 3 years ago

Thank you Carl.

The data information is as follows:

1) Number of individuals: UK Biobank size, so approximately half a million individuals
2) Number of SNPs (variants): A vector of 1000 SNPs interacting with a vector of 10 SNPs (order of magnitude)
3) Number of covariates: 20 PCA components (or fewer), age, gender, batch, UKBB centre and suchlike (<30)
4) The main data is in bgen format.

Thanks for your help!

Ava

CarlKCarlK commented 3 years ago

Ava,

If you like, you can email me at carlk (at) msn.com.

I think we can do this with a different FaST-LMM function, single_snp, which reports the stddev of beta (sqrt of variance). The idea would be to test a pseudo-SNP representing a pair of SNPs. Its value would be the standardized value of each SNP in the pair multiplied together.

Some questions:

Yours, Carl
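
The pseudo-SNP construction described above can be sketched with NumPy alone. The helper names `standardize` and `pseudo_snps` are hypothetical, and the arrays are random stand-ins for real genotypes. The resulting matrix would still have to be wrapped in whatever SNP container single_snp accepts, which is not shown here.

```python
import numpy as np

def standardize(snps):
    """Column-wise zero mean and unit variance; constant columns become zeros."""
    snps = np.asarray(snps, dtype=float)
    centered = snps - snps.mean(axis=0)
    std = snps.std(axis=0)
    std[std == 0] = 1.0          # avoid dividing by zero for monomorphic SNPs
    return centered / std

def pseudo_snps(snps_a, snps_b):
    """Products of standardized SNP pairs.

    Returns an (n_individuals, n_a * n_b) matrix and the list of
    (index_in_a, index_in_b) pairs labelling its columns.
    """
    za = standardize(snps_a)
    zb = standardize(snps_b)
    n = za.shape[0]
    # Broadcast to (n, n_a, n_b), then flatten the last two axes.
    products = (za[:, :, None] * zb[:, None, :]).reshape(n, -1)
    pairs = [(i, j) for i in range(za.shape[1]) for j in range(zb.shape[1])]
    return products, pairs

# Toy data: 50 individuals, 4 SNPs in one set and 2 in the other.
rng = np.random.default_rng(1)
snps_a = rng.integers(0, 3, size=(50, 4))
snps_b = rng.integers(0, 3, size=(50, 2))
products, pairs = pseudo_snps(snps_a, snps_b)
print(products.shape, pairs[:3])
```

At the scale mentioned in this thread (about 500K individuals and 1000 x 10 pairs) the full product matrix would be roughly 40 GB in float64, so in practice the pairs would probably be generated in batches.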

CarlKCarlK commented 3 years ago

Ava,

By the way, if this is too many questions to answer easily, just answer the easy ones.

(I'm generally interested in merging the epistasis code with the single-snp code, so this project is of interest to me.)

edinburgh-biomedical-ai commented 3 years ago

Hi Carl,

Thank you for the response. Here are the answers to your questions:

Q: BGEN gives SNP value distribution. Is it OK to use the expected value of the SNP?
A: We will discretise the BGEN inputs, so we would like to pass a pandas/numpy array of SNPs with values 0, 1, 2 as input to the code. Dimensions: 2000 SNPs x 500K individuals.

Q: Is the similarity matrix based on the 1000 SNPs or on something else? (If something else, what and how large and what format?)
Q: I assume we want to leave out the one or two chromosomes in each pair from the similarity matrix? For example, if testing the pair SNP0,SNP1 where SNP0 is from Chrom2 and SNP1 is from Chrom5, then we want to use a similarity matrix without SNPs in Chrom2 and 5.
A: We compute the GRM separately (using our in-house software). We would like to pass the GRM as input to the code. It will be a matrix that is approximately 500K x 500K. If you think the diagonalisation is too expensive for FaST-LMM, we can also perform that using our in-house software and input the diagonalised matrix (together with eigenvalues/eigenvectors) instead.

Q: Is there any overlap between the 1000 SNPs and the 10 (or so) SNPs?
A: No.

Q: Is there just one phenotype?
A: No, but I assume that's a matter of a 'for loop'.

Q: What is the format of the Phenotype and covariate files?
A: CSV, but we can convert if necessary.
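
For the first question above, the two options (expected value versus discretised 0/1/2 calls) can be sketched as follows. This assumes each variant comes with three genotype probabilities per individual, ordered by allele count 0, 1, 2. The function names and the `min_prob` threshold are illustrative choices, not part of any BGEN reader.

```python
import numpy as np

def expected_dosage(probs):
    """Expected allele count from probabilities ordered as P(0), P(1), P(2)."""
    probs = np.asarray(probs, dtype=float)
    return probs[..., 1] + 2.0 * probs[..., 2]

def hard_call(probs, min_prob=0.9):
    """Most likely genotype (0, 1 or 2); NaN where no genotype reaches min_prob."""
    probs = np.asarray(probs, dtype=float)
    calls = probs.argmax(axis=-1).astype(float)
    calls[probs.max(axis=-1) < min_prob] = np.nan   # treat uncertain calls as missing
    return calls

# Three individuals at one variant (made-up probabilities).
probs = np.array([
    [0.98, 0.02, 0.00],
    [0.10, 0.80, 0.10],
    [0.00, 0.05, 0.95],
])
dosage = expected_dosage(probs)
calls = hard_call(probs)
print(dosage, calls)
```

Hard calls discard the uncertainty that the dosage keeps, so the choice of threshold affects how many genotypes end up missing.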

In short, we only need to be able to use the FaST-LMM optimiser to get at the interaction coefficient and its variance (together with effect sizes as done in the single SNP case). As input, we separately have the GRM and/or its diagonalisation if necessary.

Thank you,

Ava
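
As a rough illustration of what is being asked for, the sketch below takes a precomputed eigendecomposition of the GRM and returns fixed-effect estimates with standard errors by rotating the data and running weighted least squares. It is a generic spectral-decomposition sketch, not the FaST-LMM implementation: the function name `rotated_gls` is hypothetical, the variance ratio `h2` is assumed known rather than optimised, and the data are simulated.

```python
import numpy as np

def rotated_gls(y, X, eigvals, eigvecs, h2):
    """Generalised least squares for y ~ N(X beta, sigma2 * (h2*K + (1-h2)*I)),
    where K = eigvecs @ diag(eigvals) @ eigvecs.T and h2 is treated as known."""
    y_rot = eigvecs.T @ y                  # rotation makes the covariance diagonal
    X_rot = eigvecs.T @ X
    weights = 1.0 / (h2 * eigvals + (1.0 - h2))
    XtWX = X_rot.T @ (X_rot * weights[:, None])
    XtWy = X_rot.T @ (weights * y_rot)
    beta = np.linalg.solve(XtWX, XtWy)
    resid = y_rot - X_rot @ beta
    n, p = X_rot.shape
    sigma2 = (weights * resid**2).sum() / (n - p)
    cov_beta = sigma2 * np.linalg.inv(XtWX)
    return beta, np.sqrt(np.diag(cov_beta))

# Simulated example: 300 individuals, GRM built from 500 random SNPs.
rng = np.random.default_rng(2)
n, m, h2 = 300, 500, 0.4
G = rng.integers(0, 3, size=(n, m)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)
K = G @ G.T / m
eigvals, eigvecs = np.linalg.eigh(K)
eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off

a = rng.integers(0, 3, size=n).astype(float)
b = rng.integers(0, 3, size=n).astype(float)
X = np.column_stack([np.ones(n), a, b, a * b])   # last column is the interaction
g = np.sqrt(h2) * eigvecs @ (np.sqrt(eigvals) * rng.normal(size=n))
y = X @ np.array([0.0, 0.5, -0.2, 0.3]) + g + np.sqrt(1 - h2) * rng.normal(size=n)

beta, se = rotated_gls(y, X, eigvals, eigvecs, h2)
print(f"interaction beta = {beta[3]:.3f} +/- {se[3]:.3f}")
```

With the eigenvectors supplied from outside, the per-test cost is dominated by the rotation of the design columns, which is why passing a precomputed diagonalisation is attractive at this sample size.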