JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Meta-analysis for 4 different populations with different LD #104

Open: ronahuel opened this issue 4 years ago (Aug 5, 2020)

ronahuel commented 4 years ago

Hello! First, I would like to thank the developers for this open-source program. My question is about LD. I've noticed that the software uses LD scores to estimate sigma. We have at least 4 populations, and each population probably has different LD scores. Is there any way to perform the meta-analysis for the same trait using different LD panels? Or do you have any suggestions for addressing this issue? Thanks!

paturley commented 4 years ago

Hello,

I agree that an important assumption of MTAG is that the LD score used is sufficiently similar to the true population LD score for the populations corresponding to each set of summary statistics. If that is not the case, the MTAG model will be misspecified. We are working on an extension to MTAG now that allows for the LD scores across populations to differ, but it is not yet published.
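The question notes that MTAG uses LD scores to estimate sigma. As a rough illustration of why a mismatched panel matters, here is a toy simulation (not MTAG code) of univariate LD score regression, E[χ²_j] = 1 + (N h²/M) l_j, assuming no confounding so the true intercept is exactly 1. It fits the same simulated χ² statistics twice: once against the true LD scores and once against a hypothetical panel that only partly tracks them. The gamma-distributed LD scores, the mixing model for the mismatched panel, and the unweighted least-squares fit are all simplifying assumptions.

```python
import numpy as np


def ols_intercept(chi2, ld_scores):
    """Unweighted least-squares fit of chi2 = a + b * l; returns the intercept a.

    Real LD score regression uses regression weights and a block jackknife;
    this plain fit is a simplification for illustration only.
    """
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[0]


def simulate_intercepts(n_snps=200_000, slope=0.02, mix=0.3, seed=1):
    """Toy model with E[chi2_j] = 1 + slope * l_j, so the true intercept is 1.

    `slope` stands in for N * h2 / M. The mismatched panel is a hypothetical
    mix of the true LD scores and unrelated ones (weight `mix` on the truth),
    mimicking a reference panel drawn from a different population.
    """
    rng = np.random.default_rng(seed)
    true_l = rng.gamma(shape=2.0, scale=50.0, size=n_snps) + 1.0
    # z_j ~ N(0, 1 + slope * l_j), so chi2_j = z_j**2 has mean 1 + slope * l_j.
    z = rng.normal(0.0, np.sqrt(1.0 + slope * true_l))
    chi2 = z ** 2
    unrelated_l = rng.gamma(shape=2.0, scale=50.0, size=n_snps) + 1.0
    mismatched_l = mix * true_l + (1.0 - mix) * unrelated_l
    return ols_intercept(chi2, true_l), ols_intercept(chi2, mismatched_l)


if __name__ == "__main__":
    matched, mismatched = simulate_intercepts()
    print(f"intercept with matching LD scores:   {matched:.3f}")
    print(f"intercept with mismatched LD scores: {mismatched:.3f}")
```

In this setup the matched fit should land near 1, while the mismatched fit is pushed upward: the slope is attenuated and the intercept absorbs the difference. Because the data are identical in both fits, the gap reflects only the choice of LD panel.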

ronahuel commented 4 years ago

Thanks for your quick answer!

ronahuel commented 4 years ago

Another question: is it correct to use LD scores based on physical position when we don't have a linkage map (in centimorgans)? Thanks again!

paturley commented 4 years ago

Hi,

I don't know that I fully understand your concern, but it sounds like a general critique of LD score regression. If you are concerned that the issues you raise are large enough to cause important biases in LD score regression estimates, that could be really important. My sense, based on the simulation and empirical literature, is that LD score regression is reasonably robust to the violations that have been studied, but I don't know whether your concern has been carefully considered.
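On the physical-position question: an LD score is defined as the sum of squared correlations between a SNP and its neighbors (itself included), so the window's unit only decides which neighbors are counted. Below is a simplified sketch with a base-pair window, similar in spirit to ldsc's `--ld-wind-kb` option (ldsc itself processes SNPs in blocks and also offers `--ld-wind-cm` and `--ld-wind-snps`). The function name and input layout are hypothetical; the optional adjustment is the small-sample correction r² − (1 − r²)/(n − 2) used by LD score regression.

```python
import numpy as np


def ld_scores_bp_window(genotypes, positions_bp, window_kb, adjust=True):
    """LD score l_j = sum of r^2_jk over SNPs k within window_kb of SNP j.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts for one
    chromosome, with no monomorphic SNPs. positions_bp: base-pair positions.
    The sum includes k == j, whose r^2 is 1.
    """
    n, m = genotypes.shape
    # Standardize each SNP so that Pearson r is a scaled dot product.
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    r2 = ((g.T @ g) / n) ** 2
    if adjust:
        # Bias-corrected r^2 for a finite reference sample of size n.
        r2 = r2 - (1.0 - r2) / (n - 2)
    pos = np.asarray(positions_bp)
    window_bp = window_kb * 1000
    scores = np.empty(m)
    for j in range(m):
        in_window = np.abs(pos - pos[j]) <= window_bp
        scores[j] = r2[j, in_window].sum()
    return scores


if __name__ == "__main__":
    geno = np.array([[0, 0, 1], [1, 1, 0], [2, 2, 1], [1, 1, 2], [0, 0, 0]], dtype=float)
    # SNPs 1 and 2 are identical and 500 bp apart; SNP 3 is 1 Mb away.
    print(ld_scores_bp_window(geno, [100, 600, 1_000_000], window_kb=1))
```

With a very narrow window, few neighbors fall inside it and most scores stay close to 1 (the SNP's own r²); widening the window lets more correlated neighbors contribute.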

ronahuel commented 4 years ago

Thanks again for your help. This is not a general critique; I'm just a student. The main issue I've noticed is that the default LD reference panel has a column of centimorgan values (CM, am I right?). However, we don't have a recombination map from which to estimate centimorgan positions. We were able to compute LD scores with the ldsc software using the following command:

system("./ldsc.py --bfile /home/rodrigomarin/Aquagen/ldsc/SS17_SW/1 --out /home/rodrigomarin/Aquagen/ldsc/ldsc_ss17_sw/1 --ld-wind-kb 1 --l2 ")
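MTAG appears to read one LD score file per chromosome (the error below names the chromosome-1 file, and the default eur_w_ld_chr panel is split by chromosome), so the command above would need to run once per chromosome. A minimal Python sketch that builds those calls, assuming per-chromosome PLINK filesets named `<prefix><chr>.bed/.bim/.fam` (as the `/1` suffix suggests) and 29 chromosomes (the count mentioned later in the thread). It uses only the flags from the original command; the helper names are hypothetical.

```python
import subprocess


def ldsc_l2_commands(bfile_prefix, out_prefix, chromosomes, window_kb):
    """Build one `ldsc.py --l2` argument list per chromosome.

    Assumes PLINK filesets split by chromosome, named <bfile_prefix><chr>.
    """
    return [
        ["./ldsc.py",
         "--bfile", f"{bfile_prefix}{chrom}",
         "--out", f"{out_prefix}{chrom}",
         "--ld-wind-kb", str(window_kb),
         "--l2"]
        for chrom in chromosomes
    ]


def run_all(commands):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = ldsc_l2_commands(
        "/home/rodrigomarin/Aquagen/ldsc/SS17_SW/",
        "/home/rodrigomarin/Aquagen/ldsc/ldsc_ss17_sw/",
        chromosomes=range(1, 30),  # 29 chromosomes, per the thread
        window_kb=1,
    )
    # Print the commands for inspection; call run_all(cmds) to execute them.
    for cmd in cmds:
        print(" ".join(cmd))
```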

Nonetheless, MTAG does not recognize these LD scores when it tries to estimate the sigma matrix:

Dropped 0 SNPs due to strand ambiguity, 56146 SNPs remain in intersection after merging trait2
...
Merge of GWAS summary statistics complete. Number of SNPs: 56146
Using 56146 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
Estimating sigma..
Could not open /ld_ref_panel/ldsc_ss17_sw/1.l2.ldscore[./gz/bz2]
Traceback (most recent call last):
  File "/home/rodrigomarin/Aquagen/MTAG/mtag/mtag.py", line 1567, in
    mtag(args)
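The path in this error, /ld_ref_panel/ldsc_ss17_sw/1.l2.ldscore, differs from the `--out` prefix in the ldsc command above (/home/rodrigomarin/Aquagen/ldsc/ldsc_ss17_sw/1), so it may be worth checking which files actually exist under each prefix. A small sketch, assuming MTAG looks for `<prefix><chr>.l2.ldscore` with no extension, `.gz`, or `.bz2`, as the `[./gz/bz2]` in the message suggests; the helper name is hypothetical.

```python
from pathlib import Path

# The three variants listed in the error message: [./gz/bz2]
SUFFIXES = ("", ".gz", ".bz2")


def missing_ldscore_files(prefix, chromosomes):
    """Return the chromosomes with no <prefix><chr>.l2.ldscore[.gz|.bz2] file."""
    missing = []
    for chrom in chromosomes:
        base = f"{prefix}{chrom}.l2.ldscore"
        if not any(Path(base + suffix).is_file() for suffix in SUFFIXES):
            missing.append(chrom)
    return missing


if __name__ == "__main__":
    for prefix in ("/ld_ref_panel/ldsc_ss17_sw/",
                   "/home/rodrigomarin/Aquagen/ldsc/ldsc_ss17_sw/"):
        print(prefix, "missing chromosomes:", missing_ldscore_files(prefix, range(1, 30)))
```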

The main difference between the default LD reference panel (eur_w_ld_chr) and mine is that eur_w_ld_chr has two more columns: one with MAF and another with CM. Hence my first question: is it absolutely necessary to have a CM map?
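To confirm exactly which columns differ, a short sketch that reads the header line of each `.ldscore` file (gzipped or plain) and lists the default panel's columns that are missing from the custom one. The helper names are hypothetical; if the description above is accurate, it should report CM and MAF.

```python
import gzip


def read_header(path):
    """Return the column names on the first line of a (possibly gzipped) .ldscore file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return fh.readline().split()


def column_diff(default_path, custom_path):
    """Columns present in the default file but absent from the custom one, in order."""
    custom_cols = set(read_header(custom_path))
    return [col for col in read_header(default_path) if col not in custom_cols]
```

For example, `column_diff("eur_w_ld_chr/1.l2.ldscore.gz", "ldsc_ss17_sw/1.l2.ldscore.gz")`, where both paths are hypothetical and should be adjusted to the actual files.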

I've modified the sumstat.py and alleleinfo.py files, changing the number of chromosomes from 22 to 29. I hope my concern is a little clearer now. Your help is very much appreciated. Best regards, Rodrigo.

paturley commented 4 years ago

My guess is that a CM map is not necessary. I think we likely inherited that code from LD score regression, since I see no clear reason why it would be necessary for MTAG. If you are able to produce valid LD scores using LDSC, those should be fine to pass straight into MTAG.
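Passing custom LD scores into MTAG would presumably go through its `--ld_ref_panel` option, which takes a directory prefix for chromosome-split LD scores (the thread calls the default panel eur_w_ld_chr). A sketch that builds such a call; the summary-statistics file names and output prefix are hypothetical, and the trailing-slash handling assumes MTAG appends the chromosome number directly to the prefix, as the path in the earlier error suggests.

```python
def mtag_command(sumstats, out_prefix, ld_ref_panel):
    """Build an mtag.py argument list that points --ld_ref_panel at custom LD scores."""
    # Assumption: MTAG concatenates the prefix and the chromosome number,
    # so a directory prefix needs a trailing slash.
    if not ld_ref_panel.endswith("/"):
        ld_ref_panel += "/"
    return ["python", "mtag.py",
            "--sumstats", ",".join(sumstats),  # comma-separated list of files
            "--out", out_prefix,
            "--ld_ref_panel", ld_ref_panel]


if __name__ == "__main__":
    cmd = mtag_command(
        ["trait1_sumstats.txt", "trait2_sumstats.txt"],  # hypothetical names
        "results/mtag_ss17_sw",                          # hypothetical prefix
        "/home/rodrigomarin/Aquagen/ldsc/ldsc_ss17_sw",
    )
    print(" ".join(cmd))
```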

Not sure if that helps. Sorry if I'm not understanding.
