JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Effect directions #126

Open fletchkatie opened 3 years ago

fletchkatie commented 3 years ago

Hello - I'm just flagging an issue with the coding of the effect / non-effect alleles. The Wiki states that A1 is considered the effect allele. However, according to the log files, whenever I read in data MTAG uses A2 as the effect (non-ref) allele and A1 as the reference allele. I have not specified A1 / A2 in my command file and all the input stats have matched alleles (i.e. there is no need for allele flipping).

So, for the output files from our analysis, the correct way to read our MTAG results files is:

A1 = REFERENCE allele
A2 = EFFECT (i.e. NON-REFERENCE) allele

Wiki basic tutorial: "a1/a2: Alleles observed at the particular locus. a1 is considered to be the effect allele, which should also be reflected in the signs of the Z-scores. These columns are also passed to the ldsc routine. mtag checks and flips the a1 and a2 alleles so that they are identical across input files. Other column names may be passed via the --a1_name and --a2_name options."

Log file from analysis:

Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2020/10/12/09:36:52 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/10/12/09:36:52 AM Interpreting column names as follows:
2020/10/12/09:36:52 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients

If the Wiki is wrong, it might be worth correcting as this has resulted in a lot of confusion with colleagues using these files for downstream analyses. Will be interested to hear what you think.

But otherwise, this is a great program so thanks for making it available - have used it to great effect with the UKBB data!
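The practical consequence of the two readings can be made concrete. The sketch below is not MTAG code; it only illustrates that a signed statistic is tied to one allele, so relabelling which allele is the effect allele means swapping the two allele columns and negating beta (or Z). The field names reuse the column names from the log above (snpid, a1, a2, beta, se). The class `SumstatRow`, the function `swap_effect_allele`, and the example values are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SumstatRow:
    snpid: str
    a1: str      # allele the signed statistic refers to (assumed effect allele)
    a2: str      # the other allele
    beta: float
    se: float


def swap_effect_allele(row: SumstatRow) -> SumstatRow:
    """Report the same association with respect to the other allele."""
    # Swapping the labels and negating beta keeps the meaning; se is unchanged.
    return replace(row, a1=row.a2, a2=row.a1, beta=-row.beta)


# Made-up example row, not taken from any real dataset.
row = SumstatRow(snpid="rs0000001", a1="A", a2="G", beta=0.05, se=0.01)
flipped = swap_effect_allele(row)
print(flipped)
```

If a downstream tool expects the opposite allele to carry the effect, applying a swap like this to every row is usually safer than reinterpreting the column labels by eye.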

JonJala commented 3 years ago

Hi, would you mind including what your MTAG command line looked like? It will be easier to dig into the flow of what's going on if we have the values for the flags.


paturley commented 3 years ago

Hello,

Looking at your question more closely, I think I actually understand what the problem is. We use the PLINK convention, in which the reference allele is the effect allele (both being a1). I think that VCF files reverse this notation, so that the alternate allele is the effect allele. But I agree that the lack of clarity here is confusing. We will update the Wiki and documentation to make it clear that a1 is supposed to refer to the effect allele.

Does that answer your question?

Thanks! Patrick
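The wiki text quoted in the original post says that mtag flips a1 and a2 so that they are identical across input files. As a rough sketch of what such a harmonisation step involves (an illustration only, not mtag's actual implementation; the `Record` type, `align_to_reference`, and the example values are hypothetical), a second file can be aligned to the first by swapping the alleles and negating the Z-score whenever the pair is listed in the reverse order:

```python
from typing import Optional, Tuple

Record = Tuple[str, str, float]  # (a1, a2, z), with z signed with respect to a1


def align_to_reference(ref: Record, other: Record) -> Optional[Record]:
    """Return `other` expressed with the same a1/a2 order as `ref`.

    Returns None when the allele pairs do not match at all. Strand flips
    (e.g. A/G versus T/C) are deliberately not handled in this sketch.
    """
    ref_a1, ref_a2, _ = ref
    a1, a2, z = other
    if (a1, a2) == (ref_a1, ref_a2):
        return other                     # already aligned
    if (a1, a2) == (ref_a2, ref_a1):
        return (ref_a1, ref_a2, -z)      # same alleles, reversed: negate z
    return None                          # different alleles: cannot align


trait1 = ("A", "G", 2.1)     # made-up values
trait2 = ("G", "A", -1.7)    # same SNP, alleles listed in the opposite order
aligned = align_to_reference(trait1, trait2)
print(aligned)
```

Whatever the labels are called (ref / non-ref, a1 / a2, effect / other), the point is the same: the sign of the statistic only has meaning relative to one named allele.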


fletchkatie commented 3 years ago

Hello - thanks for looking into this and for your helpful and detailed responses.

I think I have solved the problem - it was to do with how I created the input files from BOLT.

It would help, though, if the ref / non-ref allele wording in the logs made it clearer which is the effect allele and which is the non-effect allele, as this threw me!

Many thanks.

Best wishes, Katie

My command line is pasted below. I have looked at your other reply re PLINK naming conventions.

MY COMMAND LINE:

echo "#PBS -lselect=1:ncpus=48:mem=124gb
#PBS -lwalltime=24:00:00

module load anaconda3/personal
source activate /rds/general/project/lms-ukbiobank-analysis/live/Katie/mtagenv1

python /rds/general/project/francis_ukbb/live/MTAG/software/mtag/mtag.py \
  --sumstats $input_dir/AAmax,$input_dir/AAmin,$input_dir/AAdis,$input_dir/DAmax,$input_dir/DAmin,$input_dir/DAdis \
  --out /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_results \
  --incld_ambig_snps \
  --use_beta_se \
  --n_min 0.0 \
  --stream_stdout" > /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.cmd

qsub -o /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.log -e /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.err < /rds/general/project/lms-ukbiobank-analysis/live/Katie/MTAG_X_input_files/mtag_withX.cmd
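Since the mix-up came from how the input files were built from BOLT output, here is a small sketch of that conversion step. It is an illustration under stated assumptions, not the poster's actual script. It assumes BOLT-LMM's usual output column names (SNP, ALLELE1, ALLELE0, BETA, SE, and a p-value column such as P_BOLT_LMM_INF), with BETA reported for ALLELE1. It writes the column names that appear in the log earlier in the thread (snpid, a1, a2, beta, se, pval, n), matching a run with --use_beta_se. The sample size is passed in by hand as one constant, and the function name `bolt_to_mtag` and the example row are hypothetical. Check the headers of your own files before relying on this mapping.

```python
import csv
import io

# Assumed BOLT-LMM column -> mtag column mapping (verify against your own header).
BOLT_TO_MTAG = {
    "SNP": "snpid",
    "ALLELE1": "a1",   # assumed: the allele that BETA refers to (effect allele)
    "ALLELE0": "a2",   # assumed: the other (non-effect) allele
    "BETA": "beta",
    "SE": "se",
}


def bolt_to_mtag(bolt_text: str, n: int, pval_col: str = "P_BOLT_LMM_INF") -> str:
    """Convert tab-separated BOLT-style text into tab-separated mtag-style text."""
    reader = csv.DictReader(io.StringIO(bolt_text), delimiter="\t")
    out = io.StringIO()
    fields = list(BOLT_TO_MTAG.values()) + ["pval", "n"]
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        converted = {new: row[old] for old, new in BOLT_TO_MTAG.items()}
        converted["pval"] = row[pval_col]
        converted["n"] = n  # one constant sample size for every SNP
        writer.writerow(converted)
    return out.getvalue()


example = (
    "SNP\tALLELE1\tALLELE0\tBETA\tSE\tP_BOLT_LMM_INF\n"
    "rs0000001\tA\tG\t0.05\t0.01\t5.7E-07\n"   # made-up row
)
mtag_text = bolt_to_mtag(example, n=10000)
print(mtag_text)
```

Keeping the mapping in one explicit dictionary makes the allele assignment easy to audit: if a1 and a2 end up swapped relative to the sign of beta, there is a single place to look.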

