JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

ERROR converting summary statistics #86

Open kys21207 opened 4 years ago

kys21207 commented 4 years ago

Hi there,

I got the ValueError below when I ran the following:

python mtag-master/mtag.py --sumstats data1.txt,data2.txt --p_name pval_raw --out mtag-master/OA_chr --use_beta_se --beta_name beta --se_name se --n_min 0.0 --stream_stdout

We use a correction to convert the linear regression coefficients to the log-odds scale:
we multiply beta and se by 1/(p*(1-p)).
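A minimal sketch of that rescaling, assuming `p` is the case proportion in the GWAS sample (the post does not say what `p` is) and that the original coefficients come from a linear model of a 0/1 phenotype. The function name `to_log_odds_scale` is hypothetical. Because beta and se are multiplied by the same positive factor, z = beta/se and its sign are unchanged, so this step on its own cannot move the median z-score toward or away from zero; it only rescales the median beta.

```python
from typing import Tuple


def to_log_odds_scale(beta: float, se: float, p: float) -> Tuple[float, float]:
    """Rescale a linear-model coefficient and its SE by 1 / (p * (1 - p)).

    Assumption: p is the proportion of cases in the GWAS sample.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    factor = 1.0 / (p * (1.0 - p))
    return beta * factor, se * factor


beta_lin, se_lin = 0.02, 0.004          # toy values on the linear scale
beta_lo, se_lo = to_log_odds_scale(beta_lin, se_lin, p=0.2)

# The same factor multiplies numerator and denominator, so z is unchanged.
z_before = beta_lin / se_lin
z_after = beta_lo / se_lo
print(beta_lo, se_lo, z_before, z_after)
```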

I am not sure why this error occurred. Could you give us any suggestions?

Error message:

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: maghzian@nber.org
<> All other correspondence: paturley@broadinstitute.org
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py \
--p-name pval_raw \
--stream-stdout \
--n-min 0.0 \
--use-beta-se \
--sumstats data1.txt,data2.txt \
--out mtag-master/OA_chr

Beginning MTAG analysis...
MTAG will use the provided BETA/SE columns for analyses.
Read in Trait 1 summary statistics (67652017 SNPs) from data1.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 67652017 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs.
Note: strand ambiguous SNPs were not dropped.
67652017 SNPs remain.
Removed 234254 SNPs with duplicated rs numbers (67417763 SNPs remain).
Removed 0 SNPs with N < 0.0 (67417763 SNPs remain).

ERROR converting summary statistics:

Traceback (most recent call last):
  File "/home/cdsw/mtag-master/mtag_munge.py", line 881, in munge_sumstats
    check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname))
  File "/home/cdsw/mtag-master/mtag_munge.py", line 525, in check_median
    raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTAT is -0.11 (should be close to 0.0). This column may be mislabeled.

Conversion finished at Mon Feb 3 18:02:53 2020
Total time elapsed: 13.0m:51.02s
WARNING: median value of SIGNED_SUMSTAT is -0.11 (should be close to 0.0). This column may be mislabeled.
Traceback (most recent call last):
  File "mtag-master/mtag.py", line 1567, in mtag(args)
  File "mtag-master/mtag.py", line 1336, in mtag
    DATA_U, DATA, args = load_and_merge_data(args)
  File "mtag-master/mtag.py", line 269, in load_and_merge_data
    GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
  File "mtag-master/mtag.py", line 162, in _perform_munge
    munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
  File "/home/cdsw/mtag-master/mtag_munge.py", line 881, in munge_sumstats
    check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname))
  File "/home/cdsw/mtag-master/mtag_munge.py", line 525, in check_median
    raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTAT is -0.11 (should be close to 0.0). This column may be mislabeled.
Analysis terminated from error at Mon Feb 3 18:02:53 2020
Total time elapsed: 16.0m:23.07s

paturley commented 4 years ago

Hello,

MTAG has a QC check on the median beta-hat estimate. Since the sign of a GWAS coefficient is more or less randomly assigned, the median effect size should generally be very close to zero. Yours was -0.11, which is substantially larger in magnitude than expected. It looks like you used the beta-se option. Do you get the same error when you use the default Z-N option?

Best, Patrick
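The traceback above points at `check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname)` in `mtag_munge.py`. Below is a hypothetical re-implementation of that check, inferred only from the traceback and the error text (it is not the actual MTAG source), to illustrate how a tolerance of 0.1 turns a median of -0.11 into a hard error. The name `check_median_sketch` and the "absolute distance from the expected median" rule are assumptions. In the later log in this thread, the hard-coded 0.1 is replaced by `args.median_z_cutoff`.

```python
import statistics
from typing import Sequence


def check_median_sketch(values: Sequence[float], expected_median: float,
                        tolerance: float, name: str) -> str:
    """Hypothetical stand-in for MTAG's median QC check (not the real source).

    Raises ValueError when the median of `values` is more than `tolerance`
    away from `expected_median`; otherwise returns a short status message.
    """
    m = statistics.median(values)
    if abs(m - expected_median) > tolerance:
        msg = ("WARNING: median value of {F} is {V} (should be close to {M}). "
               "This column may be mislabeled.")
        raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
    return "Median value of {F} was {V}, within tolerance.".format(F=name, V=round(m, 2))


toy_z = [-0.5, -0.2, -0.11, 0.05, 0.3]   # toy values; the median is -0.11
try:
    check_median_sketch(toy_z, 0.0, 0.1, "SIGNED_SUMSTAT")
except ValueError as err:
    print(err)
```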

kys21207 commented 4 years ago

Yes, I got a similar median even when I used the Z-N option.

paturley commented 4 years ago

It really is unusual for the median summary statistic to be so large. Have you checked your data to make sure that the distribution of your betas looks right? Is there a reason that you might expect them to be large and positive for your phenotype?

joeyinjie commented 4 years ago

I have exactly the same issue. I also used the Z-N option, and my log says "ValueError: WARNING: median value of SIGNED_SUMSTAT is 0.11 (should be close to 0.0). This column may be mislabeled." I double-checked everything, and nothing seems wrong with my data.

paturley commented 4 years ago

When you calculate the median of your z-scores, do you get a number near 0.11?

joeyinjie commented 4 years ago

The median of the z-scores is 0.1032.

paturley commented 4 years ago

Hmm. So it's pretty unlikely that the median z-statistic over a million SNPs would be as large as 0.1 unless (1) there is something pretty unique about the phenotype (like you have a high powered GWAS for a phenotype with one or two very large signals) (2) you've selected the reference allele systematically so the effect sizes are for some reason more likely to be positive, or (3) there is something wrong with your summary statistics. What are the min, max, mean, and variance of your z-stats?
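A minimal sketch for answering the question above (min, max, mean, and variance of the z-stats, plus the median and the share of positive values, which is relevant to the "systematically selected reference allele" explanation). The function name and dictionary keys are hypothetical. It also reports a rough null yardstick: for n independent N(0, 1) draws, the sample median has a standard error of about sqrt(pi / (2n)), roughly 0.00125 for a million SNPs, which is why a median near 0.1 is so unusual. SNPs in LD are not independent, so the effective n is smaller and this is only an order-of-magnitude guide.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_z(z: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for a column of GWAS z-scores (needs >= 2 values)."""
    n = len(z)
    return {
        "n": n,
        "min": min(z),
        "max": max(z),
        "mean": statistics.fmean(z),
        "variance": statistics.variance(z),  # sample variance, n - 1 denominator
        "median": statistics.median(z),
        "frac_positive": sum(1 for v in z if v > 0) / n,
        # Approximate SE of the median of n independent N(0, 1) values.
        "null_median_se": math.sqrt(math.pi / (2 * n)),
    }


print(summarize_z([-1.2, -0.3, 0.1, 0.4, 2.5]))
```

For scale, `math.sqrt(math.pi / (2 * 1_000_000))` is about 0.00125, so a median of 0.1 sits roughly 80 of those standard errors from zero under the independence assumption.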

xiaofeiyu1992 commented 3 years ago

Hi all, I got the same error: ValueError: WARNING: median value of SIGNED_SUMSTAT is -0.21 (should be close to 0.0). This column may be mislabeled.

I am curious: if this is the case, can we still apply MTAG analysis when the median is not close to zero?

Cheers, Xiaofei

paturley commented 3 years ago

Hi Xiaofei,

There is nothing intrinsically wrong with the median having a large value. We included the check because, except in rare circumstances, a large median is likely a sign that something is off with the data. If you have some reason to believe that the median should be very far from zero (for example, you've oriented all your reference alleles so that every SNP has a positive or negative estimated effect), then MTAG should work fine in theory. I think you can bypass this error with the force option. I'd strongly recommend you identify why your data look this way, though, before you charge ahead ignoring this error.

Let me know if you have any other questions.

Patrick

xiaofeiyu1992 commented 3 years ago

Hi Patrick,

After checking the distribution of the beta values, I think it is close to a normal distribution. The mean is about -0.21 and the median is around -0.2, which does not look far from zero to me. Although theoretically it should be zero, I think it is difficult for it to be exactly zero in real data. Do you agree?

Best, Xiaofei

paturley commented 3 years ago

Yeah that looks fine to me. It's also possible that the units of your phenotype are very large, which would inflate the expected magnitude of the median beta. When we set that threshold, I think we were anticipating that most people would standardize their phenotypes. An effect size of 30 for a standardized phenotype would be huge! But there is no reason that you would have to standardize. I think you are probably just fine.
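A small sketch of the units point above, assuming the phenotype's standard deviation `sd_y` is known. Dividing beta and se by `sd_y` expresses effects per phenotype SD, which shrinks the median beta proportionally, while every z = beta/se (and hence the median z) stays the same. The function name and toy numbers are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def standardize_effects(betas: Sequence[float], ses: Sequence[float],
                        sd_y: float) -> Tuple[List[float], List[float]]:
    """Convert per-allele effects from raw phenotype units to phenotype-SD units.

    Assumes sd_y is the standard deviation of the unstandardized phenotype.
    """
    return [b / sd_y for b in betas], [s / sd_y for s in ses]


betas_raw = [-3.0, -1.0, 0.5, 2.0, 4.0]   # toy effects in raw units
ses_raw = [1.0, 2.0, 0.5, 1.0, 2.0]
betas_sd, ses_sd = standardize_effects(betas_raw, ses_raw, sd_y=10.0)

# The median beta shrinks by a factor of sd_y; the z-scores do not change.
print(statistics.median(betas_raw), statistics.median(betas_sd))
```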

dianacornejo commented 2 years ago

@paturley I'm getting the same error! I wonder how you solved this... I'm trying --force but it still errors out! Thanks

paturley commented 2 years ago

Hi Diana,

Do you have a log file for your analysis? Or alternatively, do you know the standard deviation of the phenotype you are analyzing?

Patrick

hrazifard commented 1 year ago

Hello all. I have the same issue. I did not standardize my phenotype when I ran the GWAS. I also tried the --force option, but that didn't help.

Here is the log from MTAG:

Calling ./mtag.py \
--force \
--sumstats phenotype1.txt,phenotype2.txt,phenotype3.txt,phenotype4.txt

2022/12/16/10:04:51 PM Beginning MTAG analysis...
2022/12/16/10:04:51 PM MTAG will use the Z column for analyses.
2022/12/16/10:04:52 PM Read in Trait 1 summary statistics (166603 SNPs) from phenotype1.txt ...
2022/12/16/10:04:52 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2022/12/16/10:04:52 PM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2022/12/16/10:04:52 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2022/12/16/10:04:52 PM Interpreting column names as follows:
2022/12/16/10:04:52 PM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.
se: Standard errors of BETA coefficients

2022/12/16/10:04:52 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2022/12/16/10:04:52 PM Read 166603 SNPs from --sumstats file.
Removed 11 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 689 variants that were not SNPs.
Note: strand ambiguous SNPs were not dropped.
165903 SNPs remain.
2022/12/16/10:04:52 PM Removed 0 SNPs with duplicated rs numbers (165903 SNPs remain).
2022/12/16/10:04:52 PM Removed 0 SNPs with N < 546.666666667 (165903 SNPs remain).
2022/12/16/10:04:53 PM ERROR converting summary statistics:

2022/12/16/10:04:53 PM Traceback (most recent call last):
  File "/home/ubuntu/software/mtag/mtag_munge.py", line 882, in munge_sumstats
    check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, args.median_z_cutoff, sign_cname))
  File "/home/ubuntu/software/mtag/mtag_munge.py", line 525, in check_median
    raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTAT is -0.14 (should be close to 0.0). This column may be mislabeled.

2022/12/16/10:04:53 PM Conversion finished at Fri Dec 16 22:04:53 2022
2022/12/16/10:04:53 PM Total time elapsed: 1.01s
2022/12/16/10:04:53 PM WARNING: median value of SIGNED_SUMSTAT is -0.14 (should be close to 0.0). This column may be mislabeled.
Traceback (most recent call last):
  File "/home/ubuntu/software/mtag/mtag.py", line 1577, in mtag(args)
  File "/home/ubuntu/software/mtag/mtag.py", line 1343, in mtag
    DATA_U, DATA, args = load_and_merge_data(args)
  File "/home/ubuntu/software/mtag/mtag.py", line 273, in load_and_merge_data
    GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
  File "/home/ubuntu/software/mtag/mtag.py", line 166, in _perform_munge
    munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
  File "/home/ubuntu/software/mtag/mtag_munge.py", line 882, in munge_sumstats
    check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, args.median_z_cutoff, sign_cname))
  File "/home/ubuntu/software/mtag/mtag_munge.py", line 525, in check_median
    raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTAT is -0.14 (should be close to 0.0). This column may be mislabeled.
2022/12/16/10:04:53 PM Analysis terminated from error at Fri Dec 16 22:04:53 2022
2022/12/16/10:04:53 PM Total time elapsed: 1.29s

paturley commented 1 year ago

Have you plotted a histogram of your z-scores? This error is meant to identify irregularities in the distribution of the test statistics.

hrazifard commented 1 year ago

Thanks for the quick reply, Patrick!

Please see the screenshot of my z scores. The GWAS software had not reported Z scores, so I added them by dividing beta values by standard errors. Is that the correct way?

By the way, I am using MTAG on 4 different phenotypes, and they all have distributions of Z scores similar to the one in the screenshot.

paturley commented 1 year ago

Curious. Yep, dividing beta by its standard error is the right way to get the z-scores. That histogram looks a little left-skewed, but not so much that I would be worried. I see you only have ~100k SNPs in your sumstats. That seems a bit small. Is that what you would have expected?

Patrick
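A minimal sketch of adding a z column computed as beta / se, as described above, assuming a tab-delimited sumstats file whose effect and standard-error columns are named `beta` and `se` (hypothetical names; adjust them to your file). The logs in this thread show MTAG interpreting a column named `z` as the signed summary statistic, so the sketch writes the new column under that name. It uses only the standard-library `csv` module.

```python
import csv
import io
from typing import TextIO


def add_z_column(src: TextIO, dst: TextIO, beta_col: str = "beta",
                 se_col: str = "se", z_col: str = "z") -> None:
    """Copy a tab-delimited sumstats table, appending z = beta / se."""
    reader = csv.DictReader(src, delimiter="\t")
    fieldnames = list(reader.fieldnames or []) + [z_col]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Raises ZeroDivisionError if se is 0; filter such rows beforehand.
        row[z_col] = f"{float(row[beta_col]) / float(row[se_col]):.6g}"
        writer.writerow(row)


src = io.StringIO("snpid\ta1\ta2\tbeta\tse\n"
                  "rs1\tA\tG\t0.02\t0.01\n"
                  "rs2\tC\tT\t-0.03\t0.015\n")
dst = io.StringIO()
add_z_column(src, dst)
print(dst.getvalue())
```

For real files, open the input and output paths with `newline=""` and pass the file objects instead of the in-memory buffers.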

hrazifard commented 1 year ago

Yes, the number of SNPs is around 160K. Do you have any ideas about resolving the error?

ChloePittman commented 1 year ago

Hi, I am having a similar issue with the error "ValueError: WARNING: median value of SIGNED_SUMSTAT is .... (should be close to 0.0). This column may be mislabeled.". I have tried using the --force option but the analysis is still terminated by this warning. The previous comments suggested the reasons behind the warning, and I believe the summary statistics are fine. More precisely, I have tried:

Do you agree that MTAG should run on these summary statistics? And is there a way to force the analysis?

Best, Chloé

JonJala commented 1 year ago

The force option has the description "Force MTAG estimation even though the mean chi2 is small.", so that won't really help there.

If you're hitting a stop at that particular error message, that appears to be in mtag_munge.py at lines 523-525. You could try the median_z_cutoff flag, as that appears to be what is passed in as a tolerance (or should be). It looks like you're right at the edge of the default of 0.1, so you could try either passing in a larger value with the flag or going and editing line 523 of the code to ignore the tolerance.

nrahmioglu commented 1 year ago

Hi,

I'm using a publicly available GWAS summary dataset and getting the same error. The median of SIGNED_SUMSTAT is -0.72. Looking at the distribution of the z-values, it seems they selected the reference allele systematically so that the effect sizes are always negative: the minimum z is -10.12, the maximum z is 0, the median z is -0.72, and the mean z is -0.86. How can I solve this issue? This is a powerful GWAS with many genome-wide significant associations, and I'm not sure how to modify the summary statistics to make this work. Any advice is much appreciated.

paturley commented 1 year ago

Very funny. If you are confident that the sumstats are right but just oriented so the z-stat is negative, you could just reorient the sumstats such that they correspond to a random allele. You'd need to switch the reference allele, multiply the beta and z-stat by -1, and change the allele frequency to 1-maf for about half of the SNPs.
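A minimal sketch of the random reorientation described above, assuming each row carries the effect allele `a1`, the other allele `a2`, `beta`, `z`, and `freq` as the frequency of `a1`. The `SnpRow` type, field names, and seed are hypothetical; the standard error is left unchanged because flipping the reference allele only changes the sign of the effect. Each SNP is flipped independently with probability 1/2, so roughly half end up reoriented, and a fixed seed keeps the result reproducible.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SnpRow:
    snpid: str
    a1: str      # allele the effect is reported for
    a2: str      # other allele
    beta: float
    z: float
    freq: float  # frequency of a1 (assumption)


def flip(row: SnpRow) -> SnpRow:
    """Report the same association relative to the other allele."""
    return replace(row, a1=row.a2, a2=row.a1, beta=-row.beta, z=-row.z,
                   freq=1.0 - row.freq)


def randomly_reorient(rows: List[SnpRow], seed: int = 1) -> List[SnpRow]:
    """Flip each SNP with probability 1/2 so the effect allele is random."""
    rng = random.Random(seed)
    return [flip(r) if rng.random() < 0.5 else r for r in rows]


# Toy data where every z is negative, as in the dataset described above.
rows = [SnpRow(f"rs{i}", "A", "G", -0.01 * (i + 1), -0.5 * (i + 1), 0.3)
        for i in range(1000)]
out = randomly_reorient(rows)
share_positive = sum(r.z > 0 for r in out) / len(out)
print(share_positive)
```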

nrahmioglu commented 1 year ago

Indeed it is! But everything else looked good. I've done the random flipping and now MTAG is happily working. Thank you.

aydanasg commented 6 months ago

Hi there!

I am also getting the following error: "ValueError: WARNING: median value of SIGNED_SUMSTAT is 0.7 (should be close to 0.0). This column may be mislabeled."

I am using the Alzheimer's GWAS by Jansen et al. (2019) (https://pubmed.ncbi.nlm.nih.gov/30617256/), which is well powered, and the Multiple Sclerosis GWAS summary statistics (https://pubmed.ncbi.nlm.nih.gov/27386562/). I have subsetted the summary statistics to SNPs found in HapMap3. I have also tried setting --median_z_cutoff to 0.7. Would this be correct to do?

Thank you in advance! Aydan

paturley commented 6 months ago

Hmm. Assuming that there is nothing funny with your summary statistics and that MTAG is reading the right columns, setting the median z cutoff to some high value like that shouldn't be a problem. 0.7 seems exceptionally high though for a value that should be pretty close to zero. If I were you, I'd make sure you are confident that MTAG is reading your data correctly or that you haven't pre-filtered the SNPs in some way that may cause problems (like just genome-wide significant SNPs or something). Ideally, you'd figure out why you have such a high median z-score before adjusting the cutoff.
