KeyError: 'A10' after munging of file 1 is complete

ukuvainik commented 6 years ago

Hi

When running MTAG on two traits, munging of the first GWAS completes, but after that I get the 'A10' KeyError below. The error does not depend on the order in which the traits are entered. What could I do here? Thanks

Uku

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: maghzian@nber.org
<> All other correspondence: paturley@broadinstitute.org
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py \
--se-name se \
--bpos-name pos \
--stream-stdout \
--n-name n_complete_samples \
--a2-name alt \
--n-min 0.0 \
--a1-name ref \
--snp-name rsid \
--eaf-name minor_AF \
--sumstats /dagher/dagherX/uvainik/gwas_base/cbmi_2015_felix/EGG_BMI_HapMap_DISCOVERY_mtagbmi.txt,/dagher/dagherX/uvainik/gwas_base/bmi_2018_ukb/bmi_rs_hiconf.tsv \
--beta-name beta \
--cores 12 \
--out /dagher/dagherX/uvainik/mtag_res/mtag_cbmi_bmi2018ukb.1NS

Beginning MTAG analysis...
MTAG will use the provided BETA/SE columns for analyses
Read in Trait 1 summary statistics (2499691 SNPs) from /dagher/dagherX/uvainik/gwas_base/cbmi_2015_felix/EGG_BMI_HapMap_DISCOVERY_mtagbmi.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
rsid: Variant ID (e.g., rs number)
n_complete_samples: Sample size
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
alt: Allele 2, interpreted as non-ref allele for signed sumstat.
ref: Allele 1, interpreted as ref allele for signed sumstat.
se: Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 2499691 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
2499691 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (2499691 SNPs remain).
Removed 0 SNPs with N < 0.0 (2499691 SNPs remain).
Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.133
Lambda GC = 1.104
Max chi^2 = 96.585
525 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Fri Oct 5 13:37:25 2018
Total time elapsed: 10.66s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 2499691
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

'A10'
Traceback (most recent call last):
  File "./mtag.py", line 1526, in <module>
    mtag(args)
  File "./mtag.py", line 1298, in mtag
    DATA_U, DATA, args = load_and_merge_data(args)
  File "./mtag.py", line 271, in load_and_merge_data
    GWAS_d[p][col] = GWAS_d[p][col].str.upper()
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A10'
Analysis terminated from error at Fri Oct 5 13:37:28 2018
Total time elapsed: 25.46s
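
The traceback ends in a pandas column lookup, and pandas raises KeyError when the requested column label is not present in the DataFrame. One way to narrow this down before re-running is a pre-flight check that every column name passed on the command line really appears in each input file's header. The sketch below is not part of MTAG; the function name and the in-memory sample file are illustrative assumptions, and only the column names come from the log above.

```python
import csv
import io


def missing_columns(handle, required, delimiter="\t"):
    """Return the names in `required` that are absent from the header row."""
    header = next(csv.reader(handle, delimiter=delimiter))
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Column names passed through the name flags in the command shown in the log.
required = ["rsid", "ref", "alt", "beta", "se",
            "n_complete_samples", "pos", "minor_AF"]

# Tiny in-memory stand-in for a sumstats file (hypothetical contents).
sample = io.StringIO(
    "rsid\tref\talt\tbeta\tse\tpval\tn_complete_samples\n"
    "rs1\tA\tG\t0.01\t0.02\t0.6\t1000\n"
)

print(missing_columns(sample, required))
```

With a real file, `open(path)` would take the place of the `io.StringIO` object, and the delimiter may need changing for space-separated files.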

huilisabrina commented 6 years ago

Hi @ukuvainik ,

I wonder if you're using the latest version of the software. If not, could you try re-pulling the repository? Please let me know if the problem remains.

Thanks, Hui

ukuvainik commented 6 years ago

git pull says I was up to date already

ukuvainik commented 6 years ago

Note that one of the traits does not have MAF information. However, the crash does not depend on which trait I munge first.

http://egg-consortium.org/childhood-bmi.html

huilisabrina commented 6 years ago

@ukuvainik Thanks for that information. It's useful and indeed allele frequency is required for MTAG. Could you try adding that to the input sumstats using a reference panel? I just did some testing and couldn't replicate the error you got.

ukuvainik commented 6 years ago

Could you point me to a tutorial or an example? I have never added MAFs from a reference panel.

thanks

huilisabrina commented 6 years ago

@ukuvainik For adding MAF column, you could use the HapMap3 or HRC samples. I think any software would do.
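
As a rough illustration of what adding a MAF column involves, the sketch below joins summary-statistic rows to a reference frequency table by rsID and folds the frequency to the minor allele. The field names `rsid` and `freq`, the function name, and the toy values are placeholders chosen for this example; they are not a format prescribed by MTAG or by any reference panel.

```python
def attach_maf(sumstats_rows, ref_af):
    """Add a 'freq' field (minor allele frequency) to each matched row.

    sumstats_rows: iterable of dicts that each have an 'rsid' key.
    ref_af: dict mapping rsID -> allele frequency in the reference panel.
    Rows whose rsID is missing from the reference are dropped.
    """
    merged = []
    for row in sumstats_rows:
        af = ref_af.get(row["rsid"])
        if af is None:
            continue  # variant not in the reference panel
        out = dict(row)
        out["freq"] = min(af, 1.0 - af)  # fold to the minor allele
        merged.append(out)
    return merged


# Toy inputs (made-up values, for illustration only).
rows = [{"rsid": "rs1", "beta": "0.01"},
        {"rsid": "rs2", "beta": "-0.02"},
        {"rsid": "rs3", "beta": "0.00"}]
reference = {"rs1": 0.75, "rs2": 0.125}

for row in attach_maf(rows, reference):
    print(row["rsid"], row["freq"])
```

Folding to the minor allele discards which allele the frequency refers to. If a frequency for a specific allele is wanted instead, the reference alleles would have to be matched against the alleles in the summary statistics, which this sketch does not attempt.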

ukuvainik commented 6 years ago

@huilisabrina OK, they seem to be listed here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ but it takes some time to download and merge.

Would you say that the MAFs from UK Biobank would work?

huilisabrina commented 6 years ago

@ukuvainik I think so if your GWAS sumstats are from a European sample. It's probably not the best question to ask here @paturley , but I'm happy to help if you have other questions on running MTAG.

Thanks, Hui

ukuvainik commented 6 years ago

Hi, I inserted UKB MAFs, but that did not solve the problem. What solved part of the problem was renaming the columns to the convention preferred by MTAG, as listed in the Tutorial. This suggests that there is an issue with the column-name flags. Note that even when I had "se" and "beta" columns in my file, I still had to flag beta and se manually (--se-name se, for instance). However, I then got an error further downstream, which was solved by manually calculating the z column.

So the takeaway seems to be that MTAG currently requires certain column names, as listed in the wiki, and the "z" column.
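
The workaround described here (rename the columns, then compute z by hand) can be sketched as follows. The target names in `RENAME` are an assumption based on the column names used in the MTAG tutorial and should be verified against `mtag.py -h`; the source names are the ones from the command earlier in this thread, and the example row is made up. The only formula involved is z = beta / se.

```python
# Assumed mapping from this thread's column names to tutorial-style names;
# check the targets against `mtag.py -h` before relying on them.
RENAME = {
    "rsid": "snpid",
    "pos": "bpos",
    "ref": "a1",
    "alt": "a2",
    "minor_AF": "freq",
    "n_complete_samples": "n",
}


def to_mtag_row(row):
    """Rename known columns and add z = beta / se; other keys are kept."""
    out = {RENAME.get(key, key): value for key, value in row.items()}
    out["z"] = float(row["beta"]) / float(row["se"])
    return out


# Made-up example row, with values as strings as they would come from a file.
example = {"rsid": "rs1", "pos": "12345", "ref": "A", "alt": "G",
           "minor_AF": "0.25", "n_complete_samples": "1000",
           "beta": "0.5", "se": "0.25", "pval": "0.0455"}

print(to_mtag_row(example))
```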

huilisabrina commented 6 years ago

Hi @ukuvainik ,

Thanks again for this feedback, and sorry for the late reply. There are several issues, so please see my responses below:

For the column names, instead of renaming the columns in the input data, did you try specifying the column names of your input directly on the command line (when you call mtag.py)? For example:

python /[path]/mtag.py \
    --sumstats SS_1.txt,SS_2.txt \
    --n_min 0.0 \
    --snp_name rsID \
    --beta_name BETA \
    --se_name SE \
    --out ./test_colname &

I have added the default values for --beta_name and --se_name now. Thank you for flagging that. You can check out the default column names via mtag.py -h.

At the moment, the args.beta_name flag is used not only to identify the correct column, but also to tell MTAG to prefer the beta/se columns when a Z statistic is also present in the input. In other words, if you don't specify this flag, MTAG will still look for the Z column even if the beta and se columns are in the input data. I will try to add a separate flag for this in the next iteration. Sorry for the inconvenience. I'll try to come up with a more user-friendly way of handling this and update the wiki accordingly.

If both BETA and SE are present in your input (and you specified both the args.beta_name and args.se_name flags), the Z column should not be required: MTAG will calculate the z column automatically.

Please let me know if you still have questions. I will try my best to respond promptly!

Thanks, Hui

huilisabrina commented 6 years ago

Hi @ukuvainik ,

Just to follow up on this thread, I have added a separate flag, --use_beta_se, that tells MTAG to use the BETA and SE columns (as opposed to Z). I have also updated the wiki page that explains the input format (https://github.com/omeed-maghzian/mtag/wiki/Tutorial-1:-The-Basics#sample-gwas-results-and-data-format). Hope this helps!

Best, Hui
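
For readers scripting their runs, a call using the new flag could be assembled along these lines. This is a sketch only: it builds and prints the command without running it, the helper name is hypothetical, the file names and column names are reused from the example earlier in the thread, and `--use_beta_se` is assumed to be a switch that takes no value.

```python
import shlex


def build_mtag_command(sumstats, out_prefix, snp_name, beta_name, se_name):
    """Assemble (but do not run) an mtag.py call that uses BETA/SE columns."""
    return [
        "python", "mtag.py",
        "--sumstats", ",".join(sumstats),
        "--snp_name", snp_name,
        "--beta_name", beta_name,
        "--se_name", se_name,
        "--use_beta_se",  # assumed to be a value-less switch
        "--out", out_prefix,
    ]


cmd = build_mtag_command(["SS_1.txt", "SS_2.txt"], "./test_colname",
                         "rsID", "BETA", "SE")
print(shlex.join(cmd))  # shlex.join requires Python 3.8 or newer
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` later without shell quoting issues.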

ukuvainik commented 6 years ago

Thanks for the follow up! I will let you know if I encounter any issues, once I fire up MTAG again.

