Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/
GNU General Public License v3.0
118 stars 22 forks source link

mysumstats.to_finemapping not working #97

Closed halkelchen closed 2 weeks ago

halkelchen commented 3 weeks ago

Hi, I am trying to do the LD matrix calculation and keep getting an error 'Sumstats' object has no attribute 'to_finemapping' when I try to use the example code. I have listed the code below and have previously installed the reference without issue in another cell. Would you be able to help me with this?

Calculate LD

mysumstats.to_finemapping(vcf = gl.get_path("ensembl_hg38_gtf"))

Additionally, I was wondering if there is a way to calculate a Z score without the beta or SE values since my summary statistics don't have either! I appreciate your help with these issues!

Cloufield commented 3 weeks ago

Hi, to_finemapping() has been renamed to calculate_ld_matrix(), which should be more straightforward. Please be noted that this function is still a beta version funtion and has not been widely tested yet. I haven't updated documents and tutorials. And you need to install PLINK1.9 and PLINK2 in your environmental path beforehand.

To calculated Z scores, you need a signed summary statistic (beta, OR ...) and P values or MLOG10P. I am wondering what statistic you have in your sumstats?

halkelchen commented 3 weeks ago

Thanks for updating me on the change to calculate_ld_matrix! I have these values (EAF, MAF, CHISQ, P, AND MLOG10P) in my GWAS summary stats and I'm struggling to see how I would be able to calculate z scores with what the researchers have provided!

Cloufield commented 3 weeks ago

Based on these values (EAF, MAF, CHISQ, P, AND MLOG10P), it is insufficient to determine the sign of Z. Please check the raw sumstats to see if there are any signed values. (When loading, gwaslab may omit some columns due to unrecognized headers.)

halkelchen commented 2 weeks ago

Thanks for the help, I ended up having to switch datasets because there were not any signed values. I am trying to use the calculate_ld_matrix and have installed both plink 1.9 and 2. I specified the path to plink 2 and check permissions to see if it was accessible on my computer and everything seemed fine. Although when I try : import plink plink2_path= "/Users/halk/Downloads/plink2-4"

Calculate ld matrix

mysumstats.calculate_ld_matrix(vcf = gl.get_path("ucsc_genome_hg19"))

I am still getting an error that says : CalledProcessError Traceback (most recent call last) Cell In[27], line 4 2 plink2_path= "/Users/halk/Downloads/plink2-4" 3 #Calculate ld matrix ----> 4 mysumstats.calculate_ld_matrix(vcf = gl.get_path("ucsc_genome_hg19"))

File ~/anaconda3/lib/python3.10/site-packages/gwaslab/g_Sumstats.py:785, in Sumstats.calculate_ld_matrix(self, kwargs) 784 def calculate_ld_matrix(self,kwargs): --> 785 self.to_finemapping_file_path, self.to_finemapping_file, self.plink_log = tofinemapping(self.data,study = self.meta["gwaslab"]["study_name"],**kwargs)

File ~/anaconda3/lib/python3.10/site-packages/gwaslab/util_ex_calculate_ldmatrix.py:80, in tofinemapping(sumstats, study, bfile, vcf, loci, out, windowsizekb, n_cores, mode, exclude_hla, getlead_args, memory, overwrite, log, suffixes, verbose, kwargs) 77 locus_sumstats = _extract_variants_in_locus(sumstats, windowsizekb, locus = (row["CHR"],row["POS"])) 79 #process reference file ---> 80 bfile_prefix, plink_log, ref_bim, filetype = _process_plink_input_files( chrlist=[row["CHR"]], 81 bfile=bfile, 82 vcf=vcf, 83 plink_log=plink_log, 84 n_cores=n_cores, 85 log=log, 86 load_bim=True, 87 overwrite=overwrite,kwargs) 89 ## check available snps with reference file 90 matched_sumstats = _align_sumstats_with_bim(row=row, 91 locus_sumstats=locus_sumstats, 92 ref_bim=ref_bim[0], 93 log=log,suffixes=suffixes)

File ~/anaconda3/lib/python3.10/site-packages/gwaslab/util_ex_process_ref.py:59, in _process_plink_input_files(chrlist, bfile, pfile, vcf, bgen, sample, n_cores, plink_log, log, overwrite, bgen_mode, convert, memory, load_bim) 51 ref_file_prefix, ref_bims = _process_pfile(chrlist=chrlist, 52 ref_file_prefix=ref_file_prefix, 53 ref_bims=ref_bims, 54 is_wild_card=is_wild_card, 55 log=log, 56 load_bim=load_bim) 58 elif filetype == "vcf": ---> 59 ref_file_prefix, plink_log, ref_bims = _process_vcf(ref_file_prefix=ref_file_prefix, 60 chrlist=chrlist, 61 ref_bims=ref_bims, 62 is_wild_card=is_wild_card, 63 log=log, 64 plink_log=plink_log, 65 n_cores=n_cores, 66 convert=convert, 67 memory=memory, 68 overwrite=overwrite, 69 load_bim=load_bim) 70 filetype = convert 71 elif filetype == "bgen":

File ~/anaconda3/lib/python3.10/site-packages/gwaslab/util_ex_process_ref.py:206, in _process_vcf(ref_file_prefix, chrlist, ref_bims, is_wild_card, log, plink_log, n_cores, convert, memory, overwrite, load_bim) 203 log.write(" -Processing VCF : {}...".format(ref_file_prefix)) 205 #check plink version --> 206 log = _checking_plink_version(v=2,log=log) 208 # file path prefix to return 209 if is_wild_card==True:

File ~/anaconda3/lib/python3.10/site-packages/gwaslab/g_version.py:28, in _checking_plink_version(v, log, verbose) 26 elif v==2: 27 which_plink_script = "plink2 --version" ---> 28 output = subprocess.check_output(which_plink_script, stderr=subprocess.STDOUT, shell=True,text=True) 29 log.write(" -PLINK version: {}".format(output.strip())) 30 return log

File ~/anaconda3/lib/python3.10/subprocess.py:421, in check_output(timeout, *popenargs, *kwargs) 418 empty = b'' 419 kwargs['input'] = empty --> 421 return run(popenargs, stdout=PIPE, timeout=timeout, check=True, 422 **kwargs).stdout

File ~/anaconda3/lib/python3.10/subprocess.py:526, in run(input, capture_output, timeout, check, *popenargs, **kwargs) 524 retcode = process.poll() 525 if check and retcode: --> 526 raise CalledProcessError(retcode, process.args, 527 output=stdout, stderr=stderr) 528 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command 'plink2 --version' returned non-zero exit status 127.

Any suggestions on how to fix this? I know it is still in the beta version but hoping to use this because I was having difficulty using plink by itself to get an ld matrix

Cloufield commented 2 weeks ago

hi, thanks for the feedback. This reason is that plink2 --version was hard-coded and not using the plink you provided. I will fix it in next version. For now, a simple solution was to add plink/plink2 to your environmental path using export. (be sure to rename it to plink/plink2 so that plink2 --version can be run directly from your terminal)