AlexTISYoung / snipar

Imputation of parental genotypes, inference of sibling IBD segments, family based GWAS, and polygenic score analyses.
MIT License
24 stars 5 forks source link

ibd.py genetic map errors #27

Closed AnnabelPerry closed 1 year ago

AnnabelPerry commented 1 year ago

Hello! Thanks for making this software - it's incredibly exciting! I'm troubleshooting some errors ibd.py which relate to the genetic map and was wondering if you could help. Here is a full description:

I have snipar successfully installed (to my Jupyter environment on my institution's computing cluster, /home/anp9168/jupytervenv/bin/activate) and have successfully run the example script. I tried running on UK Biobank data using the below script and inputs:

source /home/anp9168/jupytervenv/bin/activate

unset PYTHONPATH

ibd.py --bgen chr@ --king FirstDegreeKINGOutput.kin0 --agesex FirstDegreeAgeSex.txt --out IBD_Chr@ --threads 4 --ld_out

--bgen chr@ specifies phased chromosomes generated using EAGLE and converted from EAGLE to bgen format using QCTOOL

FirstDegreeKINGOutput.kin0 is derived from a KING output file. This file was generated by a collaborator and only had values for the "ID1", "ID2", "HetHet", "IBS0", "Kinship", and "InfType" columns, so I checked the KING formatting rules and inserted the other expected columns ("FID", "N_SNP","Z0","Phi","HetConc","HomIBS0","IBD1Seg","IBD2Seg", and "PropIBD"), filling them with NAs. Furthermore, I restricted the file to only first degree relatives (Kinship >= 0.177) to save space.

FirstDegreeAgeSex.txt has the age and sex of all individuals in the FirstDegreeKINGOutput.kin0 file

When I ran the code as described above, I got the following error:

Traceback (most recent call last):
  File "/home/anp9168/jupytervenv/bin/ibd.py", line 181, in <module>
    main(args)
  File "/home/anp9168/jupytervenv/bin/ibd.py", line 178, in main
    min_maf=min_maf, max_missing=max_missing, max_error=max_error)
  File "/home/anp9168/jupytervenv/lib/python3.7/site-packages/snipar/ibd.py", line 381, in infer_ibd_chr
    ld = compute_ld_scores(np.array(gts.gts[ld_sample,:], dtype=np.float_), gts.map, max_dist=1)
  File "/home/anp9168/jupytervenv/lib/python3.7/site-packages/numpy/ma/core.py", line 3224, in __getitem__
    dout = self.data[indx]
IndexError: index 46267 is out of bounds for axis 0 with size 45688

I peeked at the source code and it seemed that this error relates to the genetic map, so I tried specifying my own using the --map flag. I know my input data was mapped to hg19, so I downloaded genetic_map_hg19.txt.gz from here, unzipped it, and re-formatted it using the below Python code...

import numpy as np
import pandas as pd
mapfile = "genetic_map_hg19.txt"
map_df = pd.read_csv(mapfile, sep = " ")
# Remove first column
map_df = map_df.drop('chr', axis = 1)
# Change header names to format expected by snipar
map_df = map_df.rename(columns={"position": "pposition", "COMBINED_rate(cM/Mb)": "rrate", "Genetic_Map(cM)":"gposition"})
# Sort file by genetic position
map_df = map_df.sort_values("gposition")
# Output file
map_df.to_csv("genetic_map_hg19_SHAPEIT.txt", sep = " ", index = False)

...I then ran snipar with the --map flag routed to genetic_map_hg19_SHAPEIT.txt and got the following error:

Traceback (most recent call last):
  File "/home/anp9168/jupytervenv/bin/ibd.py", line 181, in 
    main(args)
  File "/home/anp9168/jupytervenv/bin/ibd.py", line 178, in main
    min_maf=min_maf, max_missing=max_missing, max_error=max_error)
  File "/home/anp9168/jupytervenv/lib/python3.7/site-packages/snipar/ibd.py", line 375, in infer_ibd_chr
    gts.map = get_map_positions(mapfile, gts)
  File "/home/anp9168/jupytervenv/lib/python3.7/site-packages/snipar/map.py", line 65, in get_map_positions
    raise(ValueError('Only '+str(round(100*prop_in_map))+'% of SNPs have genetic positions in '+mapfile+'. Need at least '+str(round(100*min_map_prop))+'%'))
ValueError: Only 39% of SNPs have genetic positions in /n/groups/reich/anp9168/software/RAFFI/hg19_genetic_maps/genetic_map_hg19_SHAPEIT.txt. Need at least 50%

I'm not sure what is wrong with my genetic map, or why the default genetic map was not working in the first place, and was wondering if you have any pointers?

AlexTISYoung commented 1 year ago

Hi Annabel, thanks for your detailed comment. I don't think this is an issue with the genetic map. I think this is a bug in the way I am selecting a random sample of size 1000 to compute LD scores. Let me get back to you when I've investigated this.

AlexTISYoung commented 1 year ago

I've pushed a fix to the master branch. You should be able to install this branch by cloning the git repository and using 'pip install .' in the downloaded snipar directory. (I would first uninstall your existing version using 'pip uninstall snipar'). Let me know if this fixes it!

AnnabelPerry commented 1 year ago

Thanks for your rapid response! I'll try this and get back to you

AnnabelPerry commented 1 year ago

I do not appear to be encountering the same issue anymore, thank you so much!