getian107 / PRScsx

Cross-population polygenic prediction
MIT License

issues with SNP reference files #4

Closed by jeffspence 3 years ago

jeffspence commented 3 years ago

I've run into a couple of issues with the SNP reference files.

First, the README says that the SNP reference file should be named either snpinfo_mult_1kg_hm3 or snpinfo_mult_ukbb_hm3, and those are indeed the names of the files you get by following the "Getting Started" section. However, the program actually seems to look for a file called snpinfo_mult_hm3:

FileNotFoundError: [Errno 2] No such file or directory: '/path/to/ldpanels/snpinfo_mult_hm3'

If I rename the 1kg file to snpinfo_mult_hm3, then the program runs as expected.
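
For anyone who wants to check which of these files is actually present before renaming anything, here is a minimal sketch. The three file names are the ones mentioned in this issue; the helper `find_snpinfo` and the constant `CANDIDATE_NAMES` are hypothetical and are not part of PRScsx.

```python
import os

# File names taken from this issue thread. The helper below is a
# hypothetical illustration and is NOT part of PRScsx itself.
CANDIDATE_NAMES = (
    "snpinfo_mult_1kg_hm3",
    "snpinfo_mult_ukbb_hm3",
    "snpinfo_mult_hm3",
)


def find_snpinfo(ref_dir, candidates=CANDIDATE_NAMES):
    """Return the path of the first candidate SNP info file found in ref_dir.

    Raises FileNotFoundError listing every name that was tried, which is
    easier to act on than an error naming a single hard-coded file.
    """
    for name in candidates:
        path = os.path.join(ref_dir, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "none of %s found in %s" % (", ".join(candidates), ref_dir)
    )
```

Calling `find_snpinfo('/path/to/ldpanels')` would then return whichever of the three files exists, or fail with a message listing all the names it looked for.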

If I rename the ukbb file in the same way, then the script instead throws an error about a column having the wrong type:

##### process chromosome 1 #####
... parse reference file: /path/to/ldpanels/ukbb/snpinfo_mult_hm3 ...
Traceback (most recent call last):
  File "/path/to/prscsx/PRScsx.py", line 190, in <module>
    main()
  File "/path/to/prscsx/PRScsx-master/PRScsx.py", line 167, in main
    ref_dict = parse_genet.parse_ref(param_dict['ref_dir'] + '/snpinfo_mult_hm3', int(chrom))
  File "/path/to/prscsx/PRScsx-master/parse_genet.py", line 31, in parse_ref
    ref_dict['FLP_EUR'].append(int(ll[8]))
ValueError: invalid literal for int() with base 10: '0.232000'
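
The ValueError itself is standard Python behaviour: `int()` rejects a decimal string such as '0.232000'. One plausible reading, which is an assumption and not something confirmed in this thread, is that the column at index 8 holds a different kind of value in the UKBB file than the script expected. The sketch below reproduces the failure with a hypothetical helper, `parse_int_field`, that is not part of PRScsx and reports the offending column more explicitly.

```python
def parse_int_field(fields, index):
    """Parse fields[index] as an int, with a clearer error on failure.

    Hypothetical helper for illustration only; it is not part of PRScsx.
    """
    value = fields[index]
    try:
        return int(value)
    except ValueError as err:
        # int() accepts strings like '1' or '-1' but not '0.232000'.
        raise ValueError(
            "column %d holds %r, which is not an integer; the file may use "
            "a different column layout than the script expects" % (index, value)
        ) from err


# The same value that appears in the traceback parses fine as a float,
# which suggests it is a numeric column of another kind (assumption).
as_float = float("0.232000")
```
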

In either case, it would be good for PRScsx to look for the correctly named files (or for the README to be updated). It would also be nice to be able to get it to work with the UKBB LD panel.

Thanks!

getian107 commented 3 years ago

Hi, did you pull the latest version of the software before running? I updated the filename in the script when I uploaded the UKBB reference panels.

jeffspence commented 3 years ago

Hi @getian107, thanks for the quick response! Yes, that's exactly what it was! I hadn't pulled since a couple of weeks ago. Thanks again!