bvilhjal / ldpred

MIT License

Recognizing X chromosome, "Unable to open object (object 'chrom_23' doesn't exist)" #66

Closed Ambrosinae closed 4 years ago

Ambrosinae commented 5 years ago

Hi,

I looked at the previous issue brought up about the final chromosome here: https://github.com/bvilhjal/ldpred/issues/17 and it seems that ldpred should be able to recognize X as a valid chromosome with the most up-to-date version. However, I am getting this error during the coordination step:

`"Unable to open object (object 'chrom_23' doesn't exist)"`
`Did not find chromosome in SS dataset.`

And later in the 'Summary of Coordination Step' section:

Num chromosomes used: 22

Both my plink files and SS file should have the chromosome listed as a capital X. I am using the most recent version of ldpred in the master branch.
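One way to double-check the claim above is to tally the chromosome labels that actually appear in each file. The following is a minimal sketch, not part of ldpred: `count_chrom_labels` is a hypothetical helper, and it assumes whitespace-delimited input. In a PLINK `.bim` file the chromosome is the first column and there is no header; for the summary statistics file, the column index and the presence of a header depend on the format, so both are parameters here.

```python
from collections import Counter


def count_chrom_labels(lines, chrom_col=0, has_header=False):
    """Count how often each chromosome label occurs in whitespace-delimited rows.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the header row
        fields = line.split()
        if len(fields) <= chrom_col:
            continue  # skip blank or truncated rows
        counts[fields[chrom_col]] += 1
    return counts


# Toy .bim-style rows (chromosome, SNP id, genetic distance, position, alleles).
bim_rows = [
    "1 rs1 0 1000 A G",
    "22 rs2 0 2000 C T",
    "X rs3 0 3000 A C",
    "X rs4 0 4000 G T",
]
labels = count_chrom_labels(bim_rows)
print(sorted(labels.items()))
```

If the label for the X chromosome differs between the two files (for example `X` in one and `23` in the other), that mismatch would be the first thing to rule out.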

Best, Josh

bvilhjal commented 5 years ago

I believe I've fixed this now. Thanks for reporting. Let me know if you have any problems with the fix. It should also be easy to extend to other chromosomes (Y, MT), but I'm worried about how the SNPs are coded for those, hence I didn't want to extend it for those now.

Best, Bjarni

Ambrosinae commented 5 years ago

Hi,

I still have this error.

"Unable to open object (object 'chrom_23' doesn't exist)"
Did not find chromosome 23 in SS dataset.
Continuing.
SNPs in LD Reference data:                                              12066678
Num chromosomes used:                                                         22
SNPs common across datasets:                                            11740250

EDIT: It seems that the output is different than before too, and it looks like this: (don't mind the ^H's, I'm viewing this on a Windows machine)

Coordinating datasets (Summary statistics and LD reference genotypes).
^H^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H^H100.00%
^H^H^H^H^H^H^H95.83%"Unable to open object (object 'chrom_23' doesn't exist)"
Did not find chromosome 23 in SS dataset.
Continuing.

While previously it looked like this:

Coordinating datasets (Summary statistics and LD reference genotypes).
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
Storing coordinated data to HDF5 file.
"Unable to open object (object 'chrom_23' doesn't exist)"
Did not find chromosome in SS dataset.
Continuing.

In addition, looking at your commit, it seems that the previous code should have worked as well since the chromosome is simply listed as "X" in the chromosome column. I believe there shouldn't be any issues on my end, but I will check again.

Also, I am using a lot of SNPs, but I'm going to cut that down with the HapMap3 option after seeing the other issues people brought up about memory.

EDIT2: I tried running gibbs but it looks like either not having chrom_23 breaks it or there's something wrong with my dataset.

With the hapmap3 option for coord:

SNPs common across datasets:                                             1196161

With an LD radius (ldr) of 399:

Traceback (most recent call last):
  File "./ldpred/LDpred.py", line 292, in <module>
    main()
  File "./ldpred/LDpred.py", line 279, in main
    LDpred_gibbs.main(p_dict)
  File "/u/project/arboleda/joshuazh/ldpred/ldpred/LDpred_gibbs.py", line 366, in main
    h2=p_dict['h2'], verbose=p_dict['debug'], summary_dict=summary_dict)
  File "/u/project/arboleda/joshuazh/ldpred/ldpred/LDpred_gibbs.py", line 186, in ldpred_genomewide
    herit_dict = ld.get_chromosome_herits(cord_data_g, ld_scores_dict, n, h2=h2, debug=verbose,summary_dict=summary_dict)
  File "/u/project/arboleda/joshuazh/ldpred/ldpred/ld.py", line 315, in get_chromosome_herits
    assert chi_square_lambda>1, 'Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small.  '
AssertionError: Something is wrong with the GWAS summary statistics, parsing of them, or the given GWAS sample size (N). Lambda (the mean Chi-square statistic) is too small. 
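The assertion message describes lambda as the mean chi-square statistic and requires it to be greater than 1. As a rough sanity check outside of ldpred, that quantity can be estimated directly from the summary statistics. This is a sketch under stated assumptions: `mean_chi_square` is a hypothetical helper, it takes the per-SNP chi-square statistic to be `(beta / se) ** 2`, and it assumes the effects are on a linear or log-odds scale (odds ratios would need a log transform first). It does not reproduce ldpred's own calculation, which also depends on the given sample size N.

```python
def mean_chi_square(betas, ses):
    """Mean of (beta / se)^2 over SNPs, skipping rows with a non-positive SE."""
    stats = [(b / s) ** 2 for b, s in zip(betas, ses) if s > 0]
    if not stats:
        raise ValueError("no SNPs with a usable standard error")
    return sum(stats) / len(stats)


# Toy values chosen so the z-scores are exactly 2, -1, 3 and 0.
betas = [0.5, -0.25, 0.75, 0.0]
ses = [0.25, 0.25, 0.25, 0.25]
lam = mean_chi_square(betas, ses)
print(lam)
```

A value at or below 1 on the real data would point at the summary statistics or how they were parsed, rather than at the missing `chrom_23` group.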

EDIT3:

It looks like it might be the hapmap3 file not properly recognizing SIDs of chromosome 23?

Best, Josh

bvilhjal commented 5 years ago

Hi, I thought I had fixed it. I will look at it later this week. I'm sorry about this.

Best, Bjarni

Ambrosinae commented 5 years ago

It's no problem, everything still works if I just remove chr23 from the files.
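For anyone applying the same workaround, dropping the X chromosome rows from a summary statistics file can be sketched as below. `drop_x_rows` and the `X_LABELS` set are hypothetical and not part of ldpred; the sketch assumes a whitespace-delimited file with a header row and the chromosome in the first column, so adjust `chrom_col` and `has_header` for other layouts.

```python
# Labels treated as the X chromosome; an assumption, extend as needed.
X_LABELS = {"X", "x", "23", "chrX", "chr23"}


def drop_x_rows(lines, chrom_col=0, has_header=True):
    """Return the rows whose chromosome column is not an X chromosome label."""
    kept = []
    for i, line in enumerate(lines):
        if has_header and i == 0:
            kept.append(line)  # always keep the header row
            continue
        fields = line.split()
        if len(fields) > chrom_col and fields[chrom_col] in X_LABELS:
            continue  # drop X chromosome rows
        kept.append(line)
    return kept


ss_rows = [
    "CHR SNP BETA SE",
    "1 rs1 0.01 0.02",
    "X rs3 0.03 0.02",
    "22 rs2 -0.02 0.02",
    "23 rs4 0.00 0.02",
]
filtered = drop_x_rows(ss_rows)
print("\n".join(filtered))
```

For the PLINK files, the `--chr 1-22` filter combined with `--make-bed` should do the equivalent job on the genotype side.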

Josh

bvilhjal commented 4 years ago

I believe parsing and working with X chromosomes already works, so I'll close this.