brielin / Popcorn

Software for estimating correlation of trait effect sizes across populations

Assertion Error when calling "pysnptools" #14

Closed YunfengRuan closed 4 years ago

YunfengRuan commented 4 years ago

I want to calculate the genetic correlation between an AFR sample and an EAS sample. When I compute the scores with the AFR and EAS samples from 1000 Genomes, I keep getting an error. The command is:

popcorn compute -v 1 --bfile1 AFRbfile --bfile2 EASbfile score.AFR-EAS

The error message, with a little of the output before it, is:

39832303 Variants in file 1
22263023 Variants in file 2
7859770 SNPs in file 1 after MAF and indel filter
5405806 SNPs in file 2 after MAF and indel filter
4013034 SNPs common in both populations
Traceback (most recent call last):
  File "/home/unix/yruan/Popcorn/env/bin/popcorn", line 11, in <module>
    load_entry_point('popcorn==0.9.9', 'console_scripts', 'popcorn')()
  File "/home/unix/yruan/Popcorn/env/lib/python2.7/site-packages/popcorn-0.9.9-py2.7.egg/popcorn/__main__.py", line 234, in main
    scores = compute.covariance_scores_2_pop(args)
  File "/home/unix/yruan/Popcorn/env/lib/python2.7/site-packages/popcorn-0.9.9-py2.7.egg/popcorn/compute.py", line 285, in __init__
    bed_1_index = np.sort(bed_1.sid_to_index(snps_to_use)) #
  File "/home/unix/yruan/Popcorn/env/lib/python2.7/site-packages/pysnptools/snpreader/snpreader.py", line 496, in sid_to_index
    return self.col_to_index(list)
  File "/home/unix/yruan/Popcorn/env/lib/python2.7/site-packages/pysnptools/pstreader/pstreader.py", line 507, in col_to_index
    assert len(col_set) == self.col_count, "Expect col to appear in data only once."
AssertionError: Expect col to appear in data only once.

I tried versions 0.3.9 and 0.3.11 of pysnptools, and the error message is the same. Do you have any idea how to debug this?

Thank you so much for your help.

brielin commented 4 years ago

I'm not exactly sure. Based on the assertion text, it's possible you have either duplicate individuals or duplicate SNPs. There should be some filtering in place in the code, but you could try filtering manually before running this.
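For readers hitting the same assertion, here is a minimal sketch of one way to check for duplicate SNP IDs before running popcorn. It assumes the standard PLINK .bim layout, where the variant ID is the second whitespace-separated column. The function name `duplicate_snp_ids` and the toy records are hypothetical and not part of Popcorn or pysnptools.

```python
from collections import Counter


def duplicate_snp_ids(bim_lines):
    """Return the sorted SNP IDs that occur more than once.

    Assumes PLINK .bim records: chrom, variant ID, cM, bp, allele1, allele2.
    """
    # Column 2 (index 1) holds the variant ID; skip blank lines.
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    counts = Counter(ids)
    return sorted(snp for snp, n in counts.items() if n > 1)


# Toy records for illustration only; with a real file, pass an open handle
# instead, e.g. `with open("AFRbfile.bim") as fh: duplicate_snp_ids(fh)`.
toy_bim = [
    "1 rs1 0 1000 A G",
    "1 rs2 0 2000 C T",
    "1 rs2 0 2000 C T",
    "2 rs3 0 1500 G A",
]
print(duplicate_snp_ids(toy_bim))  # ['rs2']
```

An empty result for both .bim files would suggest the duplicate is somewhere else, for example among individuals in the .fam files.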

YunfengRuan commented 4 years ago

Thank you very much for your suggestion. It turns out that I have a duplicate SNP in the bfiles. The software runs OK after that SNP is removed.
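The thread does not say how the duplicate SNP was removed. As one possible approach, the sketch below writes every duplicated ID from a .bim file to a text file, one ID per line, which is the format PLINK's `--exclude` option reads. The function name `write_exclude_list` and the file names are hypothetical. Note that excluding by ID drops every copy of the duplicated SNP, not just the extra one.

```python
from collections import Counter
from pathlib import Path


def write_exclude_list(bim_path, out_path):
    """Write SNP IDs that appear more than once in bim_path to out_path.

    Assumes PLINK .bim records with the variant ID in the second column.
    Returns the sorted list of duplicated IDs.
    """
    with open(bim_path) as handle:
        ids = [line.split()[1] for line in handle if line.strip()]
    dupes = sorted(snp for snp, n in Counter(ids).items() if n > 1)
    # One ID per line, so the file can be handed to a variant-exclusion step.
    Path(out_path).write_text("".join(snp + "\n" for snp in dupes))
    return dupes
```

The resulting list could then be applied with something like `plink --bfile AFRbfile --exclude dup_snps.txt --make-bed --out AFRbfile.nodup`, where the output prefix and list name are placeholders.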

Best Regards

Sincerely yours, Yunfeng Ruan
