cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License
288 stars 51 forks source link

ValueError when running xpehh #91

Closed sa-lee closed 8 years ago

sa-lee commented 8 years ago

Hi there, I'm trying to run xpehh on a set of variants. I created two haplotype arrays for each population and then ran xpehh with the default settings and get the following error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-7ebd03dbde93> in <module>()
      5     positions = variants_seg[i]['POS'][:]
      6     print h1.shape, h2.shape, len(pos), pos.dtype
----> 7     xpehh_out.append(allel.stats.selection.xpehh(hap1, hap2, positions))
      8     break
      9 

/Users/lee.s/anaconda/envs/py2/lib/python2.7/site-packages/allel/stats/selection.pyc in xpehh(h1, h2, pos, min_ehh)
    297 
    298     # scan forward
--> 299     ihh1_fwd = ihh_scan_int8(h1, pos, min_ehh=min_ehh)
    300     ihh2_fwd = ihh_scan_int8(h2, pos, min_ehh=min_ehh)
    301 

allel/opt/stats.pyx in allel.opt.stats.ihh_scan_int8 (allel/opt/stats.c:8568)()

allel/opt/stats.pyx in allel.opt.stats.ssl2ihh (allel/opt/stats.c:7812)()

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Any ideas on how to fix or what's going wrong?

alimanfoo commented 8 years ago

Hi Stuart,

It looks like there is a mistake in this line in allel/opt/stats.pyx:

    elif (ehh.size < i) | (ehh[0] <= min_ehh):

...the '|' should be 'or'. Could you try running xpehh(hap1, hap2, positions, min_ehh=None) just to see what happens? That should bypass the code with the error.

Sorry for this, the xpehh and ihs functions haven't been as well tested as they should, it's hard to write good test cases for them but I should certainly make a better effort.

On Tuesday, June 28, 2016, Stuart Lee notifications@github.com wrote:

Hi there, I'm trying to run xpehh on a set of variants. I created two haplotype arrays for each population and then ran xpehh with the default settings and get the following error

---------------------------------------------------------------------------ValueError Traceback (most recent call last) in () 5 positions = variants_seg[i]['POS'][:] 6 print h1.shape, h2.shape, len(pos), pos.dtype----> 7 xpehh_out.append(allel.stats.selection.xpehh(hap1, hap2, positions)) 8 break 9 /Users/lee.s/anaconda/envs/py2/lib/python2.7/site-packages/allel/stats/selection.pyc in xpehh(h1, h2, pos, min_ehh) 297 298 # scan forward--> 299 ihh1_fwd = ihh_scan_int8(h1, pos, min_ehh=min_ehh) 300 ihh2_fwd = ihh_scan_int8(h2, pos, min_ehh=min_ehh) 301

allel/opt/stats.pyx in allel.opt.stats.ihh_scan_int8 (allel/opt/stats.c:8568)()

allel/opt/stats.pyx in allel.opt.stats.ssl2ihh (allel/opt/stats.c:7812)() ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Any ideas on how to fix or what's going wrong?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cggh/scikit-allel/issues/91, or mute the thread https://github.com/notifications/unsubscribe/AAq8QiDxS20fy3K9WByyvGx7KX7IYhrEks5qQMQOgaJpZM4I_ytN .

Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com alimanfoo@gmail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

sa-lee commented 8 years ago

Hi Alistair, Thanks for your help.

I still get an error when running it with min_ehh = None. Running iHS using the same Haplotype Arrays seems to work fine.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-b1ab8739ce1b> in <module>()
      4     hap2 = allel.HaplotypeArray(haplotypes[i].subset(sel1 = ca_subpops['dr_of_the_congo']))
      5     positions = variants_seg[i]['POS'][:]
----> 6     xpehh_out.append(allel.stats.selection.xpehh(hap1, hap2, positions, min_ehh = None))
      7 

/Users/lee.s/anaconda/envs/py2/lib/python2.7/site-packages/allel/stats/selection.pyc in xpehh(h1, h2, pos, min_ehh)
    297 
    298     # scan forward
--> 299     ihh1_fwd = ihh_scan_int8(h1, pos, min_ehh=min_ehh)
    300     ihh2_fwd = ihh_scan_int8(h2, pos, min_ehh=min_ehh)
    301 

allel/opt/stats.pyx in allel.opt.stats.ihh_scan_int8 (allel/opt/stats.c:8568)()

allel/opt/stats.pyx in allel.opt.stats.ssl2ihh (allel/opt/stats.c:7955)()

TypeError: 'int' object is unsliceable
alimanfoo commented 8 years ago

Not immediately obvious what's causing this error, I need to debug. If you are in a position to build scikit-allel from source you could try making the fix I mentioned previously and see if that gets you anywhere. I'd like to work on this but probably won't have any time until next week at least.

FWIW I actually want to revisit the XPEHH and IHS code in the near future anyway, they currently don't handle gaps in the reference genome as do other implementations like hapbin, and there are a couple of minor optimisations that could be applied.

On Tue, Jun 28, 2016 at 10:11 AM, Stuart Lee notifications@github.com wrote:

Hi Alistair, Thanks for your help.

I still get an error when running it with min_ehh = None. Running iHS using the same Haplotype Arrays seems to work fine.

---------------------------------------------------------------------------TypeError Traceback (most recent call last) in () 4 hap2 = allel.HaplotypeArray(haplotypes[i].subset(sel1 = ca_subpops['dr_of_the_congo'])) 5 positions = variants_seg[i]['POS'][:]----> 6 xpehh_out.append(allel.stats.selection.xpehh(hap1, hap2, positions, min_ehh = None)) 7 /Users/lee.s/anaconda/envs/py2/lib/python2.7/site-packages/allel/stats/selection.pyc in xpehh(h1, h2, pos, min_ehh) 297 298 # scan forward--> 299 ihh1_fwd = ihh_scan_int8(h1, pos, min_ehh=min_ehh) 300 ihh2_fwd = ihh_scan_int8(h2, pos, min_ehh=min_ehh) 301

allel/opt/stats.pyx in allel.opt.stats.ihh_scan_int8 (allel/opt/stats.c:8568)()

allel/opt/stats.pyx in allel.opt.stats.ssl2ihh (allel/opt/stats.c:7955)() TypeError: 'int' object is unsliceable

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/cggh/scikit-allel/issues/91#issuecomment-228994853, or mute the thread https://github.com/notifications/unsubscribe/AAq8QlItgdnfBAnGaEDpJiIc2n2vXf0fks5qQOU7gaJpZM4I_ytN .

Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com alimanfoo@gmail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

sa-lee commented 8 years ago

Unfortunately not at the moment - I'll try building from source on my personal laptop and make the changes you suggest.

I'm currently using selscan to do this type of analysis but I like your package because it can do everything in the same place, so any improvements you can make would be greatly appreciated.

alimanfoo commented 8 years ago

Thanks, good to know. This general area of haplotype-based tests for selection is something I'm keen to develop and improve in scikit-allel, would be good to stay in touch.

On Wed, Jun 29, 2016 at 2:26 AM, Stuart Lee notifications@github.com wrote:

Unfortunately not at the moment - I'll try building from source on my personal laptop and make the changes you suggest.

I'm currently using selscan to do this type of analysis but I like your package because it can do everything in the same place, so any improvements you can make would be greatly appreciated.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/cggh/scikit-allel/issues/91#issuecomment-229231688, or mute the thread https://github.com/notifications/unsubscribe/AAq8QtWoO6yDl4ByIPgWOgfIRrVSrsCgks5qQcm5gaJpZM4I_ytN .

Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: alimanfoo@googlemail.com alimanfoo@gmail.com Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

alimanfoo commented 8 years ago

Resolved in #92.