cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License
287 stars 49 forks source link

windowed_r_squared: IndexError for single-SNP windows #337

Open pmonnahan opened 4 years ago

pmonnahan commented 4 years ago

Hi,

I have been trying to implement the windowed_r_squared function, and I believe that I've come across a bug related to windows containing a single SNP. Below is a snippet generated via pdb, showing that the error arises within the 'windowed_statistic' function. I noticed in the source code of 'windowed_statistic' that there is an 'if n==0' catch and am wondering if this needs to be generalized for r2 calculations where a pair of observations is required at a minimum.

I could increase window size or try to identify and remove these windows beforehand, but I thought I would check whether this is a known (or non-) issue on your end.

r2, windows, counts = allel.windowed_r_squared(pos, ct, size=winsize, start=start, stop=stop, step=int(winsize / 2), fill=-9)` (Pdb) s --Call-- /home/spectorl/pmonnaha/.conda/envs/py36/lib/python3.6/site-packages/allel/stats/ld.py(161)windowed _r_squared() -> def windowed_r_squared(pos, gn, size=None, start=None, stop=None, step=None, (Pdb) /home/spectorl/pmonnaha/.conda/envs/py36/lib/python3.6/site-packages/allel/stats/ld.py(214)windowed _r_squared() -> if isinstance(percentile, (list, tuple)): (Pdb) /home/spectorl/pmonnaha/.conda/envs/py36/lib/python3.6/site-packages/allel/stats/ld.py(222)windowed _r_squared() -> def statistic(gnw): (Pdb) /home/spectorl/pmonnaha/.conda/envs/py36/lib/python3.6/site-packages/allel/stats/ld.py(226)windowe$ _r_squared()

... After about 5 windows with error-free calculation, I encounter this:

return windowed_statistic(pos, gn, statistic, size, start=start, (Pdb) s wv = values[start_idx:stop_idx] (Pdb) n /home/spectorl/pmonnaha/.conda/envs/py36/lib/python3.6/site-packages/allel/stats/window.py(366)windowed_statistic() s = statistic(wv) (Pdb) print(n) 1 (Pdb) print(wv) [[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0]] (Pdb) wv.shape (1, 99) (Pdb) n IndexError: cannot do a non-empty take from an empty axes. /home/spectorl/pmonnaha/.conda/envs/py36/lib/python3.6/site-packages/allel/stats/window.py(366)windowed_statistic() -> s = statistic(wv)

Thanks!

hardingnj commented 4 years ago

Thanks @pmonnahan .

This looks to reproduce with:

import allel
import numpy as np

pos = np.random.choice(100_000, 100, replace=False)

gn = np.random.randint(0, 3, (100, 100))

r2, windows, counts = allel.windowed_r_squared(pos, gn, size=5000, start=1, stop=100_000, fill=-9)

And this line (equivalent) causes the error:

np.percentile(np.array([]), 50)

I guess a check for the shape of gnw is the best option here.

pmonnahan commented 4 years ago

Thanks @hardingnj for the reprex and marking for change. Also, thanks to you and others for such a great package.