Open schimar opened 4 years ago
Hi Martin.
This does seem a bit strange. Might be worth investigating a little further what exactly gnu is.
It is a NumPy array, it seems. What's gnu.dtype and gnu.shape?
Can you share the snippet where gnu is created?
Ah, please ignore what I wrote above. This error seems to be thrown when a variant is not segregating, i.e. all values in a row are identical. I appreciate the error could be clearer here, but removing invariant rows should fix your problem.
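To illustrate why an invariant row can break the PCA even though the input holds no NaNs or infs, here is a minimal NumPy-only sketch. It assumes that the 'patterson' scaler centers each row on its mean and divides by sqrt(p * (1 - p)), where p is the row mean divided by the ploidy. The function name patterson_scale_sketch and the toy array are hypothetical; this is not scikit-allel's own code.

```python
import numpy as np

def patterson_scale_sketch(gn, ploidy=2):
    """Hypothetical stand-in for Patterson scaling (an assumption, not allel's code)."""
    gn = np.asarray(gn, dtype='f8')
    mean = gn.mean(axis=1, keepdims=True)   # per-variant mean alt-allele count
    p = mean / ploidy                       # per-variant alt-allele frequency
    scale = np.sqrt(p * (1 - p))            # zero when p is 0 or 1
    # silence the divide-by-zero warnings so the NaNs can be inspected directly
    with np.errstate(divide='ignore', invalid='ignore'):
        return (gn - mean) / scale

gn_toy = np.array([
    [0, 1, 2, 1],   # segregating variant
    [0, 0, 0, 0],   # invariant variant: every sample is hom-ref
])

scaled = patterson_scale_sketch(gn_toy)
has_nan = np.isnan(scaled).any(axis=1)      # flags rows that became NaN after scaling
print(np.isfinite(gn_toy).all(), has_nan)
```

Under that assumption, the input passes the np.isfinite check while the scaled invariant row turns into NaNs (0 divided by 0), which would be consistent with the error only appearing inside the PCA call.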
Of course, that makes total sense! For future reference (if needed), the following does the trick:
import numpy as np
import allel as al

# ac_subpops['all'] holds the allele counts
is_segAll = ac_subpops['all'].is_segregating()[:]
# keep only the segregating variants
gtsubseg = gtsub.compress(is_segAll, axis=0)

def ld_prune(gn, size, step, threshold=.1, n_iter=1):
    # repeatedly drop variants in LD above the threshold within sliding windows
    for i in range(n_iter):
        loc_unlinked = al.locate_unlinked(gn, size=size, step=step, threshold=threshold)
        n = np.count_nonzero(loc_unlinked)
        n_remove = gn.shape[0] - n
        print('iteration', i+1, 'retaining', n, 'removing', n_remove, 'variants')
        gn = gn.compress(loc_unlinked, axis=0)
    return gn

# nAltSub is not defined in this snippet; presumably the alt-allele counts of the filtered genotypes
gnu = ld_prune(nAltSub, size=200, step=50, threshold=.1, n_iter=5)
coords1, model1 = al.pca(gnu, n_components=10, scaler='patterson')
Now the PCA runs fine. Thanks so much, Martin
Just to add, there is a rare edge case where this error can still arise even for a segregating variant, if the variant has a heterozygous genotype in all individuals. In this case, although the variant is segregating, all individuals have the same genotype, and so after converting the data via GenotypeArray.to_n_alt(), all values in the corresponding row are the same.
A workaround is something like the following, where you compare all values to the first value in each row:
gt = ... # some genotype array
gn = gt.to_n_alt()
is_informative = np.any(gn[:, 0, np.newaxis] != gn, axis=1)
gn_informative = gn.compress(is_informative, axis=0)
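As a small worked example of the comparison above, the following sketch applies the same mask to a toy array of alt-allele counts. The array gn_toy is made up for illustration and is a plain NumPy array rather than the output of GenotypeArray.to_n_alt().

```python
import numpy as np

# toy alt-allele counts: rows are variants, columns are samples (made-up data)
gn_toy = np.array([
    [0, 1, 2, 1],   # genotypes differ between samples
    [1, 1, 1, 1],   # every sample heterozygous: segregating, but the row is constant
    [0, 0, 0, 0],   # invariant
])

# a row is informative if any value differs from the first value in that row
is_informative = np.any(gn_toy[:, 0, np.newaxis] != gn_toy, axis=1)
gn_informative = gn_toy.compress(is_informative, axis=0)

print(is_informative)
print(gn_informative.shape)
```

Here both the all-heterozygous row and the invariant row are dropped, so only rows that actually vary across samples remain.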
Hi,
With allel version 1.2.1 and Python 3.7, I am trying to perform a PCA (using SVD) on my data. I pretty much followed the "Fast-PCA" post on alimanfoo.github.io, where I do:
coords1, model1 = al.pca(gnu, n_components=10, scaler='patterson')
This returns (in short):
I checked, and gnu seems to be of type integer: np.isfinite(gnu).all() returns True and np.isnan(gnu).any() returns False.
When converting the contents of gnu to float, with:
Now, both isinstance(gnufl.flat[0], np.float) and np.isfinite(gnufl).all() return True, and np.isnan(gnufl).any() returns False.
So, no NaNs and no infs in there; however, the error is the same as before.
What am I missing?
I'd really appreciate your help, thanks a lot & all the best, martin
PS: Just in case, here's the output of what I think should be a check of the type of all values in gnufl