emdann / milopy

Python implementation of Milo for differential abundance testing on KNN graph
MIT License

make_nhoods Error: Mean of empty slice. #43

Open jflusche opened 1 year ago

jflusche commented 1 year ago

Hi Emma,

Really looking forward to trying the Milo implementation in Python.

When I run: milo.make_nhoods(adata_LP_ILE, prop=0.1)

I get the following error:

/home/jupyter/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3441: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/jupyter/.local/lib/python3.7/site-packages/numpy/core/_methods.py:182: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_193/3513822349.py in <module>
      1 ## Assign cells to neighbourhoods
----> 2 milo.make_nhoods(adata_LP_ILE, prop=1) #default prop=0.1

~/.local/lib/python3.7/site-packages/milopy/core.py in make_nhoods(adata, neighbors_key, prop, seed)
     92         # Find closest real point (amongst nearest neighbors)
     93         dists = euclidean_distances(
---> 94             X_dimred[non_zero_cols[non_zero_rows == i], :], nh_pos.T)
     95         # Update vertex index
     96         refined_vertices[i] = nn_ixs[dists.argmin()]

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/pairwise.py in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
    300            [1.41421356]])
    301     """
--> 302     X, Y = check_pairwise_arrays(X, Y)
    303 
    304     if X_norm_squared is not None:

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X, Y, precomputed, dtype, accept_sparse, force_all_finite, copy)
    160             copy=copy,
    161             force_all_finite=force_all_finite,
--> 162             estimator=estimator,
    163         )
    164         Y = check_array(

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    806                 "Found array with %d sample(s) (shape=%s) while a"
    807                 " minimum of %d is required%s."
--> 808                 % (n_samples, array.shape, ensure_min_samples, context)
    809             )
    810 

ValueError: Found array with 0 sample(s) (shape=(0, 30)) while a minimum of 1 is required by check_pairwise_arrays.

I would like Milo to use the previously calculated KNN graph and connectivities based on the scVI reduced-dimension space, so I skipped recalculating them based on PCA. When I did recalculate neighbors based on PCA, Milo ran without errors. To find where the error came from when using the scVI-based neighbors, I stepped through make_nhoods line by line. I noticed there were empty arrays at certain indices of non_zero_rows. This happened at seemingly random indices (the first two being 196 and 351); the function ran normally for other indices.

For example, at index 196, this line produced an array of NaN values and the 'Mean of empty slice' warning:

nh_pos = np.median(
        X_dimred[non_zero_cols[non_zero_rows == 196], :], 0).reshape(-1, 1)
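
The behaviour described above can be reproduced on toy data. This is a minimal sketch, not the milopy source: the matrix `knn`, the embedding values, and the helper `rows_without_neighbors` are all made up for illustration, and it assumes that `non_zero_rows` and `non_zero_cols` come from calling `.nonzero()` on the rows of a binary KNN graph for the sampled cells.

```python
import warnings
import numpy as np

# Toy stand-ins (assumptions, not the reporter's data): a binary KNN
# graph for 4 sampled cells over 5 cells, and a 2-D embedding.
knn = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],  # this sampled cell has no neighbours
    [0, 1, 0, 0, 1],
])
X_dimred = np.arange(10, dtype=float).reshape(5, 2)

non_zero_rows, non_zero_cols = knn.nonzero()


def rows_without_neighbors(non_zero_rows, n_rows):
    """Hypothetical helper: indices of rows that have no non-zero entry."""
    counts = np.bincount(non_zero_rows, minlength=n_rows)
    return np.flatnonzero(counts == 0)


empty = rows_without_neighbors(non_zero_rows, knn.shape[0])
print("rows with no neighbours:", empty)

# Selecting the neighbours of an empty row gives a (0, n_dims) array,
# and its median is all NaN. NumPy normally emits the
# "Mean of empty slice" RuntimeWarning here; it is silenced in this sketch.
i = int(empty[0])
selected = X_dimred[non_zero_cols[non_zero_rows == i], :]
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    nh_pos = np.median(selected, 0).reshape(-1, 1)
print(selected.shape, nh_pos.ravel())
```

If this matches what happens in the real data, the indices returned by the helper would be the sampled cells whose row in the graph has no non-zero entries, which is where the `(0, 30)` array in the ValueError would come from.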

Thank you for your help!

emdann commented 1 year ago

Hi @jflusche, you should be able to run milopy using a neighbor graph built on the scVI embedding with no problem: the function picks the same representation used for KNN graph construction (see here).

Is your KNN graph based on scVI dimensions stored in adata.uns['neighbors'] or under another name? If the name is different you might need to pass a neighbors_key. Otherwise, could this be related to https://github.com/emdann/milopy/issues/34?
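
One way to check this is to look at the parameters stored alongside the graph. The sketch below is an assumption-laden illustration: `describe_neighbors` is a hypothetical helper, the `uns` dictionary is a toy stand-in for `adata.uns`, and it assumes the scanpy layout in which the neighbor parameters sit under a `params` entry (with `use_rep` present only if it was passed when the graph was built).

```python
def describe_neighbors(uns, neighbors_key=None):
    """Hypothetical helper: report which representation a stored KNN graph used.

    `uns` is any dict laid out like scanpy's `adata.uns` (assumed layout).
    """
    key = "neighbors" if neighbors_key is None else neighbors_key
    if key not in uns:
        raise KeyError(f"no neighbor graph stored under uns[{key!r}]")
    params = uns[key].get("params", {})
    return {
        "key": key,
        # Assumed to be None when use_rep was not given explicitly.
        "use_rep": params.get("use_rep"),
        "n_neighbors": params.get("n_neighbors"),
    }


# Toy dictionary standing in for adata.uns (illustration only).
uns = {
    "neighbors": {"params": {"n_neighbors": 15, "use_rep": "X_scVI"}},
}
print(describe_neighbors(uns))
```

If the graph turns out to be stored under a different key, that key is what would be passed as `neighbors_key` to `make_nhoods`.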

jflusche commented 1 year ago

Thank you for the prompt response! I looked into your suggestions for this issue, but have not found a solution yet.

Here are my findings:

1) I confirmed that the KNN graph is stored in adata.uns['neighbors'] (the correct name).
2) I attempted to use the 'quick fixes' from #34, but found that X_dimred was already an array type (not a dataframe), so this solution did not change the object X_dimred (or the error I get in make_nhoods).