aristoteleo / dynamo-release

Inclusive model of expression dynamics with conventional or metabolic labeling based scRNA-seq / multiomics, vector field reconstruction and differential geometry analyses
https://dynamo-release.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
415 stars 59 forks source link

Issue with arrays with dyn.pd.least_action #394

Closed maven0708 closed 1 year ago

maven0708 commented 2 years ago

Hello! I am following the LAP tutorial, and keep running into an issue with dyn.pd.least_action. Please let me know if I should provide more information, and I hope there is an easy solution. I did have to change the "adj_key" for each to match the obsp that I had in my object. I get the following output (and error) when I try to run it:

|-----> [iterating through 1 pairs] in progress: 100.0000%
|-----> [iterating through 1 pairs] finished [4.9063s]
|-----> [iterating through 1 pairs] in progress: 100.0000%
|-----> [iterating through 1 pairs] finished [112.9607s]
|-----> [iterating through 1 pairs] in progress: 100.0000%
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [60], in <cell line: 11>()
     13 if start is not end:
     14     min_lap_t = True if i == 0 else False
---> 15     dyn.pd.least_action(
     16         adata_labeling,
     17         [adata_labeling.obs_names[start[0]][0]],
     18         [adata_labeling.obs_names[end[0]][0]],
     19         basis="umap",
     20         adj_key="umap_distances", # changed
     21         min_lap_t= min_lap_t,
     22         EM_steps=2,
     23     )
     24     dyn.pl.least_action(adata_labeling, basis="umap")
     25     lap = dyn.pd.least_action(
     26         adata_labeling,
     27         [adata_labeling.obs_names[start[0]][0]],
   (...)
     32         EM_steps=2,
     33     )

File ~/Library/Python/3.9/lib/python/site-packages/dynamo/prediction/least_action_path.py:283, in least_action(adata, init_cells, target_cells, init_states, target_states, paired, min_lap_t, elbow_method, num_t, basis, vf_key, vecfld, adj_key, n_points, init_paths, D, PCs, expr_func, add_key, **kwargs)
    278 logger.info(
    279     "initializing path with the shortest path in the graph built from the velocity transition matrix...",
    280     indent_level=2,
    281 )
    282 if init_paths is None:
--> 283     init_path = get_init_path(G, init_state, target_state, coords, interpolation_num=n_points)
    284 else:
    285     init_path = init_paths if type(init_paths) == np.ndarray else init_paths[path_ind]

File ~/Library/Python/3.9/lib/python/site-packages/dynamo/prediction/least_action_path.py:132, in get_init_path(G, start, end, coords, interpolation_num)
    127 init_path = coords[path, :]
    129 # _, arclen, _ = remove_redundant_points_trajectory(init_path, tol=1e-4, output_discard=True)
    130 # arc_stepsize = arclen / (interpolation_num - 1)
    131 # init_path_final, _, _ = arclength_sampling(init_path, step_length=arc_stepsize, t=np.arange(len(init_path)))
--> 132 init_path_final, _, _ = arclength_sampling_n(init_path, interpolation_num, t=np.arange(len(init_path)))
    134 # add the beginning and end point
    135 init_path_final = np.vstack((start, init_path_final, end))

File ~/Library/Python/3.9/lib/python/site-packages/dynamo/prediction/utils.py:484, in arclength_sampling_n(X, num, t)
    481 arclen = np.hstack((0, arclen))
    483 z = np.linspace(arclen[0], arclen[-1], num)
--> 484 X_ = interpolate.interp1d(arclen, X, axis=0)(z)
    485 if t is not None:
    486     t_ = interpolate.interp1d(arclen, t)(z)

File /usr/local/lib/python3.9/site-packages/scipy/interpolate/_interpolate.py:572, in interp1d.__init__(***failed resolving arguments***)
    569         self._call = self.__class__._call_spline
    571 if len(self.x) < minval:
--> 572     raise ValueError("x and y arrays must have at "
    573                      "least %d entries" % minval)

ValueError: x and y arrays must have at least 2 entries

If it helps, here is the object I am working with:

AnnData object with n_obs × n_vars = 3288 × 32285
    obs: 'nCount_ambiguous', 'nFeature_ambiguous', 'nCount_matrix', 'nFeature_matrix', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'nCount_RNA', 'nFeature_RNA', 'S.Score', 'G2M.Score', 'Phase', 'old.ident', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.8', 'seurat_clusters', 'SCT_snn_res.0.7', 'SCT_snn_res.0.3', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'Size_Factor', 'initial_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'jaccard_velocity_confidence', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca'
    var: 'features', 'RNA_features', 'ambiguous_features', 'spliced_features', 'unspliced_features', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'score', 'log_m', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition', 'traj_msd'
    uns: 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'dynamics', 'neighbors', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_umap', 'umap_neighbors', 'LAP_umap', 'VecFld_pca', 'LAP_pca', 'kinetics_heatmap'
    obsm: 'X_umap', 'X_pca', 'X', 'velocity_umap', 'velocity_pca', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC'
    layers: 'RNA', 'ambiguous', 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S'
    obsp: 'moments_con', 'distances', 'connectivities', 'pearson_transition_matrix', 'umap_ddhodge', 'umap_distances', 'umap_connectivities'
Xiaojieqiu commented 2 years ago

thanks for this question. Looks to me somehow the step of uniformly sampling along the initial path is failed because either arclen or X has only one element.

Have you tried passing two different start and end cells? Do you still have the same issue? otherwise, providing me some sample data that can reproduce this bug will be helpful

maven0708 commented 2 years ago

Thank you for your reply! Sorry for my delay in getting back to this. I have tried a number of ways to modify this, including your suggestion. I believe the issue might be in how the fixed_points are chosen? I'm not quite sure I understand that part of it. Specifically this line of code:

fixed_points = np.array(
    [
        [8.45201833, 9.37697661],
        [14.00630381, 2.53853712],
        [17.30550636, 6.81561775],
        [18.06891717, 11.9840678],
        [14.13613403, 15.22244713],
        [9.72644402, 14.83745969],
    ]
)

Thank you!

Edit: as an update, I realized one of my fixed_points was off the screen and when I corrected that I was able to continue in my analysis without the numpy error. But my question still remains of how to identify the best fixed points? Is there a command, or is it based off subjective choosing? Thank you!

chansigit commented 2 years ago

Thank you for your reply! Sorry for my delay in getting back to this. I have tried a number of ways to modify this, including your suggestion. I believe the issue might be in how the fixed_points are chosen? I'm not quite sure I understand that part of it. Specifically this line of code:

fixed_points = np.array(
    [
        [8.45201833, 9.37697661],
        [14.00630381, 2.53853712],
        [17.30550636, 6.81561775],
        [18.06891717, 11.9840678],
        [14.13613403, 15.22244713],
        [9.72644402, 14.83745969],
    ]
)

Thank you!

Edit: as an update, I realized one of my fixed_points was off the screen and when I corrected that I was able to continue in my analysis without the numpy error. But my question still remains of how to identify the best fixed points? Is there a command, or is it based off subjective choosing? Thank you!

you may run dyn.pl.topography first to find the fixed points, and then get them by checking

print(adata.uns['VecFld_umap']['Xss'])

# -1 -- stable
#  0 -- saddle
#  1 -- unstable
print(adata.uns['VecFld_umap']['ftype'] )

whose order corresponds to the fixed points in hte topography figure.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days