dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0

"ValueError: Length of values does not match length of index" #143

Closed schroeme closed 3 weeks ago

schroeme commented 3 weeks ago

Hi! I am running Palantir as below:

start_cell = root_bc
pr_res = palantir.core.run_palantir(
    adata_ol_ctx, start_cell, knn=30, num_waypoints=500, terminal_states=terminal_states
)

and am getting the following error:

Sampling and flocking waypoints...
Time for determining waypoints: 0.010865215460459392 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[82], line 4
      1 #Run palentir, specify a starting cell first
      3 start_cell = root_bc
----> 4 pr_res = palantir.core.run_palantir(
      5     adata_ol_ctx, start_cell,knn=30, num_waypoints=500,terminal_states=terminal_states
      6 )

File ~/anaconda3/envs/palantir/lib/python3.9/site-packages/palantir/core.py:156, in run_palantir(data, early_cell, terminal_states, knn, num_waypoints, n_jobs, scale_components, use_early_cell_as_start, max_iterations, eigvec_key, pseudo_time_key, entropy_key, fate_prob_key, save_as_df, waypoints_key, seed)
    154 # pseudotime and weighting matrix
    155 print("Determining pseudotime...")
--> 156 pseudotime, W = _compute_pseudotime(
    157     data_df, start_cell, knn, waypoints, n_jobs, max_iterations
    158 )
    160 # Entropy and branch probabilities
    161 print("Entropy and branch probabilities...")

File ~/anaconda3/envs/palantir/lib/python3.9/site-packages/palantir/core.py:264, in _compute_pseudotime(data, start_cell, knn, waypoints, n_jobs, max_iterations)
    261 adj = nbrs.kneighbors_graph(data, mode="distance")
    263 # Connect graph if it is disconnected
--> 264 adj = _connect_graph(adj, data, np.where(data.index == start_cell)[0][0])
    266 # Distances
    267 dists = Parallel(n_jobs=n_jobs, max_nbytes=None)(
    268     delayed(_shortest_path_helper)(np.where(data.index == cell)[0][0], adj)
    269     for cell in waypoints
    270 )

File ~/anaconda3/envs/palantir/lib/python3.9/site-packages/palantir/core.py:572, in _connect_graph(adj, data, start_cell)
    567 # Compute distances to unreachable nodes
    568 unreachable_dists = pairwise_distances(
    569     data.iloc[farthest_reachable, :].values.reshape(1, -1),
    570     data.loc[unreachable_nodes, :],
    571 )
--> 572 unreachable_dists = pd.Series(
    573     np.ravel(unreachable_dists), index=unreachable_nodes
    574 )
    576 # Add edge between farthest reacheable and its nearest unreachable
    577 add_edge = np.where(data.index == unreachable_dists.idxmin())[0][0]

File ~/anaconda3/envs/palantir/lib/python3.9/site-packages/pandas/core/series.py:500, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    498     index = default_index(len(data))
    499 elif is_list_like(data):
--> 500     com.require_length_match(data, index)
    502 # create/copy the manager
    503 if isinstance(data, (SingleBlockManager, SingleArrayManager)):

File ~/anaconda3/envs/palantir/lib/python3.9/site-packages/pandas/core/common.py:576, in require_length_match(data, index)
    572 """
    573 Check the length of data matches the length of the index.
    574 """
    575 if len(data) != len(index):
--> 576     raise ValueError(
    577         "Length of values "
    578         f"({len(data)}) "
    579         "does not match length of index "
    580         f"({len(index)})"
    581     )

ValueError: Length of values (5005) does not match length of index (5001)

adata_ol_ctx has shape 8975 × 256 and ms_data has shape (8975, 2). I will note that rather than gene symbols, each adata.var has a name from 1 to 256 as strings (there are 256 features). The MAGIC imputation ran fine. Might I be violating some other input requirements? Can you please advise?

Thanks, Margaret

katosh commented 3 weeks ago

Hello @schroeme,

Thank you for reporting! What is the content of terminal_states? This might help us trace the cause of this issue.

schroeme commented 3 weeks ago

The content of terminal_states is:

bi006.pfcm.rxn1_GCTTGGGCACATTACG    MOL
dtype: object

Thanks!

I'll also note that if I plot that cell's coordinates on the UMAP, it is in the correct spot, and the cell is present in adata_ol_ctx.

katosh commented 3 weeks ago

Hi @schroeme,

I can't determine the data type of your terminal_states. It appears similar to a pandas Series but lacks an index. Please ensure terminal_states is one of the supported data types (see the documentation: Palantir Core).

For example, you could use a dictionary:

terminal_states = {
    "your_branch_name": "bi006.pfcm.rxn1_GCTTGGGCACATTACG",
}

Edit: I think I misread your post, and terminal_states is indeed a pandas Series, like terminal_states = pd.Series({"bi006.pfcm.rxn1_GCTTGGGCACATTACG":"MOL"}). However, it seems the cell name is in the index and the branch name is the value of the Series. Try flipping the two, as in the dictionary above.
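
For reference, flipping the index and values of such a Series can be sketched in plain pandas as below. The barcode and branch name are the ones from this thread; which orientation Palantir actually expects is not verified here and should be checked against its documentation.

```python
import pandas as pd

# Series as described in the thread: barcode in the index, branch name as value.
terminal_states = pd.Series({"bi006.pfcm.rxn1_GCTTGGGCACATTACG": "MOL"})

# Swap the two, so the branch name becomes the index and the barcode the value.
flipped = pd.Series(terminal_states.index, index=terminal_states.values)

# Dictionary form, matching the shape of the dictionary example above.
as_dict = flipped.to_dict()
print(as_dict)
```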

schroeme commented 3 weeks ago

Hi @katosh, sorry about that. terminal_states is a pandas Series, and the index is the barcode. Additionally, if I remove terminal_states=terminal_states, the command still fails with the same error.

katosh commented 3 weeks ago

Thank you for the clarification. Another possible cause of this error is non-unique cell barcodes, that is, multiple cells in your AnnData object adata_ol_ctx sharing the same name. You could inspect, e.g., adata_ol_ctx.obs_names.value_counts() to see whether any name occurs more than once. If so, you could make the names unique with adata_ol_ctx.obs_names_make_unique(), or consider removing the extra entries if they are duplicates of the same cell.
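
A minimal pandas-only sketch of why duplicated names can produce this kind of length mismatch is given below. It uses a toy DataFrame with made-up cell names in place of the real data, so it only illustrates the mechanism and is not Palantir's actual code path: selecting rows by label returns one row per match, so a list of unique labels can yield more rows than labels.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a cell-by-component matrix; "cell_b" appears twice.
# All names here are hypothetical.
names = ["cell_a", "cell_b", "cell_b", "cell_c"]
data = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=names)

# 1. Detect repeated names (same idea as obs_names.value_counts()).
counts = data.index.value_counts()
duplicated_names = counts[counts > 1].index.tolist()
print(duplicated_names)

# 2. Label-based selection returns every matching row, so two unique
#    labels select three rows here.
labels = ["cell_b", "cell_c"]
selected = data.loc[labels, :]
print(len(labels), len(selected))

# 3. Building a Series from the selected rows, indexed by the shorter
#    label list, raises the same kind of ValueError as in the traceback.
message = None
try:
    pd.Series(selected.iloc[:, 0].values, index=labels)
except ValueError as err:
    message = str(err)
    print(message)
```

In the report above, 5005 values against 5001 index entries would be consistent with a handful of repeated barcodes among the selected cells.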

Please let me know if this helps or if you have any further questions!

schroeme commented 3 weeks ago

That fixed the issue, thanks so much!