dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0

how to choose “start_cell” #128

Closed Xyihang closed 6 months ago

Xyihang commented 7 months ago

start_cell = "Run5_164698952452459"
pr_res = palantir.core.run_palantir(
    ad, start_cell, num_waypoints=500, terminal_states=terminal_states
)

Hello, I am wondering whether there are any tools or principles that can help automate or semi-automate the selection of a start_cell. How should we evaluate and choose between the results obtained from different start cells, and are there any metrics that reflect this?

katosh commented 7 months ago

Hi @Xyihang,

There is indeed a way to use the diffusion components for a semi-automated selection of a start cell. The general idea is to find a component that peaks in the most stem-like cell type and then select the cell at this peak. This cell is taken to be the most extreme stem-like cell in the dataset. For example, if the most stem-like cell type in your dataset is HSC and the cell types are annotated in ad.obs["celltype"], you can run:

start_cell = palantir.utils.early_cell(ad, "HSC", "celltype")
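
To make the idea concrete, here is a minimal pure-Python sketch of "find a component that peaks in the cell type and take the cell at the peak". It is not Palantir's implementation: the function name pick_early_cell, the plain-list inputs, and the order in which maxima and minima are checked are all assumptions made for illustration.

```python
def pick_early_cell(components, labels, cell_type):
    """Return the index of a cell of `cell_type` that sits at an extreme
    (maximum or minimum) of some diffusion component, or None if no
    component has its extreme in that cell type.

    components: one row per cell, each row a list of component values.
    labels:     one cell-type label per cell, same order as `components`.
    """
    n_cells = len(components)
    n_components = len(components[0])
    for j in range(n_components):
        column = [row[j] for row in components]
        # Cells at the maximum and the minimum of component j.
        hi = max(range(n_cells), key=column.__getitem__)
        lo = min(range(n_cells), key=column.__getitem__)
        for idx in (hi, lo):
            if labels[idx] == cell_type:
                return idx
    return None


# Toy data: four cells, two components (hypothetical values).
components = [[0.9, 0.1], [0.2, -0.5], [-0.3, 0.8], [-0.7, 0.0]]
labels = ["HSC", "Mono", "Ery", "Mono"]
start_index = pick_early_cell(components, labels, "HSC")
```

In this toy example the first component has its maximum in the HSC cell, so that cell is returned. A return value of None corresponds to the case where no component peaks in the requested cell type.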

Please let me know if this helps.

katosh commented 6 months ago

Note that the aforementioned approach does not succeed if there is no diffusion component reaching its maximum or minimum in the specified cell type. This may indicate either the absence of a suitable starting point in the cell type or that too few diffusion components were computed. It is generally advisable to reassess the selection of the starting cell type or to rerun palantir.utils.run_diffusion_maps with an increased value for n_components.

However, if you wish to force Palantir to select a cell within the specified type, you can use the fallback_seed argument. This makes the function fall back to an alternative algorithm whose result is not guaranteed to be optimal. Using the specified random seed, it picks a random cell outside the specified cell type and computes pseudotime starting from that cell. It then selects the "latest" cell within the specified cell type, that is, the one with the highest pseudotime, as the starting cell. This approach yields a good starting cell if the specified cell type represents an extreme point in the dataset; otherwise, the selected cell is essentially random:

start_cell = palantir.utils.early_cell(ad, "HSC", "celltype", fallback_seed=236)
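
The fallback procedure described above can be sketched in a few lines of pure Python. This is only an illustration under strong assumptions: the function name fallback_early_cell is hypothetical, and Euclidean distance from the random cell is used as a crude stand-in for pseudotime, whereas Palantir computes an actual pseudotime.

```python
import math
import random


def fallback_early_cell(coords, labels, cell_type, seed):
    """Pick a start cell of `cell_type` by looking "backwards" from a
    random cell of another type.

    coords: one coordinate list per cell (e.g. diffusion components).
    labels: one cell-type label per cell, same order as `coords`.
    """
    inside = [i for i, lab in enumerate(labels) if lab == cell_type]
    outside = [i for i, lab in enumerate(labels) if lab != cell_type]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the cell type")

    # Step 1: seeded choice of a random cell outside the cell type.
    fake_start = random.Random(seed).choice(outside)

    # Step 2: stand-in for pseudotime (assumption): distance from fake_start.
    pseudotime = [math.dist(coords[fake_start], c) for c in coords]

    # Step 3: the "latest" cell within the cell type becomes the start cell.
    return max(inside, key=pseudotime.__getitem__)


# Toy data where HSC sits at one end of a line (hypothetical values).
coords = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [3.0, 0.0]]
labels = ["HSC", "HSC", "Mono", "Mono"]
start_index = fallback_early_cell(coords, labels, "HSC", seed=236)
```

Because HSC lies at an extreme of this toy dataset, the same cell is chosen whichever outside cell the seed happens to pick. If the cell type sat in the middle of the data instead, the result would depend on the seed, which is the arbitrary behaviour mentioned above.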

Please reopen this thread if this does not resolve your issue.