dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0
212 stars 49 forks

Multiple root cells possible? #146

Open TdzBAS opened 2 months ago

TdzBAS commented 2 months ago

Hi @flying-sheep ,

Is it possible to set multiple root cells before running pseudotime calculation with palantir? I have multiple root stem cells in my tissue/data.

Best, T

flying-sheep commented 1 month ago

I’m not a member of @dpeerlab, so which of these is the case?

  1. the original API allows it while scanpy’s external wrapper doesn’t allow it and needs to be updated (then this should be a scanpy issue instead of here)
  2. the palantir library itself doesn’t allow it (then it should be here, but I shouldn’t be tagged)

katosh commented 1 month ago

Hi all! I was confused about the tag too but am happy to clarify.

Pseudotime is defined from one specific start cell and approximates the time a cell would need to differentiate into any other state.

So, for the typical workflow, we recommend using a single cell that appears to be the most stem-like. For that, we often take the cell at the extreme of a diffusion component that coincides with stemness. E.g., in this dataset the stem cells are at the bottom and component 0 reaches its maximum there: [image]

To automatically select the extreme cell, you could use

early_cell = palantir.utils.early_cell(ad, celltype="stem_cell", celltype_column="celltype")

Note that this assumes you annotated some cells as "stem_cell" in ad.obs["celltype"]. The subsequent Palantir call could be

palantir.core.run_palantir(ad, early_cell=early_cell)
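
To make the idea of "the cell at the extreme of a diffusion component" concrete, here is a minimal pandas sketch on toy data. The helper `pick_extreme_cell`, the toy values, and the choice to look only at the maximum of one component are assumptions for illustration. This is not Palantir's implementation of `palantir.utils.early_cell`.

```python
import pandas as pd


def pick_extreme_cell(components, labels, celltype, component=0):
    """Hypothetical helper: name of the `celltype` cell with the largest
    value of the given diffusion component."""
    # Keep only the cells annotated with the requested cell type.
    candidates = components.loc[labels == celltype, component]
    if candidates.empty:
        raise ValueError(f"no cells annotated as {celltype!r}")
    # idxmax returns the index label (the cell name) of the maximum.
    return candidates.idxmax()


# Toy data: rows are cells, columns are diffusion components.
cells = [f"cell_{i}" for i in range(6)]
components = pd.DataFrame(
    {0: [0.10, 0.90, 0.40, 0.70, -0.30, 0.95],
     1: [0.20, -0.10, 0.30, 0.00, 0.50, -0.40]},
    index=cells,
)
labels = pd.Series(
    ["stem_cell", "stem_cell", "stem_cell", "progenitor", "mature", "mature"],
    index=cells,
)

early = pick_extreme_cell(components, labels, "stem_cell")
print(early)
```

Note that `cell_5` has the largest value of component 0 overall, but it is not annotated as a stem cell, so the restriction to the annotated cell type changes the result. The sign of a diffusion component is arbitrary, so in practice the minimum may be the relevant extreme instead.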

However, if you really want to compute a pseudotime for multiple root cells, you can do that too. To compute a separate pseudotime for each root cell, you could run

root_cells = [...]  # names of the chosen root cells, as they appear in ad.obs_names
for cell in root_cells:
    # Store each run under its own keys so the results do not overwrite each other.
    palantir.core.run_palantir(
        ad,
        early_cell=cell,
        pseudo_time_key=f"cell_{cell}_pseudotime",
        entropy_key=f"cell_{cell}_entropy",
        fate_prob_key=f"cell_{cell}_fate_probabilities",
    )

After that, you would have to decide how to define a pseudotime for multiple root cells. Is it the minimal pseudotemporal distance to any root cell? Then you could set it as

ad.obs["multi_root_pseudotime"] = ad.obs[[f"cell_{cell}_pseudotime" for cell in root_cells]].min(axis=1)

Or, if it is the average pseudotemporal distance to the root cells, you could run

ad.obs["multi_root_pseudotime"] = ad.obs[[f"cell_{cell}_pseudotime" for cell in root_cells]].mean(axis=1)
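
To see how the two definitions differ, here is a small self-contained sketch on a toy table standing in for `ad.obs`. The root names `root_A` and `root_B`, the column names, and the values are made up for illustration and do not come from a real Palantir run.

```python
import pandas as pd

# Toy stand-in for ad.obs: one pseudotime column per (hypothetical) root cell.
obs = pd.DataFrame(
    {
        "root_A_pseudotime": [0.0, 0.25, 0.75, 1.0],
        "root_B_pseudotime": [1.0, 0.5, 0.25, 0.0],
    },
    index=["c0", "c1", "c2", "c3"],
)

root_cells = ["root_A", "root_B"]
cols = [f"{root}_pseudotime" for root in root_cells]

# Minimal distance: each cell is timed from its nearest root.
obs["min_pseudotime"] = obs[cols].min(axis=1)

# Average distance: each cell is timed against all roots equally.
obs["mean_pseudotime"] = obs[cols].mean(axis=1)

print(obs)
```

With the minimum, both root cells get pseudotime 0, which treats them as equivalent starting points. With the mean, cells that are early relative to one root but late relative to the other end up at intermediate values, so neither root sits at 0.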

Please let me know if I got this right or if you have any further questions!