Teichlab / cellhint

A tool for semi-automatic cell type harmonization and integration
MIT License

Kind questions about impact of the pre-integration annotation on cellhint integration #2

Open NPTL1201 opened 7 months ago

NPTL1201 commented 7 months ago

Dear Developers,

Thank you very much for the tools. I understand that pre-integration annotation is essential for CellHint, and CellTypist could be a very convenient tool for annotation. I noticed a statement in the integration tutorial: “Of note, influence of cell annotation on the data structure can range from forcibly merging the same cell types to a more lenient cell grouping.” My question is mainly about this statement.

Suppose there is a dataset A with a novel and previously undescribed cell subset (i.e., it does not have any corresponding reference based on the current built-in model or my own model). Another dataset B also has a distinct novel cell subset that has not been described before. In such a case, when using CellTypist to annotate them, they may be assigned a cell type with the highest confidence score or labeled as “unassigned”.

Suppose these two distinct novel cell types are given the same label by CellTypist. Given the statement above, would the integration forcibly merge them? In other words, they are distinct, but they end up merged simply because they share the same CellTypist label. Can I see this as a disadvantage of the supervised approach to data integration?

If such a situation really exists, how can we best avoid it? Could manual annotation be the gold standard for CellHint?

Many thanks in advance,

Anlin

ChuanXu1 commented 7 months ago

@NPTL1201, for undetermined cell types, you can label them "UNASSIGNED" to make a partial annotation. CellHint will incorporate these cells into every cell type group, which expands the size of each group. As a result, the search space for these "UNASSIGNED" cells covers all available cell type groups. Please check the section Usage (integration) -> 2. Tips for data integration -> 2.1. Partial annotation. Hope that's clear.
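
To make the partial-annotation idea concrete, here is a minimal pure-Python sketch. It is a conceptual illustration only, not CellHint's actual implementation: the helper names (`mask_low_confidence`, `candidate_groups`), the 0.5 confidence threshold, and the toy labels and scores are all assumptions made for this example. It relabels low-confidence cells as "UNASSIGNED" and then shows how such cells end up as candidates in every cell type group.

```python
from collections import defaultdict

UNASSIGNED = "UNASSIGNED"  # label used for partial annotation, as described above


def mask_low_confidence(labels, scores, threshold=0.5):
    """Replace labels whose confidence is below `threshold` with UNASSIGNED.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    return [lab if s >= threshold else UNASSIGNED for lab, s in zip(labels, scores)]


def candidate_groups(labels):
    """Return {cell type: cell indices}, adding UNASSIGNED cells to every group.

    This mimics the described behaviour conceptually: unassigned cells
    enlarge each group, so their search space spans all groups.
    """
    groups = defaultdict(list)
    unassigned = []
    for idx, lab in enumerate(labels):
        if lab == UNASSIGNED:
            unassigned.append(idx)
        else:
            groups[lab].append(idx)
    return {lab: sorted(members + unassigned) for lab, members in groups.items()}


# Toy data: cells 1 and 3 were labeled "T cell" with low confidence.
labels = ["T cell", "T cell", "B cell", "T cell"]
scores = [0.95, 0.30, 0.90, 0.20]

partial = mask_low_confidence(labels, scores)
groups = candidate_groups(partial)
print(partial)
print(groups)
```

In this toy case, the two low-confidence cells are no longer tied to the "T cell" label and instead appear in both the "T cell" and "B cell" groups. In practice, the partially annotated labels would be stored in the cell type column that is passed to CellHint for integration.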