LungCellAtlas / HLCA

MIT License

Lung cancer data #7

Closed SirKuikka closed 1 year ago

SirKuikka commented 1 year ago

Hi,

Is it a good idea to use scArches (https://github.com/theislab/scarches/blob/hlca_tutorial_improvements/notebooks/hlca_map_classify.ipynb) to map lung cancer data to the HLCA reference? The cancer cells are probably quite different from healthy epithelial cells. Could this analysis reveal something interesting about the query (cancer) data? Or was this scArches application mostly designed for healthy lung scRNA-seq data?

LisaSikkema commented 1 year ago

Hi @SirKuikka, as you can read in the paper, we mapped a lot of disease data to the healthy HLCA core, so it should work. The quality of the mapping mostly depends on how different your data are from the HLCA core (which is single-cell 10X) in terms of the technologies used, etc. For example, single-nucleus data might not map as nicely as single-cell data.

Ideally you also have some healthy controls in your dataset, so that you can use them to check whether the mapping went well, i.e. whether the controls from your data mix well with the HLCA core (controls only). But I'd say just give it a try, and if the mapping works well it could indeed tell you something about which cell types are different in your disease of interest (lung cancer in this case) compared to healthy tissue. You can check out the paper for examples of that; we do it with lung cancer, IPF and more. All notebooks with the analysis are available on the HLCA reproducibility GitHub.
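One rough way to put a number on "mixing well" is to look, for each query cell, at what fraction of its nearest neighbours in the joint latent space are reference cells. The snippet below is only a sketch of that idea, not the check used in the paper or in the scArches notebook: the function name `reference_neighbor_fraction`, the choice of `k`, and the synthetic arrays standing in for latent embeddings are all assumptions made for illustration. It uses brute-force distances, so it is meant for a subsample rather than the full atlas.

```python
import numpy as np


def reference_neighbor_fraction(ref_latent, query_latent, k=15):
    """Per query cell: fraction of its k nearest neighbours that are reference cells.

    Hypothetical helper; inputs are (cells x latent dims) arrays, e.g. a
    subsample of the reference embedding and the mapped query embedding.
    """
    ref = np.asarray(ref_latent, dtype=float)
    qry = np.asarray(query_latent, dtype=float)
    combined = np.vstack([ref, qry])
    is_ref = np.concatenate([np.ones(len(ref), dtype=bool),
                             np.zeros(len(qry), dtype=bool)])

    # Squared Euclidean distance from every query cell to every cell.
    d2 = ((qry[:, None, :] - combined[None, :, :]) ** 2).sum(axis=2)
    # A cell should not count as its own neighbour.
    d2[np.arange(len(qry)), len(ref) + np.arange(len(qry))] = np.inf

    nearest = np.argsort(d2, axis=1)[:, :k]
    return is_ref[nearest].mean(axis=1)


# Synthetic stand-ins for latent embeddings (assumed data, not HLCA output).
rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 10))                     # "reference" cells
mixed_query = rng.normal(size=(50, 10))              # query overlapping the reference
shifted_query = rng.normal(loc=8.0, size=(50, 10))   # query forming its own cluster

frac_mixed = reference_neighbor_fraction(ref, mixed_query)
frac_shifted = reference_neighbor_fraction(ref, shifted_query)
print("well-mixed query, mean reference fraction:", frac_mixed.mean())
print("separate query, mean reference fraction:", frac_shifted.mean())
```

For healthy controls that mapped well, you would expect the fraction to be close to the share of reference cells in the combined data; values near zero mean the query cells mostly neighbour each other.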

SirKuikka commented 1 year ago

Hi @LisaSikkema

I have only lung cancer cells, and the cells formed a cluster that is distinct from the reference cells when I used scArches to map the cells to the HLCA reference. The cells were close to epithelial cells, which makes sense.
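To make "close to epithelial cells" a bit more concrete, one could compare each query cell with the centroid of every reference cell type in the latent space. This is just a nearest-centroid sketch under assumed inputs, not the label-transfer procedure from the scArches notebook; `nearest_reference_label`, the two labels, and the synthetic embeddings are hypothetical.

```python
import numpy as np


def nearest_reference_label(ref_latent, ref_labels, query_latent):
    """Return, per query cell, the closest reference label centroid and its distance.

    Hypothetical helper: centroids are plain means of the reference cells
    carrying each label in the latent space.
    """
    ref = np.asarray(ref_latent, dtype=float)
    labels = np.asarray(ref_labels)
    qry = np.asarray(query_latent, dtype=float)

    classes = np.unique(labels)
    centroids = np.stack([ref[labels == c].mean(axis=0) for c in classes])

    # Euclidean distance from each query cell to each label centroid.
    dist = np.linalg.norm(qry[:, None, :] - centroids[None, :, :], axis=2)
    best = dist.argmin(axis=1)
    return classes[best], dist[np.arange(len(qry)), best]


# Synthetic stand-ins (assumed data): two reference cell types and a query
# cluster that sits near, but not on top of, the "epithelial" cells.
rng = np.random.default_rng(1)
ref_latent = np.vstack([rng.normal(loc=0.0, size=(100, 5)),
                        rng.normal(loc=10.0, size=(100, 5))])
ref_labels = np.array(["epithelial"] * 100 + ["immune"] * 100)
query_latent = rng.normal(loc=2.0, size=(30, 5))

nearest_labels, nearest_dist = nearest_reference_label(
    ref_latent, ref_labels, query_latent)
print(np.unique(nearest_labels, return_counts=True))
print("median distance to nearest centroid:", np.median(nearest_dist))
```

The distance to the nearest centroid gives a simple per-cell measure of how far the query sits from its closest reference cell type.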

I guess it's a nice visualization to show that our cells are different from normal lung cells.

Besides that, I have to think about what else the latent features could tell me about my data. Would these latent features be in some sense better than the ones you get from, e.g., Seurat's PCA? Do you think there would be some reason to use the scArches latents instead of principal components?

LisaSikkema commented 1 year ago

That sounds exactly like what we found when mapping lung cancer to the HLCA. Just check out Fig. 4c and Extended Data Fig. 6 of our paper, plus the text that goes with the figures.

The advantages of using a reference instead of just your own data are multiple:

If you're interested in more details I would refer you to the paper, it discusses your question extensively!

LisaSikkema commented 1 year ago

@SirKuikka I will close this issue now, but feel free to re-open it if you have further comments.