LungCellAtlas / HLCA

MIT License

The Human Lung Cell Atlas

Welcome to the GitHub page of the Human Lung Cell Atlas (HLCA). Here you will find an explanation of the HLCA and links to all the places where you can find, download, explore, and use it.

What is the HLCA?

If you're still wondering this, maybe check out the paper. In brief, it is the first integrated, universal transcriptomic reference of the human lung at the single-cell level.

Why do we need the HLCA?

Over the past decade, numerous single-cell studies of the human lung have been published, yet each of these studies was limited in the number and diversity of individuals sampled, and each is biased by its specific choice of technologies, protocols and more. A comprehensive reference should capture variation across a diverse population. Moreover, querying individual studies simultaneously is complicated by their differing cell type definitions. The HLCA overcomes these challenges by bringing together single-cell and single-nucleus studies into a single atlas, combining samples from 486 individuals across 49 datasets. The core of this atlas, comprising healthy lung samples from 107 individuals, was fully re-annotated based on the original annotations and on annotations by 6 independent lung experts. This re-annotation resulted in 61 distinct cell identity labels, thus proposing a first consensus annotation of the human lung.


Figure 1. The cell annotations of the HLCA, split by cell type compartment.

What can we do with the HLCA?

The unprecedented number and diversity of human lung samples and cell types in the HLCA can be leveraged for a number of purposes. In the HLCA publication, we show that pooling these datasets enables better annotation of rare cell types. We also leverage the demographic diversity of the atlas to model natural variation among healthy individuals, including the effects of sex, age, BMI, and smoking, as well as changes with location along the respiratory tract. Using the HLCA cell type annotations, we link genomic variants associated with disease to specific cell types in the lung. Finally, we show that mapping new data to the HLCA core enables fast and accurate cell type annotation, as well as the identification of unknown cell identities and disease-affected cell types. Importantly, you can easily map your own data to the HLCA, as described below under "Map your own data to the HLCA".


Figure 2. Overview of the HLCA study.

How to use, explore and download the HLCA

If you would like to take a look at the HLCA, you can interactively explore it on CELLxGENE. There you can also download the HLCA for your own use.

The file HLCA_metadata_explanation.csv in the docs folder of this repo contains a description of each metadata category that you'll find in the HLCA.
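A quick way to browse that file is to load it and print each row. The sketch below uses only Python's standard library and reads the column names from the header row, so it assumes nothing about the file's layout. The in-memory sample and its column names (`column`, `description`) are hypothetical stand-ins so the snippet runs without the real file.

```python
import csv
import io


def read_metadata_explanation(handle):
    """Return the rows of a metadata-explanation CSV as a list of dicts.

    Column names are taken from the header row, so nothing about the
    real file's layout is assumed here.
    """
    return list(csv.DictReader(handle))


# Hypothetical stand-in for docs/HLCA_metadata_explanation.csv; the real
# file's column names and contents may differ.
sample = io.StringIO(
    "column,description\n"
    "example_field_a,Hypothetical description of the first field\n"
    "example_field_b,Hypothetical description of the second field\n"
)

rows = read_metadata_explanation(sample)
for row in rows:
    print(" | ".join(f"{key}: {value}" for key, value in row.items()))

# With the repository checked out, the same function works on the real file:
# with open("docs/HLCA_metadata_explanation.csv", newline="") as handle:
#     rows = read_metadata_explanation(handle)
```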

If you're interested in the code we used for the HLCA project, go check out the HLCA reproducibility GitHub containing all scripts and notebooks used for the HLCA project.

If you would like to map your own data to the HLCA, see the section below.

Map your own data to the HLCA and/or perform HLCA-based label transfer:

If you would like to map your own data to the HLCA, whether to obtain an atlas-compatible low-dimensional embedding, to transfer labels, or to identify unknown and disease-affected cell types, there are multiple places to do that:

The HLCA scANVI-based reference model can be found on Zenodo.

For CellTypist-based label transfer (not used in the paper):

Have fun!
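For readers new to label transfer, the general idea can be illustrated with a toy example. This is not the HLCA pipeline: it is a minimal sketch, assuming that reference and query cells already share a low-dimensional embedding, of a k-nearest-neighbor majority vote in which cells with low neighbor agreement are flagged as "unknown". The function name, the 2D coordinates, the labels `type_A` and `type_B`, and the `k` and `min_agreement` values are all hypothetical.

```python
import math
from collections import Counter


def transfer_labels(ref_points, ref_labels, query_points, k=3, min_agreement=0.75):
    """Label each query point by majority vote among its k nearest reference points.

    Points are coordinate tuples in a shared embedding. If the winning
    label's share of the k votes is below min_agreement, the query point
    is labeled "unknown" instead.
    """
    results = []
    for query in query_points:
        # Rank reference indices by Euclidean distance and keep the k nearest.
        nearest = sorted(
            range(len(ref_points)),
            key=lambda i: math.dist(query, ref_points[i]),
        )[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        agreement = count / len(nearest)
        results.append((label if agreement >= min_agreement else "unknown", agreement))
    return results


# Toy reference: two well-separated groups in a made-up 2D embedding.
ref_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
ref_labels = ["type_A", "type_A", "type_A", "type_B", "type_B", "type_B"]

# Two query points near a group, and one in between the groups.
query_points = [(0.2, 0.3), (5.4, 5.2), (2.5, 2.6)]

results = transfer_labels(ref_points, ref_labels, query_points)
for point, (label, agreement) in zip(query_points, results):
    print(point, label, round(agreement, 2))
```

The in-between point gets a mixed neighborhood, so its agreement falls below the cutoff and it is reported as "unknown" rather than forced into either group. In a real analysis the embedding would come from the reference model and the cutoff would need to be chosen with care.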

The HLCA paper:

Sikkema et al., Nature Medicine 2023

Any questions?

Please submit an issue to this GitHub repository.