LungCellAtlas / HLCA_reproducibility

This repository contains all code used for the Human Lung Cell Atlas project.
MIT License

Hard to understand the code #10

Closed: mugpeng closed this issue 10 months ago

mugpeng commented 11 months ago

I understand the concerns about, and the inconvenience of, sharing the raw code, as mentioned in #9. But it is really hard to follow the code without the files.

For example, I am confused about the relationship between the manual annotations and the leveled annotations: [image] P.S. I understand that you use 3 levels in the paper, so what is the function of the manual annotations?

Besides, some plot outputs are also missing: [image]

Thank you.

LisaSikkema commented 11 months ago

Hi @mugpeng ,

All the code was actually shared; it is only the raw and intermediate data files (which together probably make up hundreds of GB) that I didn't upload. But again, I'm happy to share certain files if you want them.

As for the table you import above, this is part of the GitHub repo, see here: https://github.com/LungCellAtlas/HLCA_reproducibility/blob/main/supporting_files/celltype_structure_and_colors/manual_anns_and_leveled_anns_ordered.csv

The "manual annotations" are the reannotations we did with the help of 6 experts. These are broken down into at most 5 levels, as also explained in the paper. For some cell types we have only two levels, as these simply did not have more detailed annotations (e.g. level 2 "Mesothelium", which is a subset of level 1 "Stroma" but is not broken down further). For others we go all the way to level 5, e.g. "Multiciliated nasal".
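To make the level structure concrete, here is a minimal sketch of how a manual annotation can be read as a path through up to 5 levels. It does not load the repository CSV: the table is a toy stand-in, the column names (`manual_ann`, `level_1` ... `level_5`) are assumptions, and the intermediate level names in the second row are illustrative only. Only "Stroma" > "Mesothelium" and the level 5 label "Multiciliated nasal" come from the explanation above.

```python
import csv
import io

# Toy stand-in for an annotation table. Column names are hypothetical and
# the intermediate labels for "Multiciliated nasal" are illustrative only.
TOY_CSV = """manual_ann,level_1,level_2,level_3,level_4,level_5
Mesothelium,Stroma,Mesothelium,,,
Multiciliated nasal,Epithelium,Airway epithelium,Multiciliated lineage,Multiciliated,Multiciliated nasal
"""

LEVEL_COLUMNS = [f"level_{i}" for i in range(1, 6)]


def annotation_path(row):
    """Return the non-empty level labels of one row, coarsest first."""
    return [row[col] for col in LEVEL_COLUMNS if row[col]]


def finest_level(row):
    """Return the deepest level (1 to 5) at which this row has a label."""
    return len(annotation_path(row))


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
paths = {row["manual_ann"]: annotation_path(row) for row in rows}
depths = {row["manual_ann"]: finest_level(row) for row in rows}

for name, path in paths.items():
    print(f"{name}: depth {depths[name]}, path {' > '.join(path)}")
```

In this toy layout an empty cell simply means "no finer annotation exists"; the actual file may encode missing levels differently, so check its columns before reusing the helper.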

The "harmonizing df" (data frame) relates to the original annotations from the individual datasets: each dataset in the HLCA core included original annotations, and we mapped these to a common hierarchical cell type reference so that we could compare them fairly (e.g. mapping "blood vessel cells" and "endothelial cells" to the same cell type). The table with information about how we did this harmonisation can also be found in the GitHub repo: https://github.com/LungCellAtlas/HLCA_reproducibility/blob/main/supporting_files/metadata_harmonization/HLCA_cell_type_reference_mapping_20221007.csv

In general, the folder "supporting_files" likely has most of the tables that could be of interest to you.
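As a rough illustration of the harmonisation idea, the sketch below maps dataset-specific labels onto one shared reference label. Everything in it is hypothetical: the dataset names, the reference label "Endothelial", and the lookup structure are assumptions for the example and are not taken from the mapping CSV linked above.

```python
# Hypothetical lookup: (dataset, original label) -> shared reference label.
# The real mapping lives in the CSV in the repository; these entries are
# made up to mirror the "blood vessel cells" / "endothelial cells" example.
ORIGINAL_TO_REFERENCE = {
    ("dataset_A", "blood vessel cells"): "Endothelial",
    ("dataset_B", "endothelial cells"): "Endothelial",
}


def harmonize(dataset, original_label):
    """Return the shared reference label, or None if no mapping exists."""
    key = (dataset, original_label.strip().lower())
    return ORIGINAL_TO_REFERENCE.get(key)


cells = [
    ("dataset_A", "Blood vessel cells"),
    ("dataset_B", "endothelial cells"),
    ("dataset_B", "unlabelled cluster 7"),
]

for dataset, label in cells:
    print(f"{dataset}: {label!r} -> {harmonize(dataset, label)!r}")
```

Returning `None` for unmapped labels keeps gaps in the mapping visible rather than silently assigning a cell type.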

mugpeng commented 10 months ago

Thank you for your reply. Sorry for missing those files. I get it now.

LisaSikkema commented 10 months ago

No problem at all, I know there are lots of folders and files to navigate in this repo.