Divide&conquer for annotation

The goal is to divide the full atlas into subsets for detailed annotation. We decided to go for a hierarchical approach to achieve higher resolution on the immune cell compartment.

Steps

1. Make data subsets [notebook]
   - Cluster the full atlas at high resolution (res=1.5) and label each cluster with its most abundant annotated cell type
   - Group clusters into data "splits"
   - Save log-normalised data for each split (i.e. before scaling, feature selection, ridge regression, etc.): saved as /nfs/team205/ed6/data/Fetal_immune/PAN.A01.v01.entire_data_normalised_log.wGut.batchCorrected_20210118.SUBSETNAME.h5ad
2. Preprocess + batch correct each data subset [script]: output is saved as /nfs/team205/ed6/data/Fetal_immune/PAN.A01.v01.entire_data_normalised_log.wGut.batchCorrected_20210118.SUBSETNAME.batchCorrected.h5ad
3. Visualize results and split further if necessary: see notebooks in notebooks/PFI_subset_EDA
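
For readers who want the gist without opening the notebook, here is a minimal sketch of the labelling and grouping logic in step 1. It is not the notebook code: it assumes the high-resolution cluster ids and the existing annotations are already available as two per-cell lists, and the function names, the split names (`B_CELLS`, `NKT`, `OTHER`) and the label-to-split mapping are all hypothetical. Only the path template is taken from the steps above.

```python
from collections import Counter, defaultdict

# Path template copied from step 1; SUBSETNAME is the placeholder to fill in.
LOGNORM_TEMPLATE = (
    "/nfs/team205/ed6/data/Fetal_immune/"
    "PAN.A01.v01.entire_data_normalised_log.wGut."
    "batchCorrected_20210118.SUBSETNAME.h5ad"
)


def majority_label_per_cluster(clusters, annotations):
    """Map each cluster id to its most abundant annotated cell type."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, annotations):
        if label is not None:  # unannotated cells do not vote
            counts[cluster][label] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}


def assign_splits(clusters, cluster_labels, label_to_split, default="OTHER"):
    """Return one split name per cell, based on its cluster's majority label."""
    return [label_to_split.get(cluster_labels.get(c), default) for c in clusters]


def split_path(split_name, template=LOGNORM_TEMPLATE):
    """Fill the SUBSETNAME placeholder of the path template."""
    return template.replace("SUBSETNAME", split_name)


# Toy data: six cells, three clusters (hypothetical values).
clusters = ["0", "0", "0", "1", "1", "2"]
annotations = ["B cell", "B cell", "NK", "NK", "NK", None]

# Hypothetical mapping from cell type label to split name.
label_to_split = {"B cell": "B_CELLS", "NK": "NKT", "T cell": "NKT"}

cluster_labels = majority_label_per_cluster(clusters, annotations)
splits = assign_splits(clusters, cluster_labels, label_to_split)

for cell_split in sorted(set(splits)):
    print(cell_split, "->", split_path(cell_split))
```

Note that the third cell is annotated as NK but sits in a cluster whose majority label is B cell, so it ends up in the hypothetical `B_CELLS` split. Assigning splits at the cluster level makes this kind of carry-over possible.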

What are the splits

See slides illustrating splitting and output

Outstanding problems

Even when subsetting based on clustering, there is "spill-over" between splits: e.g. in the B cell split I still have some NK/T cells, which then cluster separately post-integration. Is it OK to remove these cells from a subset post-hoc?
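
If post-hoc removal turns out to be acceptable, one possible way to flag candidate cells is sketched below. This is only an illustration, not something already in the pipeline: it assumes the split has been re-clustered after integration, and it flags every cell in a post-integration cluster whose most abundant annotation is not one of the labels expected for that split. The function name, the cluster ids and the expected-label set are hypothetical.

```python
from collections import Counter, defaultdict


def flag_spillover(post_clusters, annotations, expected_labels):
    """Return one boolean per cell: True if the cell's post-integration
    cluster has a majority label outside `expected_labels`."""
    counts = defaultdict(Counter)
    for cluster, label in zip(post_clusters, annotations):
        if label is not None:  # unannotated cells do not vote
            counts[cluster][label] += 1
    off_target = {
        c for c, cnt in counts.items()
        if cnt.most_common(1)[0][0] not in expected_labels
    }
    return [c in off_target for c in post_clusters]


# Toy B cell split: cluster "a" is mostly B cells, cluster "b" is all NK.
post_clusters = ["a", "a", "a", "b", "b"]
annotations = ["B cell", "B cell", "NK", "NK", "NK"]

flags = flag_spillover(post_clusters, annotations, expected_labels={"B cell"})
print(flags)
```

The flag is set per cluster rather than per cell, so the single NK-annotated cell inside cluster "a" is kept. Whether that is the right granularity is part of the question above.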

@suochenqu @Issacgoh let me know what you think of the results
