Wilms Tumor Dataset Annotation (SCPCP000006) _ clustering

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/635

Describe the goals of the changes to the analysis module.

The main addition to the module is an R Markdown report for three Wilms tumor samples from the dataset SCPCP000006:

SCPCS000168,
SCPCS000169,
SCPCS000170.

In this report, we tested ways to:

cluster cells,
characterize and identify each cell type.

The aim is to discuss the report and possible improvements before adapting it and rendering it for all samples in the dataset.

The analysis proceeds as follows:

[0] We build a Seurat object from the counts data and follow the standard Seurat workflow [normalization --> reduction --> clustering].

[1] We perform quality checks to assess any QC-induced clustering (nFeature, nCount, percent.mito).
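As a rough illustration of step [1], the sketch below computes the three QC metrics per cell from a toy count table and summarizes one of them per cluster. It is written in plain Python only to show the logic; the report itself relies on Seurat in R. The gene counts, the cluster labels, the `MT-` prefix convention, and the helper names `qc_metrics` and `median_by_cluster` are all assumptions made for this example.

```python
from statistics import median

# Hypothetical toy counts: gene -> counts in each of three cells.
counts = {
    "MT-CO1": [5, 0, 20],
    "MT-ND1": [5, 0, 10],
    "WT1": [10, 3, 0],
    "SIX2": [0, 7, 0],
    "PTPRC": [20, 0, 10],
}
# Hypothetical cluster assignment for the same three cells.
clusters = ["0", "0", "1"]


def qc_metrics(counts, mito_prefix="MT-"):
    """Return nCount, nFeature and percent.mito for every cell."""
    n_cells = len(next(iter(counts.values())))
    metrics = []
    for cell in range(n_cells):
        # Total counts in the cell.
        n_count = sum(row[cell] for row in counts.values())
        # Number of genes detected (count > 0) in the cell.
        n_feature = sum(1 for row in counts.values() if row[cell] > 0)
        # Counts coming from genes whose name starts with the mito prefix.
        mito = sum(
            row[cell] for gene, row in counts.items()
            if gene.startswith(mito_prefix)
        )
        percent_mito = 100.0 * mito / n_count if n_count else 0.0
        metrics.append(
            {"nCount": n_count, "nFeature": n_feature,
             "percent.mito": percent_mito}
        )
    return metrics


def median_by_cluster(metrics, clusters, key):
    """Median of one QC metric within each cluster."""
    grouped = {}
    for cell_metrics, cluster in zip(metrics, clusters):
        grouped.setdefault(cluster, []).append(cell_metrics[key])
    return {cluster: median(values) for cluster, values in grouped.items()}


metrics = qc_metrics(counts)
print(median_by_cluster(metrics, clusters, "percent.mito"))
```

A cluster whose median differs strongly from the others on one of these metrics is a candidate for QC-induced clustering and deserves a closer look before it is interpreted biologically.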

[2] We add cell cycle information, because in some cell cycle phases the transcriptional program is dominated by cell cycle genes, which makes the identity of the cells difficult to determine. We expect these cells to group together in a cluster of proliferating cells.

[3] We look at specific marker genes that we reported in the table marker.sets/CellType_metadata.csv to check the relevance of the clustering.

[4] We look at specific pathways that we reported in the table marker.sets/Pathways_metadata.csv to check the relevance of the clustering.

[5] We run DElegate::FindAllMarkers2 to find markers of the different clusters and manually check whether they make sense. DElegate::FindAllMarkers2 is an improved version of Seurat::FindAllMarkers based on a pseudobulk differential expression method.
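To make the pseudobulk idea in step [5] concrete, the sketch below sums per-cell counts within each (cluster, replicate) group, which is the aggregation step that precedes a bulk-style differential expression test. This is only an illustration in plain Python, not DElegate's implementation; the toy counts, the replicate labels, and the helper name `pseudobulk` are hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, groups):
    """Sum per-cell counts within each group.

    counts: gene -> list of per-cell counts
    groups: one group label per cell, here a (cluster, replicate) tuple
    """
    aggregated = {}
    for gene, row in counts.items():
        sums = defaultdict(int)
        for value, group in zip(row, groups):
            sums[group] += value
        aggregated[gene] = dict(sums)
    return aggregated


# Hypothetical toy data: four cells, two clusters, two replicates.
counts = {
    "WT1": [3, 1, 0, 2],
    "PTPRC": [0, 0, 5, 4],
}
cell_clusters = ["0", "0", "1", "1"]
cell_replicates = ["r1", "r2", "r1", "r1"]
groups = list(zip(cell_clusters, cell_replicates))

bulk = pseudobulk(counts, groups)
print(bulk["WT1"])
```

Each aggregated profile then behaves like one bulk sample, so markers are tested across a handful of profiles per cluster rather than across thousands of individual cells.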

[6] We perform enrichment analysis of the marker genes for each Seurat cluster. We define all the genes from the Seurat object as the universe and use the MSigDB gene sets.
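The role of the universe in step [6] can be shown with a small over-representation test. The sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of enrichment but is not necessarily the exact statistic used in the report; the gene names and the helpers `hypergeom_pvalue` and `enrich` are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe_size, set_size, n_markers, overlap):
    """P(X >= overlap) when n_markers genes are drawn without replacement
    from a universe that contains set_size gene-set members."""
    max_k = min(set_size, n_markers)
    total = comb(universe_size, n_markers)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_markers - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def enrich(markers, gene_set, universe):
    """Return (overlap size, p-value) after restricting to the universe."""
    universe = set(universe)
    # Genes outside the universe cannot be observed, so they are dropped.
    markers = set(markers) & universe
    gene_set = set(gene_set) & universe
    overlap = markers & gene_set
    p_value = hypergeom_pvalue(
        len(universe), len(gene_set), len(markers), len(overlap)
    )
    return len(overlap), p_value


# Hypothetical toy example: a 10-gene universe.
universe = [f"g{i}" for i in range(10)]
gene_set = ["g0", "g1", "g2", "not_measured"]
cluster_markers = ["g0", "g1", "g5"]

n_overlap, p_value = enrich(cluster_markers, gene_set, universe)
print(n_overlap, p_value)
```

Restricting both the markers and the gene set to the universe matters: using all annotated genes as background instead of the genes actually present in the object would make the p-values look smaller than they should.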

[7] We plot the PCA/UMAP reductions grouped by the available annotations (SingleR, CellAssign). We expect at least the immune cells to be correctly labeled and to fall into a small set of clusters.

[8] We run label transfer (Azimuth) to transfer annotations from the human fetal kidney atlas reference. We plot the PCA/UMAP reductions grouped by the transferred labels. We expect this annotation to be the most representative of the cell types in the sample.

What will your pull request contain?

The pull request will contain the Rmd file 01-clustering_SCPCS000xxx.Rmd in the cell-type-wilms-tumor-06 folder and the HTML report in the notebook folder.

The renv.lock file needed to update, build, and run the Docker container required for the analysis.

Updates in the marker-sets folder:

a table summarizing some pathways that can be useful for the identification and characterization of Wilms tumor cells,
an Azimuth_Compatible_Fetal_full folder containing the idx.annoy and ref.Rds files required to build the reference dataset for label transfer.

Will you require additional software beyond what is already in the analysis module?

We continue working with RStudio and try to keep the Dockerfile updated with additional packages.

Will you require different computational resources beyond what the analysis module already uses?

I work on my local machine. I am not sure how to answer this question, but here is a screenshot of the memory usage report of my R session, in case it helps.

If known, when do you expect to file the pull request?

~07/08/2024
