Wilms Tumor Dataset Annotation (SCPCP000006) #635

maud-p commented 4 months ago

Please link to the GitHub Discussion for this proposed analysis.

https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/635#discussioncomment-10140478

Describe the goals of this analysis module.

Here, we first aim to annotate the Wilms tumor (WT) snRNA-seq samples in the SCPCP000006 dataset (n=40). To do so, we will:

• Provide annotations of the normal cells composing the kidney, including normal kidney epithelium, endothelium, stroma, and immune cells
• Provide annotations of the tumor cell populations that may be present in the WT samples, including blastemal, epithelial, and stromal populations of cancer cells

Based on these annotations, we would additionally like to provide a reference of marker genes for the three cancer cell populations, which the WT community currently lacks.

The analysis will be divided as follows:

1. Metadata file: compile a metadata file of marker genes for the expected cell types, to be used for validation at a later step
2. Script: clustering of cells across a set of parameters for a few samples
3. Script: label transfer from the fetal kidney atlas reference using RunAzimuth
4. Script: run InferCNV
5. Notebook: explore results from steps 2 to 4 for about 5 to 10 samples
6. Script: compile scripts 2 to 4 into an R Markdown file with the required adjustments and render it across all samples
7. Notebook: explore results from step 6, integrate all samples together, and annotate the dataset using (i) the metadata file, (ii) CNV information, (iii) label transfer information
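To make the "render it across all samples" step concrete, here is a minimal sketch of a driver that builds one `rmarkdown::render` call per sample. It is only an illustration under assumptions: the template name `sample_report.Rmd`, the `sample_id` parameter, the output directory, and the sample IDs are all hypothetical, and the template would need to declare `sample_id` under `params` in its YAML header.

```python
def build_render_commands(sample_ids, template="sample_report.Rmd", out_dir="results"):
    """Build one Rscript command per sample.

    The template name, the `sample_id` parameter, and `out_dir` are
    hypothetical placeholders, not names taken from the module.
    """
    commands = []
    for sample_id in sample_ids:
        # rmarkdown::render() accepts `params`, `output_file`, and `output_dir`.
        expr = (
            f"rmarkdown::render('{template}', "
            f"params = list(sample_id = '{sample_id}'), "
            f"output_file = '{sample_id}_report.html', "
            f"output_dir = '{out_dir}')"
        )
        commands.append(["Rscript", "-e", expr])
    return commands


if __name__ == "__main__":
    # Placeholder sample IDs, for illustration only.
    for command in build_render_commands(["SAMPLE_A", "SAMPLE_B"]):
        print(" ".join(command))
```

Each command could then be passed to `subprocess.run(command, check=True)`. Returning the commands rather than running them keeps the sketch easy to inspect before anything is launched on the full set of samples.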

What software will you require?

I will use RStudio built from a Docker image based on rocker/tidyverse:4.3.0, with BiocManager version = "3.17".

The main packages used are:

• Seurat version 5
• Azimuth version 5
• inferCNV
• SCpubr for visualization
• DT for table visualization
• DElegate for differential expression analysis

What will your first pull request contain?

The first pull request will be a metadata file containing a list of marker genes for the expected cell types. The table will contain the following columns:

• gene symbol
• gene ENSEMBL id
• cell type specificity
• reference: DOI id of related publication
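As an illustration of how such a table could be checked before it is used for validation, here is a minimal sketch of a validator for a tab-separated marker gene file. The column names (`gene_symbol`, `ensembl_gene_id`, `cell_type`, `DOI`) are assumptions, since the issue does not fix exact headers, and the `ENSG` pattern assumes human Ensembl gene IDs. The example rows are placeholders, not real markers or publications.

```python
import csv
import io
import re

# Hypothetical column names; the issue lists the columns but not exact headers.
REQUIRED_COLUMNS = ["gene_symbol", "ensembl_gene_id", "cell_type", "DOI"]
# Assumes human Ensembl gene IDs: "ENSG" followed by 11 digits.
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}$")
# Loose DOI shape check: "10.", a registrant code, "/", then a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def validate_marker_table(tsv_text):
    """Return a list of human-readable problems found in a marker gene table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        values = {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}
        for col, value in values.items():
            if not value:
                problems.append(f"line {line_no}: empty {col}")
        ensembl_id = values["ensembl_gene_id"]
        if ensembl_id and not ENSEMBL_GENE_RE.match(ensembl_id):
            problems.append(f"line {line_no}: malformed ensembl_gene_id '{ensembl_id}'")
        doi = values["DOI"]
        if doi and not DOI_RE.match(doi):
            problems.append(f"line {line_no}: malformed DOI '{doi}'")
    return problems


# Placeholder rows only: not real genes, cell types, or publications.
EXAMPLE_TABLE = (
    "gene_symbol\tensembl_gene_id\tcell_type\tDOI\n"
    "GENE_A\tENSG00000000001\tplaceholder cell type\t10.1234/placeholder\n"
    "GENE_B\tENSG123\tplaceholder cell type\t\n"
)

if __name__ == "__main__":
    for problem in validate_marker_table(EXAMPLE_TABLE):
        print(problem)
```

Checking ID and DOI formats up front is one way to catch copy-paste errors early; it does not confirm that a gene ID actually exists in Ensembl or that a DOI resolves.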

What computational resources will you require?

I will use our own machine and computational resources.

If known, when do you expect to file the first pull request?

~01/08/2024

jashapiro commented 4 months ago

Thank you for filing this issue with your plans!

I will use RStudio build with a Docker image from the base image rocker/tidyverse:4.3.0 BiocManager version = "3.17"

For best compatibility with the other packages currently in use, you might consider using Bioconductor 3.19 and R 4.4. We use these in part because of a known security vulnerability in R <4.4.

For easiest implementation that saves on some installation time, you might consider using the bioconductor/tidyverse:3.19 image for your development.

maud-p commented 4 months ago

Good to know, thank you very much! I'll build a Docker image based on bioconductor/tidyverse:3.19 before starting module-2!
