AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

01_clustering #680

Closed maud-p closed 3 months ago

maud-p commented 3 months ago

Purpose/implementation Section

Hi DataLab team,

I apologize, I messed something up. I created a new branch, maud-p-01-clustering, that I wanted to be associated with the new changes in this pull request. But for some reason, the changes are in the main branch (https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/672/commits). I also don't know how to open a new pull request in the main branch, as there is already one open. I am sorry about that. If you could let me know how best to do this next time, that would be great. Thank you!

For now, to show a bit of progress, I describe below the changes I made in my module cell-type-wilms-tumor-06. You can track the commits here (https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/672/commits).

Please link to the GitHub issue that this pull request addresses.

https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/679 https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/635

What is the goal of this pull request?

The main addition to the module is an RMarkdown report for one Wilms tumor sample (dataset SCPCP000006, sample SCPCS000169). The aim is to discuss the report and possible improvements before adapting it and rendering it for all samples in the dataset.

What does this pull request contain?

The pull request contains the Rmd file 01-clustering_SCPCS000169.Rmd in the cell-type-wilms-tumor-06 folder and the HTML report in the notebook folder.

The Dockerfile used to build the Docker image and start RStudio has been updated with the required packages.

We added clinical data from the dataset (SCPCP000006_metadata.tsv) to better track and understand the sample if needed.

Briefly describe the general approach you took to achieve this goal.

The analysis was performed as follows:

[0] We built a Seurat object from the counts data and followed the Seurat workflow [normalization --> dimensionality reduction --> clustering].

[1] We performed quality checks to assess whether any clustering is driven by QC metrics (nFeature, nCount, percent.mito).

[2] We added cell cycle information because, in certain cell cycle phases, the transcriptional program is mostly or exclusively driven by cell cycle genes, which makes the identity of the cells difficult to determine. We expect these cells to group together in a cluster of proliferating cells.

[3] We ran DElegate::FindAllMarkers2 to find markers of the different clusters and manually checked whether they make sense. DElegate::FindAllMarkers2 is an improved version of Seurat::FindAllMarkers based on a pseudobulk differential expression method.

[4] We looked at the specific marker genes listed in the table marker.sets/CellType_metadata.csv to check the relevance of the clustering.

[5] We plotted the PCA/UMAP reductions grouped by the annotations available from the DataLab (SingleR, CellAssign). We expect at least the immune cells to be correctly labeled and to fall into a small set of clusters.

[6] We ran label transfer (Azimuth) to transfer annotations from the human fetal kidney atlas reference. We plotted the PCA/UMAP reductions grouped by these new labels. We expect them to be the most representative of the cell types in the sample.
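As a rough illustration of the check in step [1], the sketch below summarizes one QC metric per cluster and flags clusters whose median is far above the overall median. It is written in plain Python for illustration only; the report itself is an RMarkdown notebook using Seurat. The function names, the `fold` cutoff, and the toy values are all hypothetical and do not come from the report.

```python
from collections import defaultdict
from statistics import median


def qc_medians_by_cluster(clusters, values):
    """Return the median of a QC metric for each cluster label."""
    grouped = defaultdict(list)
    for cluster, value in zip(clusters, values):
        grouped[cluster].append(value)
    return {cluster: median(vals) for cluster, vals in grouped.items()}


def flag_qc_driven_clusters(clusters, values, fold=2.0):
    """Flag clusters whose median exceeds `fold` times the overall median.

    `fold` is an arbitrary, hypothetical cutoff, not a value from the report.
    """
    overall = median(values)
    per_cluster = qc_medians_by_cluster(clusters, values)
    return sorted(c for c, m in per_cluster.items() if m > fold * overall)


# Toy data (made up): cluster labels and percent.mito for eight cells.
clusters = ["0", "0", "0", "1", "1", "1", "2", "2"]
percent_mito = [2.0, 3.0, 2.5, 3.5, 2.0, 4.0, 18.0, 22.0]

flagged = flag_qc_driven_clusters(clusters, percent_mito)
print(flagged)
```

With these toy values, only cluster "2" is flagged, which is the kind of cluster one would inspect as possibly driven by low-quality cells rather than by cell identity.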

If known, do you anticipate filing additional pull requests to complete this analysis module?

[ ] After discussion with you, I will adapt the script and render it for all 40 samples of the dataset.

[ ] We will save the RDS file for each sample.

[ ] We will run inferCNV for each sample to decide the malignant/normal status of some stroma and epithelial clusters, and to confirm the blastema annotation.

These next steps will give us a better understanding of the entire cohort. We will then have to set up a strategy to annotate each sample. Open questions are:

[ ] Should we annotate single cells, or [ ] apply the same annotation to all cells in a cluster?

[ ] Manual annotation of each cluster / each patient, or [ ] automated annotation using some threshold?
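To make the two open questions concrete, here is a minimal sketch of one possible automated, cluster-level option: each cluster receives the most frequent per-cell label (for example from label transfer) if that label covers at least a given fraction of the cluster, and is otherwise left unresolved for manual review. This is only one way to do it, written in plain Python for illustration. The function name, the `min_fraction` threshold, the "unresolved" placeholder, and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def annotate_clusters(clusters, cell_labels, min_fraction=0.6):
    """Assign each cluster its majority per-cell label.

    A cluster is labeled "unresolved" when the most frequent label covers
    less than `min_fraction` of its cells (hypothetical cutoff).
    """
    grouped = defaultdict(list)
    for cluster, label in zip(clusters, cell_labels):
        grouped[cluster].append(label)

    annotations = {}
    for cluster, labels in grouped.items():
        top_label, count = Counter(labels).most_common(1)[0]
        fraction = count / len(labels)
        annotations[cluster] = top_label if fraction >= min_fraction else "unresolved"
    return annotations


# Toy data (made up): cluster assignments and per-cell labels for nine cells.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1", "1"]
cell_labels = [
    "immune", "immune", "immune", "stroma",
    "stroma", "epithelial", "stroma", "epithelial", "blastema",
]

annotations = annotate_clusters(clusters, cell_labels)
print(annotations)
```

In this toy example, cluster "0" is labeled "immune" (3 of 4 cells), while cluster "1" has no label reaching the threshold and stays "unresolved", so it would fall back to manual annotation.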

Results

What is the name of your results bucket on S3?

I haven't uploaded anything to S3 yet.

What types of results does your code produce (e.g., table, figure)?

HTML report/notebook.

What is your summary of the results?

The primary aim of this script was to assess the quality of clustering and play with clustering parameters. On this sample, we are happy with the clustering, as we could identify:

See selected plots to summarize the analysis.

image image

Additionally, we showed that it is feasible to run label transfer using RunAzimuth (part of step 3 of the proposed analysis: https://github.com/maud-p/OpenScPCA-analysis/issues/1).

We also showed that normal cells (endothelial and immune cells) can be identified easily and robustly. They will serve as the reference for the CNV inference (inferCNV, step 4 of the analysis).

Provide directions for reviewers

I would like to have your opinion on the report before rendering it for all WT samples.

[ ] Is the analysis workflow OK? Can we pursue this strategy?

[ ] Suggestions for improvement.

What are the software and computational requirements needed to be able to run the code in this PR?

Copied from the README file in OpenScPCA-analysis/analyses/cell-type-wilms-tumor-06/

To perform the analysis, run the RMarkdown script in R (version 4.4.1). The main packages used are:

- Seurat version 5
- Azimuth version 5
- inferCNV
- SCpubr for visualization
- DT for table visualization
- DElegate for differential expression analysis

For complete reproducibility of the results, you can build and run the Docker image using the Dockerfile. This will allow you to work in RStudio (R version 4.4.1), built from the base image bioconductor/tidyverse:3.19.

In the config.yaml file, define your system-specific parameters and paths (e.g. to the data). Execute the run.sh file and open RStudio in your browser (http://localhost:8080/). By default, username = rstudio, password = wordpass.

Are there particular areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

Author checklists

Analysis module and review

Reproducibility checklist

jaclyn-taroni commented 3 months ago

Hi @maud-p,

Thanks for filing this!

Regarding your comment, I will take a look at both pull requests and see if I have a recommendation to isolate the changes relevant to the individual pull requests this morning (Eastern US).

If not, it's no problem. I'll definitely post advice for future development either way!