SCPCP000006 - Add 01-clustering template analysis

maud-p commented 3 months ago

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/635

What is the goal of this pull request?

As discussed in the previous PR (https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/699) I wanted to have one notebook that is rendered over the entire Wilms tumor dataset.

Briefly describe the general approach you took to achieve this goal.

I basically took the previous .Rmd file,
remove all the comments specific to one sample,
remove the label transfer using the fetal kidney atlas (for the next PR)
write a R script to render the notebook on all samples.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes! More than one. At least:

[ ] A PR to build the fetal kidney reference for Azimuth,
[ ] A PR to include the label transfer from the fetal kidney reference to the clustering template.
[ ] A PR to include inferCNV and/or copyKAT to the template.
[ ] A PR to subdivide and re-cluster stroma and fetal nephron cells, taht contain nromal + cancer cells

Results

What types of results does your code produce (e.g., table, figure)?

HTML reports

What is your summary of the results?

Provide directions for reviewers

In this section, tell reviewers what kind of feedback you are looking for. This information will help guide their review.

What are the software and computational requirements needed to be able to run the code in this PR?

This information will help reviewers run the code during review, if applicable. For software, how should reviewers set up their environment (e.g., renv or conda) to run this code? For compute, can reviewers run this code on their laptop, or do they need additional computational resources such as RAM or storage? Please make sure this information, if applicable, is documented in the README.md file.

Are there particularly areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

If you have any comments, ideas, very welcome. Either on the content of the template (as I said, I think some sections will not be so usefull) or the future plans:

[ ] build the fetal kidney reference for Azimuth,
[ ] include the label transfer from the fetal kidney reference to the clustering template.
[ ] include inferCNV and/or copyKAT to the template.
[ ] subdivide and re-cluster stroma and fetal nephron cells, taht contain nromal + cancer cells

Author checklists

Analysis module and review

[x] This analysis module uses the analysis template and has the expected directory structure.
[x] The analysis module README.md has been updated to reflect code changes in this pull request.
[x] The analytical code is documented and contains comments.
[ ] Any results and/or plots this code produces have been added to your S3 bucket for review.

Reproducibility checklist

[ ] Code in this pull request has been added to the GitHub Action workflow that runs this module.
[x] The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
[ ] If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
[x] If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

sjspielman commented 3 months ago

Hi @maud-p, thanks for filing this updated PR! I wanted to let you know that I'm going over your code now, and you can expect a review from me tomorrow.

sjspielman commented 3 months ago

I want to also note that this comment https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/706#discussion_r1711482392 @jaclyn-taroni's review in your other PR could very much also apply to this notebook too. Analysis parameter choices, which you may decide to vary at some future point, are a good type of variable to make into an R Markdown parameter.

maud-p commented 3 months ago

Thank you very much @sjspielman for taking the time to give such a great feedback!

I'll take some more time to read all your documentation point by point but globally from a first read it is all clear and make lot of sense!

To answer your question about the sensitivity of DElegate to the number of clusters, I think it is! Naively, I used it here to have a rapid idea of the quality of the clustering: if it finds specific genes for each cluster, I expect each of them to have at least a different phenotype. In case it struggles finding specific genes, we might have over-clustered. It would also allow a quick check if our "favorite" known marker genes pop up in the top 5, like CD45.

As you wrote, I would not use it for annotation. The two uses of it I can see would be (i) at the very beginning to get an idea of the dataset, and (ii) at the very end of the analysis module to find marker genes of the annotated cell types. Would that makes sense?

For plannification, I'd like to first fix the PR #706 regarding the Azimuth/Seurat reference for label transfer and then focus on this PR.

I'll try to provide you feedback hopefully by the end of next week!

Thank you again!

sjspielman commented 3 months ago

As you wrote, I would not use it for annotation. The two uses of it I can see would be (i) at the very beginning to get an idea of the dataset, and (ii) at the very end of the analysis module to find marker genes of the annotated cell types. Would that makes sense?

Yes, this definitely makes sense! I would actually recommend adding a short "preamble" like this to the introduction section of the notebook too - even just having these as bullet points to state that these are the main goals of the notebook would be a great addition to contextualize the full analysis.

For plannification, I'd like to first fix the PR https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/706 regarding the Azimuth/Seurat reference for label transfer and then focus on this PR.

Sounds good! Just an FYI - I will be slow to respond next week (8/19-8/23) since the Data Lab will be traveling to teach a workshop.

maud-p commented 3 months ago

Thank you very much again @sjspielman for your detailed review and all comments!

I tried to improve the script and RMarkdown following your advice. Main points:

I shortened the R script to the for-loop and moved the rest to the RMarkdown following the template.
I used the params in the yaml section of the RMarkdown
I defined functions for repetitive codes.
I placed the R script in /script folder, the RMarkdown template in the template folder and the notebooks in the notebook folder. I added a short READ.ME file in the `template folder.

FYI, few more changes are planed with PR #706. Not sure what is best to do first, so I just decided to inform you about the few changes I have made to this PR!

Thank you again,

sjspielman commented 3 months ago

FYI, few more changes are planed with PR https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/706. Not sure what is best to do first, so I just decided to inform you about the few changes I have made to this PR!

I was just checking in with some of the changes and @jaclyn-taroni's feedback in #706 when you posted this comment, so good timing 😄 ! I'll have a look at the changes in this PR today, but I do have a sense that some of the functions you are going to write as part of #706 will impact this code, too.

I think a first step towards holistically approaching both PRs would be to first "inventory" the code and figure out which functions you are going to write (e.g., function(s) for common Seurat steps). This will help us understand which functions being written in #706 should also be used here. This leads into some logistics we'll want to keep track of to make sure you don't double your work, and help you avoid conflicts in your code! Specifically, if there are functions which are going to be used in both PRs, then you'll want to tackle one PR fully at a time.

Because I anticipate you'll be writing functions as part of #706 that will also be used in this notebook, I'd suggest first working to wrap up #706. Once that is merged into main, we can update this branch (01-clustering-template) to sync changes, allowing us to use code in #706 in this PR directly (see our documentation on syncing - this can be tricky so please let me know how I can help with this more directly when we get there!). In the meantime, I'll leave feedback today on the code changes you've made so far, but I would probably recommend focusing on #706 before coming back to this PR.

maud-p commented 3 months ago

Great thank you! I agree to first focus on PR #706 and then coming back to this one 😄

maud-p commented 2 months ago

Hi @sjspielman , I just wanted to let you know that I will continue working on the cluster exploration now 😄 I'll work on a branch from https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/737 once it will be merged and open a new PR once I have the script updated! Hopefully early next week 😄

maud-p commented 2 months ago

I think we can close this PR then.

sjspielman commented 2 months ago

Hi @maud-p, if you think a new PR makes more sense given the other module changes you've been making, all good with me! Let's leave this PR open until a new one is filed, just in case.

Either way, I'm not sure if you've had a chance to encounter this yet, but you might find cherry-picking git commits useful as you handle logistics here. Cherry-picking allows you to literally duplicate a commit between branches. For example, if there are commits in this branch you want to include in a different branch to ultimately file a new PR, you can use cherry-picking to copy them over. Here's a tutorial on using it with GitKraken & via command line! https://www.gitkraken.com/learn/git/cherry-pick

Feel free to let me know (on Github or Slack!) if I can help with any git things along the way here :)

maud-p commented 2 months ago

Thank you very much @sjspielman , sounds very useful and I didn't know about it - actually you can consider I don't know anything about git, learning right now :)

sjspielman commented 2 months ago

actually you can consider I don't know anything about git, learning right now :)

You're doing great!! It's a lot to learn..always a journey!

AlexsLemonade / OpenScPCA-analysis