Closed maud-p closed 2 months ago
Hi @maud-p, thanks for filing this updated PR! I wanted to let you know that I'm going over your code now, and you can expect a review from me tomorrow.
I want to also note that this comment https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/706#discussion_r1711482392 @jaclyn-taroni's review in your other PR could very much also apply to this notebook too. Analysis parameter choices, which you may decide to vary at some future point, are a good type of variable to make into an R Markdown parameter.
Thank you very much @sjspielman for taking the time to give such a great feedback!
I'll take some more time to read all your documentation point by point but globally from a first read it is all clear and make lot of sense!
To answer your question about the sensitivity of DElegate to the number of clusters, I think it is! Naively, I used it here to have a rapid idea of the quality of the clustering: if it finds specific genes for each cluster, I expect each of them to have at least a different phenotype. In case it struggles finding specific genes, we might have over-clustered. It would also allow a quick check if our "favorite" known marker genes pop up in the top 5, like CD45.
As you wrote, I would not use it for annotation. The two uses of it I can see would be (i) at the very beginning to get an idea of the dataset, and (ii) at the very end of the analysis module to find marker genes of the annotated cell types. Would that makes sense?
For plannification, I'd like to first fix the PR #706 regarding the Azimuth/Seurat reference for label transfer and then focus on this PR.
I'll try to provide you feedback hopefully by the end of next week!
Thank you again!
As you wrote, I would not use it for annotation. The two uses of it I can see would be (i) at the very beginning to get an idea of the dataset, and (ii) at the very end of the analysis module to find marker genes of the annotated cell types. Would that makes sense?
Yes, this definitely makes sense! I would actually recommend adding a short "preamble" like this to the introduction section of the notebook too - even just having these as bullet points to state that these are the main goals of the notebook would be a great addition to contextualize the full analysis.
For plannification, I'd like to first fix the PR https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/706 regarding the Azimuth/Seurat reference for label transfer and then focus on this PR.
Sounds good! Just an FYI - I will be slow to respond next week (8/19-8/23) since the Data Lab will be traveling to teach a workshop.
Thank you very much again @sjspielman for your detailed review and all comments!
I tried to improve the script and RMarkdown following your advice. Main points:
for-loop
and moved the rest to the RMarkdown following the template.params
in the yaml
section of the RMarkdownfunctions
for repetitive codes. /script
folder, the RMarkdown template in the template
folder and the notebooks in the notebook
folder. I added a short READ.ME
file in the `template folder.FYI, few more changes are planed with PR #706. Not sure what is best to do first, so I just decided to inform you about the few changes I have made to this PR!
Thank you again,
FYI, few more changes are planed with PR https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/706. Not sure what is best to do first, so I just decided to inform you about the few changes I have made to this PR!
I was just checking in with some of the changes and @jaclyn-taroni's feedback in #706 when you posted this comment, so good timing 😄 ! I'll have a look at the changes in this PR today, but I do have a sense that some of the functions you are going to write as part of #706 will impact this code, too.
I think a first step towards holistically approaching both PRs would be to first "inventory" the code and figure out which functions you are going to write (e.g., function(s) for common Seurat steps). This will help us understand which functions being written in #706 should also be used here. This leads into some logistics we'll want to keep track of to make sure you don't double your work, and help you avoid conflicts in your code! Specifically, if there are functions which are going to be used in both PRs, then you'll want to tackle one PR fully at a time.
Because I anticipate you'll be writing functions as part of #706 that will also be used in this notebook, I'd suggest first working to wrap up #706. Once that is merged into main
, we can update this branch (01-clustering-template
) to sync changes, allowing us to use code in #706 in this PR directly (see our documentation on syncing - this can be tricky so please let me know how I can help with this more directly when we get there!). In the meantime, I'll leave feedback today on the code changes you've made so far, but I would probably recommend focusing on #706 before coming back to this PR.
Great thank you! I agree to first focus on PR #706 and then coming back to this one 😄
Hi @sjspielman , I just wanted to let you know that I will continue working on the cluster exploration now 😄 I'll work on a branch from https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/737 once it will be merged and open a new PR once I have the script updated! Hopefully early next week 😄
I think we can close this PR then.
Hi @maud-p, if you think a new PR makes more sense given the other module changes you've been making, all good with me! Let's leave this PR open until a new one is filed, just in case.
Either way, I'm not sure if you've had a chance to encounter this yet, but you might find cherry-picking git commits useful as you handle logistics here. Cherry-picking allows you to literally duplicate a commit between branches. For example, if there are commits in this branch you want to include in a different branch to ultimately file a new PR, you can use cherry-picking to copy them over. Here's a tutorial on using it with GitKraken & via command line! https://www.gitkraken.com/learn/git/cherry-pick
Feel free to let me know (on Github or Slack!) if I can help with any git things along the way here :)
Thank you very much @sjspielman , sounds very useful and I didn't know about it - actually you can consider I don't know anything about git
, learning right now :)
actually you can consider I don't know anything about git, learning right now :)
You're doing great!! It's a lot to learn..always a journey!
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/635
What is the goal of this pull request?
As discussed in the previous PR (https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/699) I wanted to have one notebook that is rendered over the entire Wilms tumor dataset.
Briefly describe the general approach you took to achieve this goal.
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes! More than one. At least:
Results
What types of results does your code produce (e.g., table, figure)?
HTML reports
What is your summary of the results?
Provide directions for reviewers
In this section, tell reviewers what kind of feedback you are looking for. This information will help guide their review.
What are the software and computational requirements needed to be able to run the code in this PR?
This information will help reviewers run the code during review, if applicable. For software, how should reviewers set up their environment (e.g.,
renv
orconda
) to run this code? For compute, can reviewers run this code on their laptop, or do they need additional computational resources such as RAM or storage? Please make sure this information, if applicable, is documented in the README.md file.Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
If you have any comments, ideas, very welcome. Either on the content of the template (as I said, I think some sections will not be so usefull) or the future plans:
Author checklists
Analysis module and review
README.md
has been updated to reflect code changes in this pull request.Reproducibility checklist
Dockerfile
.environment.yml
file.renv.lock
file.