Closed maud-p closed 1 week ago
Thank you @sjspielman for looking into it :)
I have added the modified `renv.lock`; sorry, I missed it in my `git add`!
Regarding the `for` loop, I agree and will move it into the previous `for` loop. It was a way for me to avoid re-running notebooks 00-, 01-, and 02- and get to the new results more quickly.
Hi @maud-p, thanks for filing this next PR!
I'm going to start having a careful look for review, but first there are two quick things I see off the bat that you can start working on if you want! For one, I left you a separate inline comment. Second, it doesn't look like the `renv.lock` file is up to date with additional packages used in this notebook. Can you please snapshot to update the lockfile? Thanks!
Hi @sjspielman, thank you again for your review and advice! I just committed the changes, let me know what you think!
For the first round of review here, I've looked things over for clarity and correctness. Overall it looks like it's in great shape!! After this first round, I'll do another round of review more focused on the science. Here are some comments in addition to the others I left inline:
- Can you update the `results/README.md` to include this notebook?
I haven't updated the `results/README.md` for now because the notebook `03_clustering_exploration.Rmd` is not saving any output in `results`. I am just generating a report in `notebook`. So I updated the `README.md` file in the analysis module. Should I create one for the `notebook` directory?
The functions you added & docs about them look great, thanks for doing that! It really makes the notebook easier to read and work with :) Let's just do a bit more reorganization:
- Can you scoot up the "Functions" section (which looks great, by the way!) to be above "Analysis" but after "Introduction"?
- Can you order the functions in the same order that they are used in the actual notebook?
- I'm not sure the alluvial plots (while very cool!) are the easiest to read, because of how long the cell type labels are. Is it possible to make that font small enough to be able to read the labels clearly? If not (or alternatively), I wonder if a heatmap might be a clearer plot to make here that shows counts of cells in each combination of groups? There are more complicated statistics that one could show in a heatmap comparing these groupings, but for an exploratory notebook like this I think just counts are probably sufficient. One way to make this plot (vs. using existing heatmap packages) would be to use `ggplot2::geom_rect()`. You can create a new data frame that counts the combinations of cluster/annotation (`dplyr::count()` can help with this!) and then plot cluster & annotation against each other, with a fill aesthetic of the actual counts. Let me know if this makes sense or how I can further explain!
I tried to go for both: I switched the `alluvial` plots to a `sankey` plot, which should be basically the same, just with some space between categories. Unfortunately, `SCpubr` is no longer maintaining their nice `do_SankeyPlot` function, so I copy/pasted some of their old source code. The aim of these two approaches is to show that whatever method we choose for labelling the cells (full or kidney fetal reference), they seem to converge for the identification of endothelial and immune cells.
This is important, as I would like to use it as the next step for running `inferCNV`.
Would this make sense?
Thank you!!
@maud-p just a quick heads up that I'm out of the office now at the AACR Pediatric conference, so I will be back to review this and chat about `inferCNV` next week. Have a good weekend in the meantime!
Hi @sjspielman, thanks for letting me know, hope you enjoy the conference! I'll also be at a conference next week, 16-18 September, FYI :)
Dear @sjspielman, thank you very much for the review and detailed explanations! I'll work on it hopefully tomorrow, or Thursday at the latest ;)
I agree with adding more heatmaps/comparisons!
Regarding `inferCNV`/`copyKAT`, you have a great point here. I was thinking of using `inferCNV` as you suggested for `copyKAT`, but it might be better to do it in two steps then:
1) copyKAT to help annotating malignant versus normal and
2) inferCNV to confirm CNV from copyKAT in the malignant cells?
Thank you!!
> I was thinking of using inferCNV as you suggested for copyKAT, but it might be better to do it in two steps then
Yes, I think this is probably the way to go - `copyKAT` can (maybe!) help us identify tumor vs. normal, and `inferCNV` can potentially be used to validate some of those calls. When we get there, we'll want to do this one sample at a time since results will probably be really different among samples!
Hi @sjspielman, I think I addressed your comments/suggestions :)
For ~10 samples, I looked at the comparison of the annotations from the 2 fetal references, and they seem to agree quite well for the endothelial and immune cells.
I like the fetal kidney reference the most, as the annotations are quite simple, but also detailed enough for our purpose.
My thought would be to go with `fetal_kidney_predicted_compartment` and `copyKAT`, using endothelial and immune cells as healthy cells.
Another option would be to use for `copyKAT` only the cells that are annotated as endothelial or immune by the label transfer from both fetal references. A bit more complex to write maybe, but it might be the safest way to identify true endothelial and immune cells?
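To give an idea of what the second option might look like, here is a minimal sketch that only selects the cells, without running `copyKAT`. The data frame `cell_df`, the two label columns, and the label values are hypothetical placeholders, not the actual metadata names from the label transfer.

```r
library(dplyr)

# Toy stand-in for per-cell metadata (hypothetical column names and values)
cell_df <- data.frame(
  barcode = paste0("cell_", 1:5),
  fetal_full_label = c("immune", "endothelial", "stroma", "immune", "endothelial"),
  fetal_kidney_label = c("immune", "stroma", "stroma", "endothelial", "endothelial")
)

# Labels treated as healthy (assumed spelling)
healthy_labels <- c("endothelial", "immune")

# Keep only cells that both references place in a healthy compartment
normal_cells <- cell_df |>
  dplyr::filter(
    fetal_full_label %in% healthy_labels,
    fetal_kidney_label %in% healthy_labels
  ) |>
  dplyr::pull(barcode)
```

Note that this keeps a cell called immune by one reference and endothelial by the other; a stricter version would also require the two labels to match.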
Let me know what you think!
Thank you!!
Dear @sjspielman, I think I have made the few changes and added the latest `html` notebooks :)
For some reason, I got an error for one of the samples, `SCPCS000197`; I will have a closer look at why and update you on this.
But I already wanted to share the notebooks with you. I had a look at a few of the reports, and it seems that the different annotation strategies converge in the identification of endothelial and immune cells. I especially like the fetal kidney reference, `fetal_kidney_predicted.compartment`, which also seems to perform quite well looking at the dotplots of marker genes.
FYI, I will be away from tomorrow until next Thursday, I'll be at the SIOP RTSG. I hope to hear & learn new relevant insights for Wilms tumor!
Thank you!
> For some reason, I got an error for one of the samples, SCPCS000197; I will have a closer look at why and update you on this.
I'll have a look at this sample and see if I can track down the problem.
> But I already wanted to share the notebooks with you. I had a look at a few of the reports, and it seems that the different annotation strategies converge in the identification of endothelial and immune cells. I especially like the fetal kidney reference, fetal_kidney_predicted.compartment, which also seems to perform quite well looking at the dotplots of marker genes.
Thanks again for sharing all of these! I'll look through them all and see if we come to the same conclusions.
It worked, thank you so much @sjspielman ! I added the last notebook 🎉
Dear @sjspielman,
thank you very much, this all makes a lot of sense. I will re-run the analysis and should upload the notebooks by Thursday (I will be travelling tomorrow and am not sure how I'll have access to our server).
I like the idea of looking at all samples, I'll work on a notebook and start a new PR :)
Thank you!
Dear @sjspielman, thank you very much !!! I am working on the next PR, I'll get back to you soon!
Thank you very much for your help and effort to make it work!!!
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
This is follow-up work on PR #704, taking into account the changes in PR #737.
What is the goal of this pull request?
The aim here is to explore the clustering and label transfer from the 2 fetal references for each sample.
Briefly describe the general approach you took to achieve this goal.
Here I started from the output of the notebook `02b_label-transfer_fetal_kidney_reference_Stewart.Rmd` that contains `SCTransform`, `PCA`, and `UMAP`, and explored the results looking at:
I compared the labels obtained from SingleR, CellAssign and the label transfer from the two fetal references (PR #737).
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes! More than one. I think that from this analysis, I can find a way to annotate healthy cells such as "immune" and "endothelial cells". From here, I will be able to file a new PR to include inferCNV and/or copyKAT in the template.
Results
The notebook template produces one notebook per sample in the `notebook/{sample_id}` folder. I have now uploaded the notebooks for the first 2 samples. Once we have discussed the analysis, I'll run it for the 40 samples and add the notebooks!
What is the name of your results bucket on S3?
What types of results does your code produce (e.g., table, figure)?
notebook
What is your summary of the results?
Provide directions for reviewers
What do you think?
What are the software and computational requirements needed to be able to run the code in this PR?
I render the notebook from the `00_run_workflow.R` script. I opened a new loop on purpose in order not to run everything from PR #737 again, but I guess in a final step all the notebooks will be run in the same loop!
Are there particular areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
I would like to have your opinion on the best way to select normal cells as input for inferCNV. I am quite satisfied with the labels from the fetal kidney reference `fetal_kidney_predicted.compartment`, divided into:
I think that we can safely take the immune and endothelial cells as a healthy reference and run inferCNV from here. Then, with the result of inferCNV, I hope to be able to further split the fetal kidney and stroma compartments into normal and cancer blastema, epithelial and stroma cells.
Author checklists
Check all those that apply. Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
`README.md` has been updated to reflect code changes in this pull request.
Reproducibility checklist
`Dockerfile`.
`environment.yml` file.
`renv.lock` file.