Closed maud-p closed 1 month ago
Hi @sjspielman , Thank you for the rapid feedback 🏎️ !
I just re-uploaded the results to the S3 bucket. There might have been a problem during the transfer to S3, because my local results directory does contain the 3 rds objects per sample.
I'll plot the different tables and come back to you, thanks for the few lines of code to extract the metadata :)
@sjspielman I modified the notebook based on your suggestion. It is quite simple now, but let me know if you think of additional plots and checks that could be useful :)
Thank you again for your help!
Dear @sjspielman , thank you so much for the review and the suggestions, including the code! It was really useful, as usual!
I should have made all suggested changes :D
One additional table that I would like to discuss is the summary, for each compartment, of the percentage of cells that do or do not match a kidney annotation.
The majority of fetal nephron cells (92%) have been predicted as kidney. However, the other compartments (stroma, endothelial and immune) do not really match kidney cells. In my opinion this is not a concern and shouldn't be interpreted as a poor label transfer or annotation! I'll try to argue a bit why.
Thank you again for your help!
Ah, one more thing I forgot!
We should add rendering this notebook to the 00_run_workflow.R script. This step should be after/outside the for loop (since we only knit this once, not for each sample), and only run if we are not testing (since the input files for this notebook aren't generated in testing).
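As a rough illustration of that suggestion, here is a minimal sketch of how the end of 00_run_workflow.R could look. The helper `render_summary_notebook()`, the `running_tests` flag, the `sample_ids` vector and the output directory are all hypothetical names chosen for this example, not what the script currently uses; only `rmarkdown::render()` is a real function.

```r
# Sketch only: all names below except rmarkdown::render() are hypothetical.
render_summary_notebook <- function(notebook_path,
                                    output_dir,
                                    testing,
                                    render_fun = rmarkdown::render) {
  # The notebook's input files are not generated in testing, so skip it there.
  if (testing) {
    message("Testing mode: skipping ", basename(notebook_path))
    return(invisible(FALSE))
  }
  # Knit once for the whole project, not once per sample.
  render_fun(
    input = notebook_path,
    output_format = "html_document",
    output_dir = output_dir
  )
  invisible(TRUE)
}

sample_ids <- c("sample_A", "sample_B") # placeholder sample IDs
running_tests <- TRUE # in practice this would come from a workflow option

for (sample_id in sample_ids) {
  # ... the per-sample notebooks are rendered here ...
}

# After/outside the loop: render the across-samples notebook a single time.
rendered <- render_summary_notebook(
  notebook_path = file.path("notebook", "04_annotation_Across_Samples_exploration.Rmd"),
  output_dir = tempdir(),
  testing = running_tests
)
```

With `running_tests <- FALSE` the same call would knit the notebook once, after every sample has been processed.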
I ended up selecting the following samples:
I tried to enrich for samples with >100 normal cells, without lowering the proportion of cells predicted as kidney too much. Would this be OK?
Thank you again for your help!!
Great, thank you very much!
Thank you very much for letting me know about your days off. Then I might take the time to compare a few different methods (copyKAT +/- reference, inferCNV +/- reference) on a few samples!
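For the workflow, did I understand correctly that it would be best to have:
- R scripts for steps that are building the final object
- notebooks for reports and results explorations?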
Should I try in another PR to split the first 3 notebooks into scripts + notebook?
Thank you!
For the workflow, did I understand correctly that it would be best to have:
- R scripts for steps that are building the final object
- notebooks for reports and results explorations?
Yes, this is the idea. Notebooks are generally the better option for steps that are exploratory or interactive in some way - e.g. making tables and plots. Scripts are often the better option for running an analysis that you plan to explore in a notebook. For example, in the doublet-detection module I wrote, you can see that (for initial benchmarking steps) I used a script to detect doublets, and then a notebook to explore the doublet results. The parallel here would be to run copyKAT in a script which would save the copyKAT results as TSV files (this output doesn't need to be the whole Seurat object - we can save a little storage space with TSV instead!), and then explore those results in a notebook, which might also be a template notebook that looks at one sample at a time with params.
That said, this is not a strict rule - you can still use a notebook to run copyKAT if you feel more comfortable with that approach!
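To make the script/notebook split a bit more concrete, here is a minimal sketch using base R only. The helper `save_predictions_tsv()`, the file naming, and the toy table are hypothetical; in particular, the column names `cell.names` and `copykat.pred` are an assumption about what a copyKAT prediction table looks like, and the copyKAT call itself is deliberately left out.

```r
# Script side: save only the per-cell calls, not the whole Seurat object.
# (Hypothetical helper and file naming, for illustration only.)
save_predictions_tsv <- function(predictions, sample_id, results_dir) {
  dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(results_dir, paste0(sample_id, "_copykat_predictions.tsv"))
  write.table(predictions, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
  out_file
}

# Stand-in for the prediction table that a copyKAT run would produce
# (made-up cells; column names are an assumption for this sketch).
toy_predictions <- data.frame(
  cell.names = c("cell_1", "cell_2", "cell_3"),
  copykat.pred = c("aneuploid", "diploid", "aneuploid")
)

tsv_path <- save_predictions_tsv(
  toy_predictions,
  sample_id = "sample_A",
  results_dir = file.path(tempdir(), "copykat")
)

# Notebook side: read the TSV back and explore it.
predictions <- read.delim(tsv_path, stringsAsFactors = FALSE)
pred_counts <- table(predictions$copykat.pred)
print(pred_counts)
```

In a template notebook, the sample ID would typically come from the `params` field of the YAML header (e.g. `params$sample_id`) instead of being hard-coded, so the same notebook can be rendered for one sample at a time.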
Should I try in another PR to split the first 3 notebooks into scripts + notebook?
Don't worry about this at all! The code you have written so far is completely fine. Again, not a strict rule :)
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/774
This PR follows the discussion in PR #750, especially: https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/750#pullrequestreview-2310191830
What is the goal of this pull request?
Briefly describe the general approach you took to achieve this goal.
In this PR, I am adding one notebook in /notebook/04_annotation_Across_Samples_exploration.Rmd to explore the annotations and label transfers for all of the samples in SCPCP000006.
We integrated all the samples from SCPCP000006 to get a quick, global view of the label transfer. Please note that the integration is not the aim of this PR; it is just a way to better display genes and features.
To explore the label transfer results, we look at some marker genes, and at tables and percentages of cells in each annotation group (from the label transfers).
If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes, the next step would be to run copyKAT.
Results
This PR does not contain any results, only a single notebook.
What types of results does your code produce (e.g., table, figure)?
One notebook that explores the clustering and label transfer results for all samples at once.
What is your summary of the results?
85.4925176 % of the cells are labeled as kidney cells (fetal full reference, looking at fetal_full_predicted.organ). I think this is quite a nice result. From the umap and barplot, I think that most of the cells that are not labelled as kidney are endothelial or immune cells. (While writing this, I think it would be good to add a table of fetal_full_predicted.organ and fetal_kidney_predicted.compartment, maybe in the next round after your review!)
0.8660364 % of the cells are labeled as immune cells and 0.8226098 % of the cells are labeled as endothelial cells. As Wilms tumor is known to be a cold tumor (immune excluded), and COG Wilms tumor samples are mostly not pre-treated, it is quite expected to have very few immune cells. To be honest, I have no idea whether having so few cells to use as a reference will be a problem for running copyKAT.
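For the table mentioned above, a minimal sketch with base R could look like the following. The `cell_metadata` data frame and all of its values are made up for illustration (the real columns would be pulled from the integrated object's metadata), so the numbers it prints have nothing to do with the actual results.

```r
# Toy per-cell metadata: made-up labels, not real results.
cell_metadata <- data.frame(
  fetal_full_predicted.organ = c(
    "kidney", "kidney", "kidney", "liver", "kidney", "spleen"
  ),
  fetal_kidney_predicted.compartment = c(
    "fetal_nephron", "fetal_nephron", "stroma", "immune", "endothelium", "immune"
  )
)

# Overall percentage of cells predicted as kidney.
pct_kidney <- 100 * mean(cell_metadata$fetal_full_predicted.organ == "kidney")

# Cross-table of compartment (rows) by predicted organ (columns).
organ_by_compartment <- table(
  compartment = cell_metadata$fetal_kidney_predicted.compartment,
  organ = cell_metadata$fetal_full_predicted.organ
)

# Row percentages: within each compartment, the share of each predicted organ.
pct_by_compartment <- round(100 * prop.table(organ_by_compartment, margin = 1), 1)
print(pct_by_compartment)
```

Using `margin = 1` makes each compartment's row sum to 100, which answers "what fraction of this compartment is predicted as kidney" directly.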
Provide directions for reviewers
What are the software and computational requirements needed to be able to run the code in this PR?
I updated the renv.lock file; otherwise there are no specific changes since the last PR :)
Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
What about the next step: how do you think we should run copyKAT/inferCNV? I think I should start with copyKAT, run it with and without a reference of normal cells, and try to evaluate the impact of the normal cell annotation on the inferred CNV.
Author checklists
Analysis module and review
README.md has been updated to reflect code changes in this pull request.
Reproducibility checklist
Dockerfile.
environment.yml file.
renv.lock file.