Hello @jashapiro, thanks for all the help!
As for the module setup itself, I very much appreciate you setting up a distinct environment for your work with all of your package dependencies included. It seems like your project is mostly based in R? If that is the case, we would in general prefer that you set up your environment with renv, as that allows us to more easily track the versions of specific packages and dependencies. The versions available on conda are often a bit out of date as well, so this allows us to keep up better with current versions of Bioconductor, etc.
If you would like any help setting that up, please let me know, or if you have a compelling reason that you would prefer to stick with conda for this purpose, we can also consider that for this project.
Thanks so much for the suggestion. I understand that renv makes it much easier to manage R packages. I have switched to recording package versions with renv in my latest commit (https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/764/commits/8e696213eb9b8d9c712fdf42a7fca0ce9e1489d6).
Since I think it makes sense to start with the project as a whole, I did not spend too much time looking at your processing code. I did note, however, that you are performing normalization and dimension reduction within Seurat. This may be required by your downstream analysis, but I will note that we do include default normalized and dimension reduction matrices, and it may be more efficient to maintain the pre-computed values in the SingleCellExperiment object rather than recalculating them. If this is something you would like to pursue, I am happy to answer questions about where those values are located. The objects and processing steps are generally described in the portal documentation at https://scpca.readthedocs.io/en/stable/sce_file_contents.html
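As a rough sketch of the reviewer's suggestion, the R function below reuses pre-computed values when converting to Seurat instead of re-running normalization and PCA. It assumes the processed SingleCellExperiment stores normalized values in the logcounts assay and the PCA embedding under the reduced dimension name "PCA" (check the portal documentation linked above for the exact names). The helper name sce_to_seurat_keep_pca is hypothetical, and the SetAssayData(layer = ...) call assumes Seurat v5.

```r
library(SingleCellExperiment)
library(Seurat)

# Hypothetical helper (assumes Seurat v5): convert a processed SCE to Seurat
# while keeping the pre-computed log-normalized values and PCA embedding.
sce_to_seurat_keep_pca <- function(sce, pca_name = "PCA") {
  seu <- CreateSeuratObject(
    counts = counts(sce),
    meta.data = as.data.frame(colData(sce))
  )
  # Reuse the stored normalized values instead of calling NormalizeData()
  seu <- SetAssayData(seu, layer = "data", new.data = logcounts(sce))

  # Reuse the stored PCA embedding instead of calling RunPCA();
  # rename columns so they match the Seurat key "PC_"
  pcs <- reducedDim(sce, pca_name)
  colnames(pcs) <- paste0("PC_", seq_len(ncol(pcs)))
  seu[["pca"]] <- CreateDimReducObject(
    embeddings = pcs,
    key = "PC_",
    assay = DefaultAssay(seu)
  )
  seu
}
```

Downstream steps such as FindNeighbors() or RunUMAP() can then be pointed at reduction = "pca" without recomputing it.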
Thanks for the information. In my preprocessing code, I modified some parameters in the Seurat workflow (e.g., the number of features used for feature selection, and whether to run harmony, since I may merge samples in a later analysis). In the future, I may want to try different feature selection or clustering algorithms, since cell types are not well separated in some samples (preliminary results, not shown in this PR). Therefore, I would like to keep my pre-processing code for now.
I think all the comments have been resolved in my last two commits. Please let me know if I should make any other changes. Thanks again for all the help!
Hi @jashapiro, thank you so much for all the suggestions on coding robustness and style. I really appreciate it!
Regarding gene symbols vs. Ensembl IDs, I am keeping Ensembl IDs at this stage, so I also updated the Seurat .rds result files in the S3 bucket. Regarding doublets, I added a column to the metadata that flags them instead of removing them. Other minor coding style changes are included in my latest commit https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/764/commits/9be5d74171dfd607f8581ad126e60715023e05ba.
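A minimal sketch of flagging rather than removing doublets, assuming per-cell calls are available as a character vector of "singlet"/"doublet" labels (for example, the scDblFinder.class column that scDblFinder writes to colData; which doublet caller this module uses is not stated here). The helper name add_doublet_flag is hypothetical.

```r
library(Seurat)

# Hypothetical helper: record doublet calls as metadata instead of dropping cells
add_doublet_flag <- function(seu, doublet_class) {
  stopifnot(length(doublet_class) == ncol(seu))
  is_doublet <- doublet_class == "doublet"
  names(is_doublet) <- colnames(seu)  # align values to cell barcodes
  AddMetaData(seu, metadata = is_doublet, col.name = "is_doublet")
}
```

Keeping every cell means a later analysis can still filter explicitly, e.g. subset(seu, subset = is_doublet == FALSE), while the saved object retains the full data.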
Thanks again for the careful code review. Let me know if anything else needs to be modified!
Oh, I did have one more comment, which was that you might want to store the Seurat .rds files only in the scratch/ directory. I am not sure you will need to sync them to S3, as at this stage they are not really result files so much as reformatted data files. I was able to verify that you are successfully syncing, though, which has some value for the future!
Hi @jashapiro, thank you for reminding me about the scratch/ folder! It's more appropriate to put the intermediate files in scratch/ instead of results/. I will re-sync the results in my next PR.
In addition, I applied the suggested changes in commit https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/764/commits/1731730a028e1a6a044cfb2b460a21deca3ee074. Again, I appreciate all the suggestions, which have been really helpful!
Purpose/implementation Section
In this PR, I'm initializing an analysis module skeleton for the Wilms tumor dataset SCPCP000014 and checking in scripts for pre-processing the provided sce objects.
Please link to the GitHub issue that this pull request addresses.
What is the goal of this pull request?
This PR adds scripts for pre-processing the provided sce objects using a standard Seurat workflow; the processed objects will be used in the analysis in my next PR.
Briefly describe the general approach you took to achieve this goal.
A standard Seurat workflow (normalization, feature selection, PCA, clustering, dimension reduction, etc.) was applied to all 10 samples in this dataset.
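The steps above can be sketched as a reusable R function, so that parameters such as the number of variable features and whether to integrate with harmony (both mentioned earlier in this thread) can be changed per run. The function name, default values, and the batch_var argument are assumptions for illustration, not the module's actual script; the optional integration step requires the harmony package.

```r
library(Seurat)

# Hypothetical preprocessing function; defaults are illustrative only.
preprocess_seurat <- function(seu, nfeatures = 2000, npcs = 30,
                              resolution = 0.8, batch_var = NULL) {
  seu <- NormalizeData(seu)                                # log-normalization
  seu <- FindVariableFeatures(seu, nfeatures = nfeatures)  # feature selection
  seu <- ScaleData(seu)                                    # scales variable features
  seu <- RunPCA(seu, npcs = npcs)

  reduction <- "pca"
  if (!is.null(batch_var)) {
    # Optional integration across samples (requires the harmony package)
    seu <- harmony::RunHarmony(seu, group.by.vars = batch_var)
    reduction <- "harmony"
  }

  dims <- seq_len(npcs)
  seu <- FindNeighbors(seu, reduction = reduction, dims = dims)
  seu <- FindClusters(seu, resolution = resolution)        # adds seurat_clusters
  seu <- RunUMAP(seu, reduction = reduction, dims = dims)
  seu
}
```

Exposing nfeatures, resolution, and batch_var as arguments keeps the option of trying different feature selection or clustering settings later without editing the function body.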
If known, do you anticipate filing additional pull requests to complete this analysis module?
Results
What is the name of your results bucket on S3?
s3://researcher-009160072044-us-east-2/cell-type-wilms-tumor-14/results/00_preprocessing_rds/
What types of results does your code produce (e.g., table, figure)?
Intermediate RDS files containing pre-processed Seurat objects.
What is your summary of the results?
The results for this section contain 10 Seurat objects, saved as .rds files, for further analysis.
Provide directions for reviewers
In this section, tell reviewers what kind of feedback you are looking for. This information will help guide their review.
What are the software and computational requirements needed to be able to run the code in this PR?
Are there particularly areas you'd like reviewers to have a close look at?
For this first PR, I would like to make sure that my setup of the module skeleton, computing environment, and documentation meets the project's needs.
Is there anything that you want to discuss further?
I have some questions for this PR:
Thanks for reviewing!
Author checklists
Check all those that apply. Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
README.md has been updated to reflect code changes in this pull request.
Reproducibility checklist
Dockerfile
environment.yml file
renv.lock file