AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal
Other
9 stars 17 forks source link

Metadata file: compilation of a metadata file of marker genes for expected cell types that will be used for validation at a later step #672

Closed maud-p closed 3 months ago

maud-p commented 3 months ago

Purpose/implementation Section

In this module 1, I create 2 metadata tables to compile from the literature information on marker genes and known genetic alterations, that will be used later to validate annotations of the Wilms tumor dataset.

Please link to the GitHub issue that this pull request addresses.

https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/671 https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/635#discussioncomment-10140478

What is the goal of this pull request?

Wilms tumor (WT) is the most common pediatric kidney cancer characterized by an exacerbated intra- and inter- tumor heterogeneity. The genetic landscape of WT is very diverse in each of the histological contingents. The COG classifies WT patients into two groups: the favorable histology and diffuse anaplasia. Each of these groups is composed of the blastemal, epithelial, and stromal populations of cancer cells in different proportions, as well as cells from the normal kidney, mostly kidney epithelial cells, endothelial cells, immune cells and normal stromal cells (fibroblast).

In this module, we reviewed the literature to compile a table of marker genes for each of the expected cell types in the dataset. Additionally, we provide a table of know genetic alterations in Wilms tumor that can be useful to validate CNV profiles obtained after running inferCNV function.

Briefly describe the general approach you took to achieve this goal.

The table CellType_metadata.csv contains the following column and information:

The table GeneticAlterations_metadata.csv contains the following column and information:

If known, do you anticipate filing additional pull requests to complete this analysis module?

This module will be used for later validation of the annotations and results from inferCNV.

What is the name of your results bucket on S3?

Results should be uploaded to your bucket so they are available during review. See here for instructions on how to upload to your bucket: https://openscpca.readthedocs.io/en/latest/software-platforms/aws/working-with-s3-buckets/

What types of results does your code produce (e.g., table, figure)?

2 tables

Provide directions for reviewers

This section had 2 aims:

Are there particularly areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

Author checklists

Check all those that apply. Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

maud-p commented 3 months ago

Dear Jaclyn,

Thank you very much for your encouraging comment and all the information. It is really useful to understand the expected structure of the final folder/module. I will commit the expected changes asap and pursue the clustering analysis.

My understanding of the reviewing process so far is: 1) I commit the changes that you requested and re-request a review until we are fine with the changes on both side and then I can close the pull request / merge the commit into the AlexsLemonade:main. 2) before starting a new part of the analysis (like step 2 clustering), I initiate a new issue to describe the plan. Once I am done with the analysis I submit another pull request linked to this new issue. And back to 1) Did I get it right?

Regarding the maintenance of the Dockerfile, thank you very much for your offer. I would like to try to do it, but if it starts being double work from your side checking and advising on it than maintaining it, please just let me know!

Thank you again.

jaclyn-taroni commented 3 months ago

I've looked at the commit history locally. One way to "remove" the clustering changes from this pull request would be to have you create a new branch and then refile the pull request (i.e., you close this one, and we start a new one).

The way you could do that is with the following steps.

First, you'd make sure you're on your main branch:

git checkout main

Then you're going to create a new branch (here I've called it start-wilms-analysis) at a place in the Git history before you added the clustering analysis:

git checkout -b start-wilms-analysis b754e5de88d7ec9d99be0b50db00e34d0b183a4b

Then you can push the new branch to GitHub with:

git push -u origin start-wilms-analysis

Then, you can file a new pull request using the new branch (start-wilms-analysis) from the GitHub UI.

You could largely copy and paste your initial comment when you file, and this closed PR here would retain the record of our conversation.

For now, I think we could plan to leave #680 as is and just make sure we don't merge it until the new PR goes into AlexsLemonade/main.

What do you think of this plan, @maud-p?

maud-p commented 3 months ago

Sounds good thank you @jaclyn-taroni for the precise steps :) I'll do it in a minute.

Regarding the #680 I think I find a way to add my commit to the maud-p-01-clustering branch now !

maud-p commented 3 months ago

one question @jaclyn-taroni , for the next step, each time I start a new step in the analysis, I should:

jaclyn-taroni commented 3 months ago

one question, for the next step, each time I start a new step in the analysis, I should:

* generate a new issue

* generate a new branch and work on it
  Correct?

Yes, that's right! More completely:

So we're aiming for 1 issue:1 pull request, but it does not always work out that cleanly. If an issue is particularly "big" (like it will require two+ scripts or notebooks to accomplish), you might end up with something that looks like:

And that's totally okay!

I think the most important takeaways are that your reviewers have enough information to do a good job with their review (e.g., context about your scientific goals), and the pull requests are a manageable size. ~400 lines that need to be reviewed or one script/notebook are some rules of thumb you can use for what is a "manageable size."

jaclyn-taroni commented 3 months ago

Closing in favor of #681.