Break out commonly used functions in tumor cell validation

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Preparation for #480

What is the goal of this pull request?

While working on #480, I realized there are a few steps we will want to repeat in every notebook that validates tumor cells, so it makes sense to move that code out of the exploratory notebooks and into functions that can be sourced. I also thought it would be helpful to make these changes separately from adding a whole new notebook, so I'm doing this first.

Briefly describe the general approach you took to achieve this goal.

I took the heatmap functions that I created in #500 and moved them into a new script that lives in scripts/utils. I also added two functions: one that creates a classification data frame and one that creates a marker gene data frame. The classification data frame has one row per cell and contains the classifications from all of the methods we used. The marker gene data frame is an expanded version with one row per marker gene per cell.
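To make the two shapes concrete, here is a minimal sketch of how they relate. This is not the code in scripts/utils: the function names, column names, method labels, gene names, and values are all hypothetical, and it uses plain Python dictionaries rather than real data frames.

```python
# Hypothetical illustration only: names and values are made up and this is
# not the implementation added in this pull request.


def make_classification_rows(cell_ids, classifications_by_method):
    """Return one dict per cell holding the classification from every method."""
    rows = []
    for cell_id in cell_ids:
        row = {"cell_id": cell_id}
        for method, calls in classifications_by_method.items():
            # A cell that a method did not classify is recorded as None
            row[method] = calls.get(cell_id)
        rows.append(row)
    return rows


def make_marker_gene_rows(classification_rows, marker_genes, expression):
    """Expand to one row per marker gene per cell, keeping the classifications."""
    long_rows = []
    for row in classification_rows:
        for gene in marker_genes:
            long_row = dict(row)  # copy the per-cell classification columns
            long_row["gene"] = gene
            # Missing (cell, gene) pairs default to 0.0 in this sketch
            long_row["expression"] = expression.get((row["cell_id"], gene), 0.0)
            long_rows.append(long_row)
    return long_rows


cells = ["cell_1", "cell_2"]
calls = {
    "copykat": {"cell_1": "tumor", "cell_2": "normal"},
    "infercnv": {"cell_1": "tumor"},
}
markers = ["GENE_A", "GENE_B", "GENE_C"]
expr = {("cell_1", "GENE_A"): 2.5, ("cell_2", "GENE_B"): 0.7}

classification_rows = make_classification_rows(cells, calls)
marker_gene_rows = make_marker_gene_rows(classification_rows, markers, expr)

print(len(classification_rows))  # one row per cell
print(len(marker_gene_rows))     # one row per marker gene per cell
```

With two cells and three marker genes, the first structure has two rows and the second has six, which is the "expanded" relationship described above.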

I then updated the existing notebook to use these functions and removed the now-duplicated code from that notebook.

I made a few additional modifications based on things I was working on for #480:

For CopyKAT, I am now including the mean CNV detection for all chromosomes along with the predictions in the predictions output. That way we can create plots with it when we read in the CopyKAT output.
For InferCNV, I am including the scaled_mean_proportion along with the predictions output. Again, this is so we can use this information in the plots used for validation.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes

Results

No results right now as this is just reorganizing some existing code.

Author checklists

Analysis module and review

[ ] This analysis module uses the analysis template and has the expected directory structure.
[x] The analysis module README.md has been updated to reflect code changes in this pull request.
[x] The analytical code is documented and contains comments.
[ ] Any results and/or plots this code produces have been added to your S3 bucket for review.

Reproducibility checklist

[ ] Code in this pull request has been added to the GitHub Action workflow that runs this module.
[ ] The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
[ ] If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
[ ] If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

AlexsLemonade / OpenScPCA-analysis