AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal
Other
9 stars 17 forks source link

rOpenScpCA: Add function to extract PCA matrix #754

Closed sjspielman closed 2 months ago

sjspielman commented 2 months ago

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/727 & https://github.com/AlexsLemonade/OpenScPCA-analysis/discussions/731

Describe the goals of the changes to the analysis module.

This issue tracks adding a function to extract and return a PCA matrix, with barcode row names, from a given object, which could be either an SCE or Seurat object. I plan to have the function determine directly if the object is SCE or Seurat rather than using an additional argument. This function should be flexible enough for users to be able to specify the name of the PCA matrix in the given object, but we can still have a default.

As I alluded to in this comment, I am currently thinking this should be a helper function (that may not be exported? though i suppose can't hurt?) that the calculate_clusters() function (currently being written in #749) can use. For that, we'd need to update calculate_clusters() to take an object instead of a matrix and call the helper function to convert it.

A benefit to this alternative approach is users only would need to use calculate_clusters() rather than having to extract the matrix themselves, which may be more user-friendly. But, thinking towards the future, we'll also want to be wrapping the calculate_clusters() function to be able to sweep parameters (noting that we want to sweep ourselves, not with bluster::clusterSweep(), since we want to ensure the seed is set fresh for each parameter iteration), and we'd only like to convert once for that. So, perhaps we allow for either a matrix or object to be passed into calculate_clusters() (I don't love the lack of consistency there but I am not strictly opposed to it)? Or, just keep this function separate and users can extract themselves.

Tagging @jashapiro for any thoughts here!

What will your pull request contain?

Function and tests.

Will you require additional software beyond what is already in the analysis module?

N/A

Will you require different computational resources beyond what the analysis module already uses?

N/A

If known, when do you expect to file the pull request?

This sprint

jashapiro commented 2 months ago

Just realized that I never weighed in on this when asked: I think having calculate_clusters() work on matrices and objects is just fine. The extract function is already going to work with Seurat and SingleCellExperiment objects, so I don't see much cost to applying the same kinds of logic to look for matrix objects as well!

I also expect that the "conversion" cost will be quite small compared to the clustering itself, so while I would also write the sweep function to extract first, I would not worry that somebody else might not!

sjspielman commented 2 months ago

Closed by #758 and #759