AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal
Other
9 stars 17 forks source link

Add rOpenScPCA function to perform graph-based clustering #745

Closed sjspielman closed 2 months ago

sjspielman commented 2 months ago

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

727 & #731

Describe the goals of the changes to the analysis module.

This issue tracks adding a function to perform graph-based clustering to rOpenScPCA.

What will your pull request contain?

A function with associated docs & tests to perform graph-based clustering

Will you require additional software beyond what is already in the analysis module?

N/A

Will you require different computational resources beyond what the analysis module already uses?

N/A

If known, when do you expect to file the pull request?

Unless something else comes up, today!

jashapiro commented 2 months ago

I think it would be good to make sure that this function is flexible enough to handle other arguments to the clustering algorithm, which may be buried in the main documentation. In particular, I am thinking about louvain clustering which has a resolution parameter see cluster_louvain, and for leiden clustering there are a few more important parameters: https://r.igraph.org/reference/cluster_leiden.html; importantly the objective_function has a strong effect on results.

The minimal version of this would be to directly pass in cluster.args to the param object creation, but it would probably make sense to do some specific checking around those.

Finally, I would not make tsv export part of this function. Exporting a data frame is standard enough to be handled by the user with their preferred tools.

sjspielman commented 2 months ago

The function should take a matrix with column names where rows are assumed to be genes and columns are assumed to be cells.

This might actually not be what we want. If the goal is to have something flexible enough for SCE & Seurat users to dive into, it might be better to allow users to pass in either a matrix, an SCE object, or a Seurat object, and the function would pull out the matrix for the latter two types. This means we'll need some helper functions.

For now, I'll continue assuming* a matrix is being passed in, but this is something to either re-consider as a separate issue or as part of review.

sjspielman commented 2 months ago

Closed by #749