AlexsLemonade / OpenScPCA-analysis

An open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal
Other
5 stars 14 forks source link

Add function to perform graph-based clustering #749

Closed sjspielman closed 2 weeks ago

sjspielman commented 3 weeks ago

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Closes #745

What is the goal of this pull request?

This PR adds a function to perform clustering to the rOpenScPCA package. There are many file changes, because I've also merged in main to stay in sync, but review can focus only on code in packages/rOpenScPCA.

I'm filing this as a draft to get some initial feedback, and then can implement the rest (including adding more tests, which I know are thin right now!) and mark for review. Here are the main questions I'm looking for feedback on, but of other high-level feedback is welcome :)

Author checklists

Check all those that apply. Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

sjspielman commented 2 weeks ago

Should be ready for a real look now! I think the expect_error() checks are probably overkill, but I had a nice time writing them, so here they are for now. Sorry for the clutter..........!

jashapiro commented 2 weeks ago

Should be ready for a real look now! I think the expect_error() checks are probably overkill, but I had a nice time writing them, so here they are for now. Sorry for the clutter..........!

Will review after #752 goes in

jashapiro commented 2 weeks ago

Did I say 752? I meant #753

sjspielman commented 2 weeks ago

Or we can just check that all the args have length 1, and not support args that don't follow that pattern, as I don't think we expect to use anything like that...

I haven't really been able to think of anything better than this... I guess we could also return the cluster_args list for users to do what they want with, but given that a minority of users (if any) are likely to be in this circumstance, I do think just the straight data frame return is cleaner.

sjspielman commented 2 weeks ago

Ready for another look, and in particular interested in your feedback on my docs wording about parameter defaults, and overall wording for cluster_args length-1 restriction.

sjspielman commented 2 weeks ago

I wrote a bit of what I was thinking there, but you will probably want to rewrite it.

I think what you had there was mostly there tbh, so I expanded it a little bit with more specific bluster information but kept most of what you had.