Closed sjspielman closed 1 week ago
Here's a question for you regarding allowing multiple algorithms: What is the right way to handle varying parameters that are shared across algorithms?
For example, here, `objective_function` will apply only to leiden since it's algorithm-specific, but `nn` would apply to both louvain and leiden.
sweep_clusters(
  sce,
  algorithm = c("louvain", "leiden"),
  objective_function = "modularity",
  nn = c(10, 15)
)
Alternatively, we could have some very fine control. Taking a stab at this conceptually but it feels a bit gnarly?
sweep_clusters(
  sce,
  list(
    "louvain" = list(nn = c(10, 15)),
    "leiden" = list(nn = c(20, 25), objective_function = "modularity")
  )
)
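If the nested-list interface were pursued, one way to turn it into a table of runs would be to expand each algorithm's list on its own and stack the results. This is only a rough sketch: `sweep_spec`, `per_algorithm`, and `run_table` are hypothetical names, and nothing here reflects how `sweep_clusters` is actually implemented.

```r
library(tidyr)
library(dplyr)

# Hypothetical nested specification: one parameter list per algorithm
sweep_spec <- list(
  louvain = list(nn = c(10, 15)),
  leiden = list(nn = c(20, 25), objective_function = "modularity")
)

# Expand each algorithm's parameters separately, so a parameter is only
# ever crossed with the algorithm it was supplied for
per_algorithm <- lapply(names(sweep_spec), function(alg) {
  do.call(expand_grid, c(list(algorithm = alg), sweep_spec[[alg]]))
})

# bind_rows() fills columns an algorithm did not define with NA
run_table <- bind_rows(per_algorithm)
```

With this shape, louvain rows never receive an `objective_function` value, so there are no redundant runs to clean up afterwards; the cost is a more verbose call for the user.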
I would do the former, but then handle it in the function with `mutate(objective_function = ifelse(algorithm == "leiden", objective_function, NA_character_))` after the `expand_grid`.
I had basically this exact code in there yesterday before I locked down the algorithm! But in the end, I didn't think it was actually needed since `calculate_clusters` will ignore irrelevant parameters. I suspect we can just toss everything into the parameters list and the situation will sort itself out, but I look forward to finding out what I might be missing 😄
The reason we can't just let `calculate_clusters` ignore it is the multiple runs problem. The `mutate` function should be followed by a `distinct` to remove repeats.
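To make the multiple runs problem concrete, here is a minimal sketch of the `expand_grid` then `mutate` then `distinct` sequence. The parameter values are made up for illustration, and `"CPM"` as a second `objective_function` value is an assumption not taken from this thread.

```r
library(tidyr)
library(dplyr)

# Illustrative sweep values; "CPM" is an assumed second objective function
param_grid <- expand_grid(
  algorithm = c("louvain", "leiden"),
  objective_function = c("CPM", "modularity"),
  nn = c(10, 15)
)

# louvain ignores objective_function, so without cleanup each louvain/nn
# combination would be clustered once per objective_function value
deduplicated_grid <- param_grid %>%
  mutate(
    objective_function = ifelse(
      algorithm == "leiden", objective_function, NA_character_
    )
  ) %>%
  distinct()

nrow(param_grid)        # 8 rows before cleanup
nrow(deduplicated_grid) # 6 rows after: 4 leiden + 2 louvain
```

Blanking the irrelevant column makes the redundant louvain rows identical, and that is what lets `distinct` collapse them.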
This is getting closer, but is definitely not there yet. Some updates:
- While `NA` makes sense for avoiding duplicate runs when irrelevant params are varied, `match.arg` doesn't want it. So, instead of `NA`, I used the values that represent the defaults for those parameters, which maybe seemed like a decent middle ground (maybe? decent? middle? i'm not hedging, you're hedging!). Did you have a different thought there?
- [...] `cluster_args` to the final data frame if it's empty, so I fixed (and tested) here: https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/765/commits/0e47c817ef0ecdbf1e5a21ab1586183881365e93
- Speaking of `cluster_args` - this is a problem! We don't check that these are sane for the given algorithm, and I tend to think we indeed should not be in the business of doing so since it depends on `igraph`. That said, `bluster` will fail if an irrelevant parameter is passed in, and that's very much a possibility with the sweep function. Here's what I've thought of (noting I prefer the latter):
  - We could cancel `cluster_args` for the sweep function
  - We could have users supply this argument as a nested list per algorithm, e.g. `cluster_args = list(louvain = ..., walktrap = ...)`.
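For reference, a small sketch of the defaults-instead-of-`NA` approach described above. The choices vector and the assumption that the first choice is the default are illustrative only; the real values live in `calculate_clusters`.

```r
library(tidyr)
library(dplyr)

# Assumed choices and default, for illustration only
objective_choices <- c("CPM", "modularity")
default_objective <- objective_choices[1]

# match.arg() raises an error for NA, which is why an NA placeholder
# cannot simply be passed through to the clustering function
na_is_rejected <- inherits(
  try(match.arg(NA_character_, objective_choices), silent = TRUE),
  "try-error"
)

# Substitute the default where the parameter is irrelevant, then drop repeats
param_grid <- expand_grid(
  algorithm = c("louvain", "leiden"),
  objective_function = objective_choices,
  nn = c(10, 15)
) %>%
  mutate(
    objective_function = ifelse(
      algorithm == "leiden", objective_function, default_objective
    )
  ) %>%
  distinct()
```

The trade-off is that the louvain rows now show a value for `objective_function` that had no effect on the result, which may be worth noting wherever the final table is documented.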
I would vote for no `cluster_args` in the sweep function. If you really need it, you can write your own sweep. And if there are particular `cluster_args` worth adding, we can add support as needed.
Closes #755
This PR adds a function and tests to perform clustering across a set of parameters. Implementation details:
- [...] `cluster_set` to indicate which round of clustering the values pertain to, and for easier `split`ing in the future. Note that I could also leave this as a list and remove the `dplyr::bind_rows()` and allow users to take this step themselves if they prefer. A list might also be preferable since this function will end up getting used in a template Rmd to perform/evaluate clustering parameters, so for a lot of that we'd have to revert to a list anyways.
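The list versus data frame trade-off can be sketched as follows. The per-run data frames and their columns are hypothetical stand-ins, and using the `.id` argument to create `cluster_set` is an assumption about one possible way to build that column, not a description of the PR's code.

```r
library(dplyr)

# Hypothetical per-run results; the real function's columns may differ
sweep_results <- list(
  data.frame(cell_id = c("a", "b"), cluster = c(1, 2), nn = 10),
  data.frame(cell_id = c("a", "b"), cluster = c(1, 1), nn = 15)
)

# Combine into one data frame, labelling each run in a cluster_set column
combined <- bind_rows(sweep_results, .id = "cluster_set")

# Recover a list with one data frame per clustering run
per_set <- split(combined, combined$cluster_set)
```

Since `split` recovers the list from the combined data frame in one line, either return type can be converted to the other cheaply; the choice mostly comes down to which form the template Rmd would use more often.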