Closed sjspielman closed 1 month ago
Should be ready for another look! Note also that I had to keep the pc_name argument since it's not one of the arguments passed into calculate_clusters() (it could be, but I'd prefer to extract the matrix, if needed, once and not on each iteration).
Thought of one more edge case - the `nrow`/`length` check between the matrix and clusters will pass even if clusters is not a vector, but e.g. a data frame with the same number of columns as the matrix has rows. This is really unlikely to happen but can't hurt to catch. I updated here if you want to have a look: https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/779/commits/c38ff0a4ba4f81b9b0199b60542865094360b97d
I can imagine this failing in other ways you don't expect that would otherwise be fine (any object with an attribute other than names will fail `is.vector`, not just factors; for example I think a list of clusters would actually work in the function), so I personally would not have bothered with this. If people do horrible things that result in failures down the line, we can't always stop them.
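For reference, the behavior discussed in these comments can be reproduced in base R. This is only a sketch: `pc_mat` and the `clusters_*` objects are hypothetical inputs made up for illustration, not code from the PR.

```r
# Hypothetical inputs: a 3-row matrix and several kinds of `clusters` objects
pc_mat <- matrix(rnorm(6), nrow = 3)
clusters_vec <- c(1, 2, 1)
clusters_fct <- factor(clusters_vec)
clusters_lst <- list(1, 2, 1)
clusters_df <- data.frame(a = 1:2, b = 3:4, c = 5:6) # 3 columns, 2 rows

# length() of a data frame is its number of columns,
# so a nrow/length comparison passes for this data frame
size_check <- nrow(pc_mat) == length(clusters_df)

# is.vector() is FALSE for objects with attributes other than names
vec_ok <- is.vector(clusters_vec) # plain numeric vector
fct_ok <- is.vector(clusters_fct) # factor: has "levels" and "class" attributes
lst_ok <- is.vector(clusters_lst) # list: a vector of mode "list"
df_ok <- is.vector(clusters_df)   # data frame: has "class" and "row.names"

# An explicit data frame check catches the edge case directly
df_caught <- is.data.frame(clusters_df)
```

Here `size_check`, `vec_ok`, `lst_ok`, and `df_caught` come out `TRUE`, while `fct_ok` and `df_ok` come out `FALSE`. So an `is.vector()` check rejects factors too, whereas `is.data.frame()` only rejects the data frame case.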
Closes #773
This PR adds a function to bootstrap clusters and calculate ARI for a given number of reps. I ended up writing a function that mostly wraps `calculate_clusters()`, thereby letting that function handle argument checking (on the first bootstrap iteration). This function differs from the other evaluation functions in that it takes a vector of clusters (hence, I check it's not a data frame; I do that b/c, as I learned today, `is.vector(df$column)` is `FALSE`). The function returns a data frame of ARI results and clustering parameters, as returned by `calculate_clusters()`.

Note also that I updated examples across function docs to use a seed; we want to encourage seeds! (I'll also note, at one point I had the cute idea of actually providing `cluster_df` to this stability function and grabbing cluster parameters directly from the df, but changed my mind, mainly because we won't necessarily know all the parameter columns that could be in that df because of `cluster_args`, and users may have added their own.)
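As a rough illustration of the approach described above, here is a minimal base R sketch of bootstrapping clusters and computing ARI per replicate. It is not the PR's implementation: `bootstrap_ari_sketch()` and `adjusted_rand_index()` are hypothetical names, `kmeans()` stands in for `calculate_clusters()`, sampling cells with replacement is an assumption, and the returned data frame omits the clustering parameters.

```r
# Adjusted Rand index in base R (assumption: the real code may use a package).
# Undefined (NaN) when both inputs put everything in a single cluster.
adjusted_rand_index <- function(x, y) {
  tab <- table(x, y)
  sum_cells <- sum(choose(tab, 2))
  sum_rows <- sum(choose(rowSums(tab), 2))
  sum_cols <- sum(choose(colSums(tab), 2))
  expected <- sum_rows * sum_cols / choose(sum(tab), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Hypothetical stand-in for the stability function described in this PR
bootstrap_ari_sketch <- function(pc_mat, clusters, replicates = 20, centers = 3) {
  stopifnot(
    is.matrix(pc_mat),
    !is.data.frame(clusters),
    nrow(pc_mat) == length(clusters)
  )
  results <- lapply(seq_len(replicates), function(i) {
    # Sample cells (rows) with replacement
    idx <- sample(nrow(pc_mat), replace = TRUE)
    # Recluster the resampled cells; kmeans() replaces calculate_clusters() here
    resampled <- kmeans(pc_mat[idx, , drop = FALSE], centers = centers)$cluster
    # Compare against the original assignments of the same sampled cells
    data.frame(replicate = i, ari = adjusted_rand_index(clusters[idx], resampled))
  })
  do.call(rbind, results)
}

# Usage with simulated, well-separated data and a seed for reproducibility
set.seed(2024)
pc_mat <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)
clusters <- kmeans(pc_mat, centers = 3)$cluster
ari_df <- bootstrap_ari_sketch(pc_mat, clusters, replicates = 5)
```

The result has one row per replicate, and ARI values near 1 suggest the original clusters are stable under resampling. The matrix is passed in once and only indexed inside the loop, in line with the preference stated in this thread for not re-extracting it on each iteration.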