Open AlineTalhouk opened 8 years ago
How do we do this exactly?
@AlineTalhouk please let me know when you are free to discuss this
Hi @dchiu911 life has been a little crazy. How about this afternoon at 4ish? You should come to the seminar at 1pm
What's the seminar?
Not too much headway so far.
The pcNormal description in the Nature paper (Senbabaoglu et al., 2014) doesn't really describe a hypothesis testing framework.
The sigclust package originally tests the statistical significance of splitting a data set into two clusters using k-means. I have modified it to handle k > 2, but I'm still unsure whether it is doing what we think. Experimental code here.
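To make the idea concrete, here is a minimal base-R sketch of a SigClust-style test. It is not the sigclust package's code or the modified version mentioned above. The function names `cluster_index` and `sigclust_sketch` are made up for illustration. The sketch assumes the null is a single Gaussian whose covariance eigenvalues are the plain sample eigenvalues, with no thresholding. The `k` argument simply passes through to `kmeans`. Whether that null is appropriate for k > 2 is exactly the open question.

```r
# Sketch of a SigClust-style Monte Carlo test using base R only.
# cluster_index() and sigclust_sketch() are hypothetical names, not sigclust API.

# k-means cluster index: within-cluster sum of squares / total sum of squares
cluster_index <- function(x, k = 2) {
  km <- kmeans(x, centers = k, nstart = 10)
  km$tot.withinss / km$totss
}

sigclust_sketch <- function(x, k = 2, nsim = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Assumption: use raw sample eigenvalues to define the Gaussian null
  eig <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values
  eig <- pmax(eig, 0)
  ci_obs <- cluster_index(x, k)
  ci_null <- replicate(nsim, {
    # The cluster index is rotation invariant, so a diagonal covariance suffices
    z <- sweep(matrix(rnorm(n * length(eig)), nrow = n), 2, sqrt(eig), "*")
    cluster_index(z, k)
  })
  list(
    ci = ci_obs,
    # Empirical p-value: share of null indices at least as small as the observed one
    p_emp = mean(ci_null <= ci_obs),
    # Gaussian approximation to the null distribution of the index
    p_norm = pnorm(ci_obs, mean = mean(ci_null), sd = sd(ci_null))
  )
}

# Toy data with two well-separated groups (simulated, not hgsc)
set.seed(1)
two_groups <- rbind(matrix(rnorm(50 * 5), 50),
                    matrix(rnorm(50 * 5, mean = 4), 50))
res <- sigclust_sketch(two_groups, k = 2, nsim = 50)
res$p_emp
```

A small cluster index means tight clusters, so the p-value is the left tail of the null distribution. The empirical p-value can only take values in steps of 1/nsim, which is one reason p-values may come out as exactly 0 or 1.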
There is also a bayesclust package that I haven't investigated. Install it from the source tarball at the last link below using
install.packages("path/to/file/bayesclust_3.1.tar.gz", repos = NULL, type = "source")
References:
http://www.tandfonline.com/doi/pdf/10.1198/016214508000000454?needAccess=true&
https://github.com/pkimes/sigclust2
https://arxiv.org/pdf/1610.01424.pdf
http://www.stat.ufl.edu/archived/casella/Papers/FuentesandCasella.pdf
https://www.jstatsoft.org/article/view/v047i14
Setting the parameter icovest = 2 in my_sigclust() seems to yield more reasonable p-values: values somewhere in between rather than just 0 or 1. The description of the options is:
There are three options for estimating the eigenvalues of the covariance matrix: 1. Soft Thresholding (recommended for high dimensions, when the diagnostics indicate assumptions are met). 2. Sample eigenvalues (recommended for low dimensions, and when assumptions, such as Gaussianity fail, but known to be generally conservative). 3. Hard Thresholding.
Since we have n > p for data(hgsc), option 2 seems to work better than option 1, as I had noticed.
Update: Option 1 seems more robust.
@AlineTalhouk please review again regarding the hypothesis testing
@dchiu911 icovest = 2 seems reasonable. I just read https://arxiv.org/pdf/1610.01424.pdf and it makes sense to me as a framework.
We will probably need to do some simulations to see whether we are actually detecting cluster structure or not.
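A simulation along these lines could look like the sketch below. It generates data with and without cluster structure and records how often a cluster-index test rejects at the 5% level. Everything here is an assumption for illustration: `ci_pvalue`, `simulate_data` and `rejection_rate` are hypothetical helpers, the test is a bare-bones Monte Carlo version using sample eigenvalues rather than my_sigclust(), and the sample sizes are kept small so it runs quickly.

```r
# Hypothetical helper: Monte Carlo p-value for the k-means cluster index,
# with a Gaussian null built from the sample eigenvalues (an assumption).
ci_pvalue <- function(x, k = 2, nsim = 50) {
  ci <- function(m) {
    km <- kmeans(m, centers = k, nstart = 5)
    km$tot.withinss / km$totss
  }
  sdv <- sqrt(pmax(eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values, 0))
  null_ci <- replicate(
    nsim,
    ci(sweep(matrix(rnorm(length(x)), nrow = nrow(x)), 2, sdv, "*"))
  )
  mean(null_ci <= ci(x))
}

# Hypothetical generator: the first half of the rows is shifted by `shift`
# in every coordinate; shift = 0 gives a single Gaussian (no clusters).
simulate_data <- function(n, p, shift) {
  x <- matrix(rnorm(n * p), nrow = n)
  half <- seq_len(n / 2)
  x[half, ] <- x[half, ] + shift
  x
}

set.seed(42)
nrep <- 20  # small on purpose; increase for stable estimates

# Share of simulated data sets where the test rejects at the 5% level
rejection_rate <- function(shift) {
  mean(replicate(nrep, ci_pvalue(simulate_data(60, 4, shift)) < 0.05))
}

rate_null <- rejection_rate(0)  # no clusters: ideally near or below 5%
rate_alt  <- rejection_rate(3)  # two separated clusters: ideally high
c(null = rate_null, alt = rate_alt)
```

Comparing the two rates gives a rough read on type I error and power. Repeating this over a grid of `shift`, `n` and `p` values, and with the different eigenvalue options, would show where the test starts to detect structure.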
derive a probabilistic assessment of cluster assignment