SystemsGenetics / KINC

Knowledge Independent Network Construction
MIT License

Idea to Speed Similarity GMM Step #165

Open spficklin opened 4 years ago

spficklin commented 4 years ago

When using GMMs, KINC can become extremely slow with large sample sizes (e.g., thousands of samples). However, it is probably not necessary to use all of the samples to establish the "modes". I propose the following:

  1. Rather than use all samples, use a randomly selected subset. Perhaps this could be as small as 30 samples? Using 30 should allow GMMs to run quickly.
  2. Perform multiple GMM iterations, each with a different randomly selected subset of samples. This would allow different modes to be identified in different iterations.
  3. Select all non-overlapping clusters found across the iterations as the final set.
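
A minimal Python sketch of the three steps, for illustration only (this is not KINC code). Everything named here is hypothetical: `subsampled_modes`, `fit_and_assign`, `max_overlap`, and the use of Jaccard overlap to decide what counts as "non-overlapping". The GMM fit itself is left to a caller-supplied function. The sketch also assumes that function fits on the subset and then assigns every sample to a mode, so that clusters from different iterations can be compared by sample membership.

```python
import random
from typing import Callable, FrozenSet, Iterable, List


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Fraction of shared sample indices between two non-empty clusters."""
    return len(a & b) / len(a | b)


def subsampled_modes(
    n_samples: int,
    fit_and_assign: Callable[[List[int]], Iterable[Iterable[int]]],
    subset_size: int = 30,
    n_iterations: int = 10,
    max_overlap: float = 0.0,
    seed: int = 0,
) -> List[FrozenSet[int]]:
    """Repeat a mode-finding step on random subsets and merge the results.

    fit_and_assign (hypothetical) receives the indices of the chosen subset,
    fits a model such as a GMM on those samples only, and returns one
    collection of sample indices per mode it found.
    """
    rng = random.Random(seed)
    kept: List[FrozenSet[int]] = []
    for _ in range(n_iterations):
        # Step 1: draw a small random subset instead of using all samples.
        subset = rng.sample(range(n_samples), min(subset_size, n_samples))
        # Step 2: fit on this subset; each iteration may reveal other modes.
        for members in fit_and_assign(subset):
            cluster = frozenset(members)
            if not cluster:
                continue
            # Step 3: keep a cluster only if it does not overlap any cluster
            # already kept (overlap is measured by Jaccard index here).
            if all(jaccard(cluster, k) <= max_overlap for k in kept):
                kept.append(cluster)
    return kept
```

With `max_overlap=0.0` only strictly disjoint clusters are kept, and the order of the iterations decides which of two overlapping clusters survives. A looser threshold, or a preference for the larger cluster, would be equally reasonable readings of step 3.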

Just an idea...