biocore / mds-approximations

Multidimensional scaling algorithms for microbiology-ecology datasets.

Generate random (gaussian or realistic) distance matrices #45

Closed: HannesHolste closed this 6 years ago

HannesHolste commented 6 years ago

Code includes:

  1. A method to generate a random distance matrix with entries drawn from a Gaussian distribution, i.e. totally random, unrealistic data.
  2. A method to generate a random distance matrix from a realistic OTU table (with either band or block patterns), thanks to work by @mortonjt. It uses Bray-Curtis distance to compute the distance matrix from the OTU table. I had to package Jamie's work as a Python wheel, included in the conda environment.yml file, because it's not public on PyPI yet.
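
As a rough illustration of item 1, here is a minimal sketch of an unstructured Gaussian distance matrix. It is not the code in this PR: the function name `random_gaussian_distance_matrix`, the `seed` parameter, and the choice to take absolute values of N(0, 1) draws are all assumptions made for the example, and it uses only the standard library to stay self-contained.

```python
import random


def random_gaussian_distance_matrix(n, seed=None):
    """Return an n x n symmetric matrix with a zero diagonal (hypothetical helper).

    Off-diagonal entries are |x| with x ~ N(0, 1), so they are non-negative
    but carry no structure; the triangle inequality is not enforced.
    """
    rng = random.Random(seed)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(rng.gauss(0.0, 1.0))
            # Mirror each draw so the matrix stays symmetric.
            dm[i][j] = d
            dm[j][i] = d
    return dm


dm = random_gaussian_distance_matrix(5, seed=42)
```

Passing the same `seed` reproduces the same matrix, which is convenient when benchmarking different MDS approximations on identical input.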

Open question (for item 2): Right now the number of features in the OTU table simply equals the requested dimension of the distance matrix, i.e. the number of samples. Should this be user-configurable? If so, what is a sensible default number of features? For example, it could equal the number of samples, be 1/10th the number of samples, or be fixed at something like 6,000. Is there an upper limit on the number of features we see in typical OTU tables? How much does it differ between closed-reference OTU-picked tables and deblur tables?
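
One way the configurable option could look is sketched below, with `num_features` defaulting to the number of samples so that the current behaviour is preserved. Everything here is an assumption for illustration: `band_table` is a simple stand-in for the band-pattern generator by @mortonjt (not its actual API), and `bray_curtis` is a plain-Python version of the standard Bray-Curtis formula.

```python
import math


def band_table(num_samples, num_features, bandwidth=2.0, scale=100.0):
    """Hypothetical band-pattern table: rows are samples, columns are features.

    Each sample's abundances form a Gaussian bump whose centre shifts
    along the feature axis as the sample index increases.
    """
    table = []
    for i in range(num_samples):
        # Spread the band centres evenly across the feature axis.
        center = i * (num_features - 1) / max(num_samples - 1, 1)
        row = [
            scale * math.exp(-((j - center) ** 2) / (2 * bandwidth ** 2))
            for j in range(num_features)
        ]
        table.append(row)
    return table


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity for non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def band_distance_matrix(num_samples, num_features=None):
    """Sample-by-sample Bray-Curtis matrix from a band-pattern table."""
    if num_features is None:
        # Default mirrors the behaviour described above: features == samples.
        num_features = num_samples
    table = band_table(num_samples, num_features)
    return [[bray_curtis(a, b) for b in table] for a in table]


default_dm = band_distance_matrix(6)
wide_dm = band_distance_matrix(4, num_features=20)
```

With this shape, the output dimension depends only on `num_samples`; `num_features` changes how much neighbouring samples overlap, and therefore the distances, without changing the size of the matrix.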

coveralls commented 6 years ago

Coverage remained the same at 87.429% when pulling d3ddea24c22f3e27afe5a9a2b4ed7ec406819a21 on structured-randdm into 165ae6f1b65ba8143379f840de327d71ec1b3ed4 on master.

HannesHolste commented 6 years ago

@antgonza thanks for the feedback. Changes made as requested. OK to merge?

antgonza commented 6 years ago

@mortonjt, could you take a look and, if you are OK with these changes, merge? Thanks!

mortonjt commented 6 years ago

Very exciting! Thanks @HannesHolste!