EcoJulia / Microbiome.jl

For analysis of microbiome and microbial community data

Transfer Tree plot for Hclust recipe to StatPlots #19

Closed (piever closed this issue 5 years ago)

piever commented 6 years ago

After a lengthy discussion on Slack, it seems that StatPlots will not be afraid to take on dependencies to support visualizations for types in the soon-to-be-revamped Stats metapackage. The Tree plot for Hclust recipe by @kescobo is an example of that, and I wanted to ask whether there is consensus to move it to StatPlots (I for one am in favor). I think Microbiome is a bit of an odd place for the recipe, as it could hurt discoverability (many people do clustering without necessarily working with Microbiome data).

cc: @mkborregaard

mkborregaard commented 6 years ago

I'm in favour too!

kescobo commented 6 years ago

I'm perfectly happy to move it - we discussed it earlier here and the conclusion at the end there was to keep it in my package. But if things have changed, no worries at all.

Another really useful thing (esp. for plotting, because it makes things like clustered heatmaps prettier) is the fast optimal leaf ordering I implemented. Not sure that makes sense in a plotting package exactly, but folks at Clustering.jl have not responded about implementing it there.

kescobo commented 6 years ago

Hmm - looks like images aren't getting generated in my docs, but the idea of leaf ordering is:

julia> using Microbiome, Clustering

julia> dm = [0.  .1  .2
             .1  0.  .15
             .2  .15 0.];

julia> h = hclust(dm, :single);

julia> h.labels = ["a", "b", "c"];

julia> hclustplot(h)

[Image: screenshot of the hclustplot(h) output, 2018-05-17]

This is a valid tree, but b is actually more similar to c than a is (0.15 vs. 0.2 in dm), so it's nicer if b and c are ordered next to each other:

julia> optimalorder!(h, dm)

julia> hclustplot(h)

[Image: screenshot of the hclustplot(h) output after optimalorder!, 2018-05-17]
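
To make the idea of leaf ordering concrete, here is a minimal brute-force sketch in plain Julia (no packages). It is not the fast algorithm mentioned above and does not use `optimalorder!`. It only enumerates the leaf orders allowed by the tree for the three-point example and picks the one with the smallest sum of distances between adjacent leaves. The nested-tuple tree and the names `leaforders` and `ordercost` are made up for this illustration.

```julia
# Pairwise distances from the example above (1 = a, 2 = b, 3 = c).
dm = [0.0 0.1  0.2
      0.1 0.0  0.15
      0.2 0.15 0.0]
labels = ["a", "b", "c"]

# Hypothetical tree encoding: a leaf is an Int, an internal node is a
# tuple of two subtrees. Single linkage merges a and b first, then adds c.
tree = ((1, 2), 3)

# Enumerate every leaf order reachable by flipping the two children of
# any internal node. Flipping never changes the tree itself.
leaforders(leaf::Int) = [[leaf]]
function leaforders(node::Tuple)
    left, right = leaforders(node[1]), leaforders(node[2])
    orders = Vector{Vector{Int}}()
    for l in left, r in right
        push!(orders, vcat(l, r))  # children in the given order
        push!(orders, vcat(r, l))  # children flipped
    end
    return orders
end

# Cost of an order: sum of distances between neighbouring leaves.
ordercost(order, dm) = sum(dm[order[i], order[i + 1]] for i in 1:length(order) - 1)

orders = leaforders(tree)
costs = [ordercost(o, dm) for o in orders]
best = orders[argmin(costs)]

for (o, c) in zip(orders, costs)
    println(join(labels[o], " "), "  cost = ", c)
end
println("chosen order: ", join(labels[best], " "))
```

For this tree the admissible orders are a b c, c a b, b a c and c b a. The two that place b next to c have the lowest cost, which is the property the reordered plot is meant to show. Brute force grows exponentially with the number of leaves, so it is only useful for illustrating the objective.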

piever commented 6 years ago

I'm probably unable to review your code on this one as I don't know the algorithm, but if you can also add a test that you get the same results as your reference paper I'd be happy to include it. It wouldn't be the first time that we write an algorithm that's mainly useful for visualization in a plotting package (for example @mkborregaard wrote code to compute optimal bin number in 2D histograms).

kescobo commented 6 years ago

@piever Ahh, good to know, thanks! Probably won't get to it for a couple of weeks, but I will definitely work on it since that's the place it makes the most sense for sure.

mkborregaard commented 6 years ago

Sounds great.