EcoJulia / SpatialEcology.jl

Julia framework for spatial ecology - data types and utilities

Add `pairwise()` method from Distances.jl #36

Closed kescobo closed 5 years ago

kescobo commented 5 years ago

I've been using my own DistanceMatrix struct in Microbiome.jl, and defined some methods to generate one, but I don't actually think that's a good idea anymore.

This allows one to do:

using SpatialEcology
using Distances

julia> co = ComMatrix([1 2 3; 2 2 2; 3 1 3]);

julia> pairwise(BrayCurtis(), co)
3×3 Array{Float64,2}:
 0.0       0.272727  0.142857
 0.272727  0.0       0.230769
 0.142857  0.230769  0.0
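
For anyone who wants to check these numbers by hand, here is a minimal sketch in plain Julia that does not use SpatialEcology or Distances. It assumes Bray-Curtis dissimilarity is `sum(|a - b|) / sum(a + b)` and that the columns of the matrix are the units being compared, which is what reproduces the output above. The names `braycurtis_sketch` and `pairwise_columns` are hypothetical helpers for illustration, not part of either package.

```julia
# Bray-Curtis dissimilarity between two abundance vectors:
# sum(|a_i - b_i|) / sum(a_i + b_i)
braycurtis_sketch(a, b) = sum(abs.(a .- b)) / sum(a .+ b)

# Apply the distance function `d` to every pair of columns of `m`
# (hypothetical helper; assumes columns are the units being compared).
function pairwise_columns(d, m::AbstractMatrix)
    n = size(m, 2)
    return [d(m[:, i], m[:, j]) for i in 1:n, j in 1:n]
end

abund = [1 2 3; 2 2 2; 3 1 3]   # same counts as the ComMatrix above
D = pairwise_columns(braycurtis_sketch, abund)

# D[1, 2] is 3/11, D[1, 3] is 2/14 and D[2, 3] is 3/13,
# matching 0.272727, 0.142857 and 0.230769 in the output above.
```

Comparing rows instead would give different values (for example 2/12 for the first two rows), so the column orientation is worth keeping in mind when passing a plain matrix.
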

mkborregaard commented 5 years ago

Nice - @richardreeve how does this interact with the Diversity framework?

richardreeve commented 5 years ago

It doesn't have a massive impact, to be honest, and it looks like a good thing to be able to do. I put `RenyiDivergence` into Distances, so if you want to look at pairwise Rényi divergences between populations, you can use `pairwise(RenyiDivergence(2.0), pop)` (the q=2, Simpson-like one, for instance).

However, the Diversity package works on comparisons between a population and the whole metapopulation, and indeed the entropy framework in general doesn't (and can't) look at straight pairwise comparisons. If you're interested, the most intuitive explanation (to me!) is that the cross-entropy (which is what you're measuring here, and which at q=1 is related to the K-L divergence) between two populations is conceptually and mathematically derived from the inefficiency of using one encoding to represent another. As the relative abundances of the letters in the two alphabets diverge these inefficiencies get more and more pronounced, and divergences increase, which is good. However, if a letter (species) is absent from one encoding (population), then the population without it simply can't represent a population with that species, so the inefficiency is infinite. And since the species lists are often different in different populations, it means that divergences are very often infinite, which is unhelpful to say the least!
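
The infinite-divergence point can be illustrated with a minimal sketch in plain Julia. It computes the Kullback-Leibler divergence (the q=1 case mentioned above) from raw counts. The function name `kl_sketch` and the example counts are made up for illustration, and this is not how Diversity.jl or Distances.jl implement it.

```julia
# Kullback-Leibler divergence D(p || q) = sum(p_i * log(p_i / q_i)),
# computed from raw counts after normalising to relative abundances.
# Terms with p_i == 0 contribute nothing (the usual 0 * log(0) = 0 convention).
function kl_sketch(p_counts, q_counts)
    p = p_counts ./ sum(p_counts)
    q = q_counts ./ sum(q_counts)
    return sum(a > 0 ? a * log(a / b) : 0.0 for (a, b) in zip(p, q))
end

# Hypothetical counts for three species in two populations.
shared   = kl_sketch([5, 3, 2], [4, 4, 2])  # same species list: small and finite
disjoint = kl_sketch([5, 3, 2], [6, 4, 0])  # species 3 missing from the encoding population: Inf
reversed = kl_sketch([6, 4, 0], [5, 3, 2])  # species 3 missing from the encoded population: finite
```

The asymmetry between `disjoint` and `reversed` mirrors the encoding argument: a population lacking a species cannot represent one that has it, while the reverse direction only costs some inefficiency.
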

Anyway, we do have a way of measuring pairwise dissimilarities between populations, and I'll have a think about how to incorporate them into this framework, but it's non-trivial and so not fully developed in the code yet... I'd go ahead and merge!

kescobo commented 5 years ago

GitHub is being weird again, but at least it's not saying these comments are from the future.

[Screenshot: Screen Shot 2019-06-11 at 9:34:33 AM]

@mkborregaard I wasn't sure if you wanted me to merge... Feels weird to push the green button on someone else's package 😆

mkborregaard commented 5 years ago

Generally, I think of it like this: if I "approve", it's fine for you to merge :-)