kescobo opened 3 years ago
I've never looked at Microbiome.jl before, but I think there's a bit of incompatibility going on with the underlying EcoBase interface. Looking briefly at Microbiome, it seems like it can (maybe?) use types that offer that interface in places (in particular here), but it doesn't offer it itself (I think?). If it did, a lot of these current incompatibilities might go away, and you could do the plotting, etc. directly with Microbiome types. Diversity.jl, on the other hand, implements the EcoBase interface here, for instance, as does SpatialEcology.jl in a variety of places.
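A minimal sketch of why offering the interface matters, in plain Julia. The functions `thingnames`, `placenames` and `occurrences` here are local stand-ins modeled loosely on EcoBase's naming, not EcoBase's actual definitions, and `ToyProfile` is a hypothetical type standing in for something like `CommunityProfile`. Once a type provides the primitives, generic code such as the `richness` function below works on it without knowing how the data are stored.

```julia
# Local stand-ins for an EcoBase-style interface (hypothetical, not EcoBase's code).
abstract type AbstractAssemblage end

function thingnames end   # names of the "things" (species, taxa, features)
function placenames end   # names of the "places" (sites, samples)
function occurrences end  # things × places abundance matrix

# Generic code written only against the primitives:
# richness per place = number of things with nonzero abundance in that column.
richness(a::AbstractAssemblage) = vec(sum(occurrences(a) .> 0; dims = 1))

# Hypothetical type standing in for something like CommunityProfile.
struct ToyProfile <: AbstractAssemblage
    abundances::Matrix{Float64}   # rows = features, columns = samples
    features::Vector{String}
    samples::Vector{String}
end

# Implementing the primitives is all it takes to use `richness`.
thingnames(p::ToyProfile) = p.features
placenames(p::ToyProfile) = p.samples
occurrences(p::ToyProfile) = p.abundances

p = ToyProfile([1.0 0.0; 3.0 2.0; 0.0 0.0],
               ["taxonA", "taxonB", "taxonC"], ["s1", "s2"])
richness(p)   # → [2, 1]
```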
More generally, I think that the idea of a common way of actually storing the abundance data (or do you just mean a common tables interface? I wasn't sure) may not work in practice. Diversity.jl stores abundances as an `AbstractMatrix` subtype directly in a `Metacommunity` object, whereas EcoSISTEM.jl stores them in two ways: for simple multithreaded code, in a `GridLandscape` object, and for multiprocess (MPI) code, in an `MPIGridLandscape` object, because the abundance matrix itself is distributed across multiple nodes. Because they all (I hope!) satisfy the EcoBase interface, everything should just work across the ecosystem, and you can use the SpatialEcology plotting and so on directly, irrespective of the underlying storage type. However, the last (MPI) one in particular has no flexibility in how storage is implemented, because the layout has to keep the inter-process communication efficient.
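To make the storage-agnostic point concrete, here is a hypothetical sketch in which the same generic function runs over two storage layouts: a single in-memory matrix, and places split into column blocks, loosely mimicking (without any actual MPI) a matrix partitioned across processes. The primitives `nplaces` and `placeabundances` are assumptions for illustration, not EcoBase's API.

```julia
# Hypothetical sketch: one generic function, two storage layouts.
abstract type AbstractAssemblage end

# Primitives each storage type must provide (local stand-ins, not EcoBase's API).
function nplaces end          # number of places
function placeabundances end  # abundances of all things in place j

# Generic code: total abundance in each place, never touching storage directly.
totals(a::AbstractAssemblage) = [sum(placeabundances(a, j)) for j in 1:nplaces(a)]

# Storage 1: a single in-memory matrix (things × places).
struct DenseAssemblage <: AbstractAssemblage
    abundances::Matrix{Int}
end
nplaces(a::DenseAssemblage) = size(a.abundances, 2)
placeabundances(a::DenseAssemblage, j::Integer) = view(a.abundances, :, j)

# Storage 2: places split into contiguous column blocks, loosely mimicking
# a matrix partitioned across processes (no actual MPI here).
struct ChunkedAssemblage <: AbstractAssemblage
    chunks::Vector{Matrix{Int}}
end
nplaces(a::ChunkedAssemblage) = sum(size(c, 2) for c in a.chunks)
function placeabundances(a::ChunkedAssemblage, j::Integer)
    offset = 0
    for c in a.chunks
        n = size(c, 2)
        # Place j lives in this chunk if it falls within its column range.
        j <= offset + n && return view(c, :, j - offset)
        offset += n
    end
    throw(BoundsError(a, j))
end

m = [1 0 4; 2 5 0]
dense = DenseAssemblage(m)
chunked = ChunkedAssemblage([m[:, 1:2], m[:, 3:3]])
totals(dense) == totals(chunked)   # → true (both are [3, 5, 4])
```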
If you are just proposing a common interface, and not a common storage mechanism, then that's different, but I'm not sure what interface you're proposing. Do you just mean implementing the Tables.jl interface? If so, what does implementing that involve? If it's simple and makes sense, it might just be something that can be implemented directly in terms of the EcoBase primitives, so no-one has to do anything to get it to work?
> looking briefly at Microbiome, it seems like it can (maybe?) use types that offer that interface in places (in particular here), but it doesn't offer it itself (I think?)
This seems entirely plausible - I didn't do much testing. Come to think of it, do we have a pre-made set of tests that check for compatibility? That might be a nice way to solidify the interface and make it easier to check.
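One possible shape for such a pre-made compatibility suite, sketched with Julia's `Test` standard library: a function that a package could call from its own test suite. The interface functions and the specific checks (matching dimensions, unique names) are assumptions about what "compatible" should mean, not an existing EcoBase test suite.

```julia
using Test

# Hypothetical interface stand-ins (not EcoBase's actual functions).
function thingnames end
function placenames end
function occurrences end

"""
    test_assemblage_interface(a)

Sketch of a reusable conformance check: does `a` provide the primitives,
and are their results mutually consistent?
"""
function test_assemblage_interface(a)
    @testset "assemblage interface: $(typeof(a))" begin
        # Are the primitives implemented for this type at all?
        @test hasmethod(thingnames, Tuple{typeof(a)})
        @test hasmethod(placenames, Tuple{typeof(a)})
        @test hasmethod(occurrences, Tuple{typeof(a)})
        # Are their results consistent with each other?
        occ = occurrences(a)
        @test occ isa AbstractMatrix
        @test size(occ) == (length(thingnames(a)), length(placenames(a)))
        @test allunique(thingnames(a))
        @test allunique(placenames(a))
    end
end

# A hypothetical package type checking itself against the interface.
struct ToyProfile
    abundances::Matrix{Float64}
    features::Vector{String}
    samples::Vector{String}
end
thingnames(p::ToyProfile) = p.features
placenames(p::ToyProfile) = p.samples
occurrences(p::ToyProfile) = p.abundances

test_assemblage_interface(ToyProfile(rand(3, 2), ["a", "b", "c"], ["s1", "s2"]))
```

Checks like these could later be split along the source/sink lines that Tables.jl uses, so a package could declare which half it supports.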
> I think that the idea of a common way of actually storing the abundance data - or do you just mean a common tables interface, I wasn't sure? - may not work in practice
I don't mean that they all need to have the same representation or use the same specific type; I mostly mean that it would be nice to reuse functionality where possible, and to make them inter-convertible as far as we can.
Maybe I'm only re-proposing EcoBase :laughing:. I am not nearly as up on the rest of the EcoJulia landscape as I should be, so it's entirely possible that I'm the only one who needs to do any work. The impetus for this issue is that I used to use `ComMatrix`, but wanted some things that it didn't have, so I split off and made my own type because (a) I wasn't super familiar with SpatialEcology internals, and (b) I wanted to be able to experiment and break stuff without needing to burden @mkborregaard every time I made changes. Now I'd like to come back to being more compatible. As I say, it may be that all of the work is on my end.
> do you just mean implementing the Tables.jl interface? If so, what does implementing that involve?
After banging my head against it for a bit, it turns out to be pretty simple. You can be a Tables source, or sink, or both. I've only implemented the source bit, since that was easier and all I wanted for my use-case. To be a source, all you really need is to be able to generate an iterator of named tuples, where the keys are column names (you can implement your own row types too, but a vector of named tuples is the proto-table).
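As a sketch of the source side: the function below (a hypothetical helper, not Microbiome.jl's actual implementation) turns a features × samples matrix into a vector of `NamedTuple`s, one per feature, with sample names as column names. Tables.jl treats such a vector as a row table, so with DataFrames.jl loaded, `DataFrame(rows)` should accept it; the block itself only uses Base Julia.

```julia
# Hypothetical helper: build a "proto-table" (vector of NamedTuples) from a
# features × samples abundance matrix. Columns: :feature plus one per sample.
function namedtuple_rows(abundances::AbstractMatrix,
                         features::AbstractVector{<:AbstractString},
                         samples::AbstractVector{<:AbstractString})
    size(abundances) == (length(features), length(samples)) ||
        throw(DimensionMismatch("matrix size does not match the name vectors"))
    colnames = (:feature, Symbol.(samples)...)
    # One NamedTuple per feature: its name, then its abundance in each sample.
    return [NamedTuple{colnames}((features[i], abundances[i, :]...))
            for i in eachindex(features)]
end

rows = namedtuple_rows([1 0; 3 2], ["taxonA", "taxonB"], ["s1", "s2"])
rows[2]      # → (feature = "taxonB", s1 = 3, s2 = 2)
rows[1].s2   # → 0
```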
> If it's simple and makes sense, it might just be something that can be implemented directly in terms of the EcoBase primitives, so no-one has to do anything to get it to work?
I think we could definitely implement a fall-back interface on the primitives, which could then be modified as needed by other packages.
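A sketch of how such a fall-back could work through multiple dispatch, assuming EcoBase-style primitives (local stand-ins here): a generic `asrows` method is defined on the abstract type using only the primitives, so any subtype gets it for free, while a package that needs more can add a more specific method. `asrows`, `Plain`, `WithMeta` and the `rank` field are all hypothetical.

```julia
# Local stand-ins for EcoBase-style primitives (hypothetical).
abstract type AbstractAssemblage end
function thingnames end
function placenames end
function occurrences end

# Fall-back built purely from the primitives: any subtype gets it for free.
function asrows(a::AbstractAssemblage)
    cols = (:thing, Symbol.(placenames(a))...)
    occ = occurrences(a)
    return [NamedTuple{cols}((t, occ[i, :]...)) for (i, t) in enumerate(thingnames(a))]
end

# Package 1: implements only the primitives and relies on the fall-back.
struct Plain <: AbstractAssemblage
    m::Matrix{Int}
    things::Vector{String}
    places::Vector{String}
end
thingnames(a::Plain) = a.things
placenames(a::Plain) = a.places
occurrences(a::Plain) = a.m

# Package 2: same primitives, but overrides `asrows` to prepend its own
# metadata column (a hypothetical taxonomic `rank` per thing).
struct WithMeta <: AbstractAssemblage
    m::Matrix{Int}
    things::Vector{String}
    places::Vector{String}
    rank::Vector{String}
end
thingnames(a::WithMeta) = a.things
placenames(a::WithMeta) = a.places
occurrences(a::WithMeta) = a.m
function asrows(a::WithMeta)
    # Reuse the generic method instead of duplicating it.
    base = invoke(asrows, Tuple{AbstractAssemblage}, a)
    return [merge((rank = a.rank[i],), r) for (i, r) in enumerate(base)]
end

asrows(Plain([1 0; 3 2], ["a", "b"], ["s1", "s2"]))[2]
# → (thing = "b", s1 = 3, s2 = 2)
asrows(WithMeta([1 0; 3 2], ["a", "b"], ["s1", "s2"], ["genus", "species"]))[1]
# → (rank = "genus", thing = "a", s1 = 1, s2 = 0)
```

Because the `WithMeta` method is more specific, dispatch picks it over the fall-back, and `invoke` lets it build on the generic version rather than copy it.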
Cool. That all sounds good to me. What I understood originally did sound a bit like a re-proposal of EcoBase, but in fact I think that adding some tests that answer "Do I implement EcoBase?" would be really helpful; we could even think about it in terms of sources and sinks, like the Tables interface you describe. Adding the core Tables interface on top of the current EcoBase primitives would be really nice too. Then we can think about whether there are enough commonalities in the implementations to consider common storage mechanisms, though my feeling is that if we can interoperate anyway, it may not be a high priority.
There's another suggestion on Zulip that we think about providing the same interface as a trait-like thing rather than imposing inheritance on it, which could tie in nicely with providing the tests. I think the idea would be that if you did the inheritance, you wouldn't need to worry about the traits, but you could provide them instead...
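The trait-like alternative could look something like the "Holy traits" pattern that Base uses for `IndexStyle`: a type opts in by defining a trait method rather than by subtyping, so it stays free to keep its own supertype. All names below (`AssemblageStyle`, `IsAssemblage`, `richness`, `Profile`) are hypothetical.

```julia
# Hypothetical trait-based opt-in, as an alternative to subtyping.
abstract type AssemblageStyle end
struct IsAssemblage <: AssemblageStyle end
struct NotAssemblage <: AssemblageStyle end

# Default: a type is not an assemblage unless it says so.
AssemblageStyle(::Type) = NotAssemblage()

# Interface stand-in (not EcoBase's actual function).
function occurrences end

# Public entry point dispatches on the trait, not on a supertype.
richness(x) = richness(AssemblageStyle(typeof(x)), x)
richness(::IsAssemblage, x) = vec(sum(occurrences(x) .> 0; dims = 1))
richness(::NotAssemblage, x) =
    throw(ArgumentError("$(typeof(x)) does not implement the assemblage interface"))

# A type with an unrelated supertype can still opt in via the trait.
abstract type SomeOtherHierarchy end
struct Profile <: SomeOtherHierarchy
    abundances::Matrix{Int}
end
AssemblageStyle(::Type{Profile}) = IsAssemblage()
occurrences(p::Profile) = p.abundances

richness(Profile([1 0; 3 2; 0 0]))   # → [2, 1]
```

Packages that do subtype a shared abstract type could get the trait automatically from a single method such as `AssemblageStyle(::Type{<:SomeAbstractAssemblage}) = IsAssemblage()` (again a hypothetical name), which fits the idea that inheritance would imply the trait.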
Sorry guys, I've been busy, will take a look
Purpose
In an effort to improve cross-ecosystem compatibility, it would be nice to make table-like data structures more interoperable. My view of the ecosystem is quite narrow: I'm really only aware of `ComMatrix` from SpatialEcology.jl and my own `CommunityProfile` from Microbiome.jl, which took quite a bit of inspiration from the former. I also haven't used `ComMatrix` in the last year or so, as I was trying to iterate quickly in Microbiome.jl.

cc @mkborregaard
Current advantages of `CommunityProfile`
- Implements the Tables.jl source interface, so it can be converted to a `DataFrame` or written to CSV
- Rows (`features`) and columns (`samples`) can be indexed with numbers, strings, or regex
- Features and samples are custom types (`Taxon` or `GeneFunction` for features, `MicrobiomeSample` for samples). This enables storing additional information (including metadata) inside the community table type

Current advantages of `ComMatrix` (that I'm aware of)

Current incompatibilities
- `thing*` and `place*` methods. If this is done right, one should be able to call `featurenames` on a `ComMatrix` or `speciesnames` on a `CommunityProfile` and get the same thing.
- `ComMatrix` is a simple wrapper around a sparse matrix, whereas I'm using `AxisArrays` and `NamedDims` for a few things. But I actually think that this might be overkill, since I mostly used it to take advantage of indexing, and then rewrote the indexing in a way that doesn't rely on it so much. With a few tweaks to SpatialEcology's views, I think I could drop that dependency.
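For reference, number/string/regex indexing needs very little machinery on top of a plain matrix. The sketch below is a hypothetical minimal type, not Microbiome.jl's or SpatialEcology.jl's actual code: it resolves names to integer positions before indexing the underlying matrix, and routes `featurenames` and `speciesnames` (local stand-ins for the two packages' accessors) through one shared primitive so they return the same thing.

```julia
# Hypothetical minimal community table: a plain matrix plus name vectors,
# with no AxisArrays/NamedDims dependency.
struct Profile
    abundances::Matrix{Float64}   # features × samples
    features::Vector{String}
    samples::Vector{String}
end

# Resolve an index into integer positions within a vector of names.
toindex(names, i::Union{Integer, AbstractVector{<:Integer}, Colon}) = i
function toindex(names, s::AbstractString)
    i = findfirst(==(s), names)
    i === nothing && throw(KeyError(s))
    return i
end
toindex(names, r::Regex) = findall(n -> occursin(r, n), names)

# Rows are features, columns are samples; each key can be a number, string or regex.
Base.getindex(p::Profile, r, c) =
    p.abundances[toindex(p.features, r), toindex(p.samples, c)]

# Hypothetical shared accessors: both names route through one primitive,
# so they always agree.
thingnames(p::Profile) = p.features
featurenames(x) = thingnames(x)
speciesnames(x) = thingnames(x)

p = Profile([1.0 2.0 0.0; 0.0 5.0 1.0; 3.0 0.0 0.0],
            ["Bacteroides fragilis", "Bacteroides ovatus", "Prevotella copri"],
            ["s1", "s2", "s3"])
p["Bacteroides ovatus", "s2"]   # → 5.0
p[r"^Bacteroides", 1]           # → [1.0, 0.0]
p[3, :]                         # → [3.0, 0.0, 0.0]
```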