Closed kescobo closed 3 years ago
Thanks @kescobo , I'm currently in Ecuador but will try to find time to look at this. There are a number of things - some I think belong in EcoBase, others should be more tightly integrated with the existing framework. Is the iteration interface part and parcel of the Tables one, or can they be conceived differently?
My understanding is that you can implement a by-row interface (which is the iteration interface), or a by-column interface, or both. I started out trying to do both, though to be honest most of what's written here was @quinnj typing at my computer 😆 .
I think we still need to think about how to include thingnames
in the schema for the row interface, so if it makes more sense to you, I can start out with just the column interface, get that working, and then move to the by-row interface. TBH I haven't looked at this since the hackathon, and I didn't 100% follow what Jacob was doing, but I can spend a bit more time on it next week, and you can chime in as you're available. There's certainly no rush on my end, so don't worry about taking time while you're in the field.
Cool. Yeah I have generally been planning a singlecommunity
and singlespecies
type (or place
and thing
) type for many things, so it's essentially that it would be obvious to have that working right from the start (be useful to have input from @richardreeve here).
Sorry for the delay in getting back to you - I've been away. I haven't had much time to look at this, but it does seem like something very handy to have in EcoBase if it were possible and not too concrete for that package (which it may be!).
On the singlecommunity
comment, I have OneCommunity <: AbstractPlaces
subtype in Diversity.jl:src/Partition.jl - I couldn't directly inherit from or use a singlecommunity
/OnePlace
type in EcoBase.jl
though, as all of my places types actually inherit from AbstractPartition <: AbstractPlaces
...
No worries - this is a slow-moving issue 😆
@mkborregaard Do I understand correctly that you'd like to implement that before this goes forward? Or is it that you want this in place to develop those?
TBH, I still don't completely understand how this works, but I was thinking of trying to implement it for SparseArrays first (which should be an easier case) and making a PR to Tables.jl. That should help me to understand, and would probably make implementing the types here more straightforward.
Yes, I will implement that and then let's keep this open, it'll probably be a small change to this PR once that's done.
Sorry for letting this languish so long. I got stuck again with DataFrames upgrade due to compat for this package, and so I decided to try to figure it out. Since I'm not familiar with all of the types here, and since it looks like you use DataFrames in other places internal to your types, I decided to try it out in Microbiome with a new AbstractAssemblage
. You can see it on this branch, and the important bits for Tables interface are here:
AbundanceTable
is basically the same as a ComMatrix
## -- Tables Interface -- ##
Tables.istable(::AbundanceTable) = true
Tables.columnaccess(::AbundanceTable) = true
Tables.rowaccess(::AbundanceTable) = true
Tables.getcolumn(at::AbundanceTable, i::Union{Int, Symbol}) = at[:, i]
Tables.getcolumn(atr::AbundanceTableRow, i::Union{Int, Symbol}) = atr[i]
Tables.columnnames(at::AbundanceTable) = [:features, Symbol.(name.(samples(at)))...]
Tables.columnnames(atr::AbundanceTableRow) = keys(atr.cols)
function Tables.schema(at::AbundanceTable)
elt = eltype(abundances(at))
coltypes = [eltype(features(at)), (elt for _ in 1:nsamples(at))...]
return Tables.Schema(Tables.columnnames(at), coltypes)
end
Tables.columns(at::AbundanceTable) = (; (col => at[:, col] for col in Tables.columnnames(at))...)
Tables.rows(at::AbundanceTable) = [at[i, :] for i in 1:nfeatures(at)]
Having gone through this process with a type I built myself, I now think I understand better how I would go about doing it for a ComMatrix
, but I wonder if we should think about making this part of the interface in EcoBase
. I think it would require:
AbstractAssemblageRow
type if we want to allow row access, but that's pretty easyAbstractThing
s and AbstractPlace
s from an AbstractAssemblage
(actually - now that I think of it, these might already be in the spec)and not a lot else.
So, I changed my mind again, and decided to go with AxisArrays
(via the AxisIndices.jl
package). I think I recall you talking about doing this with ComMatrices at some point, and I have to say, it was pretty straightforward. Adding Tables integration was a tiny bit of extra work, since AxisIndices
does not implement the Tables methods, but I can now have row and column headers as a custom type, with table-wide or sample (site) specific metadata, and index into the table with strings or numbers for rows and/or columns.
I don't have all of the view
machinery you do here, but I rarely end up using that in practice, I think, and it would probably be straightforward to build that in too.
I think I'm going to close this PR for now, since I'm not sure where you want to go with ComMatrices, but I'd be happy to take on building the Tables interface for you once you decide, now that I understand better how it works.
So have you moved away from ComMatrix again in MicroBiome? Do you think what you've built could be a useful drop-in replacement here?
So have you moved away from ComMatrix again in MicroBiome
Yes. Not for any principled reason, I just wanted to experiment with some stuff, and using my own package to do that made more sense to me.
Do you think what you've built could be a useful drop-in replacement here?
Not quite drop-in - it lacks view
machinery, and I'm certain there are other things that aren't as polished. Another big thing is that I don't have any other data structures other than the ComMatrix-like one, so the machinery you have for spatial information and stuff is absent. That said, I think it might be pretty easy to add that as a separate axis
But it's close. If you're interested, I could factor stuff out into EcoBase.jl or some other shared package. Nice thing about AxisArray
s is that the whole issue of what to call the axes (even the number of dimensions) doesn't really matter.
I think an axisarray solution would make sense for commatrix. Would be cool to hear @rafaqz opinion here too
I dont know what ComMatix does, but you could use DimensionalData.jl and get the Tables.jl interface for free. And spatial data from GeoData.jl would be supported too.
You can also extend AbstractDimArray
if you needed to add things to it, or just put extra data in the metadata
field.
I wouldnt use AxisArrays.jl, there are a lot of reasons people have written 5 competing replacements for it!
So apart from being the author do you have a compelling reason to prefer DimensionalData out of those five replacements ? :-)
The primary things I want out of a data system are
And not much else. Current implementation works for these things (it's using AxisIndices and NamedDims specifically, which aren't raw axis arrays if that makes a difference). That said, I don't feel super strongly about which representation we use - if we come to a consensus on what the ecosystem is going to use, I'm happy to use that.
Should we take discussion to zulip or to a new open issue?
All of those points 1:3 work, although I dont think that is unique.
Compelling reasons:
Otherwise AxisIndices/ NamedDims is fine.
DD also has the most stars now. Probably because of GeoData.jl integration as people seem to star and use both.
Although I think you might want a different kind of table to DD - its more designed for tables of stack layers.
But using a categorical dimension as table columns could actually be generalised, which I hadnt thought about before now.
That sounds very convincing. A ComMatrix is just a Matrix where we know what the axes are with lots of dispatches defined
This is a WIP effort started at the JuliaCon hackathon to implement a Tables.jl interface for ComMatrices (and other types?). I got some help from Jacob Quinn to get started, now I just need to go through and fix what's breaking.
Addresses #23 (I think)
Ultimately, my plan is to remove the dependency on
DataFrames
and get everything working onTables
(if that makes sense to you @mkborregaard )