Open davidanthoff opened 5 years ago
if a unification of a generic tables API is in the works, i'm inclined to hold off on any changes to Gadfly until then.
also, what was the problem with updating support for Gadfly in IterableTables?
Tables.jl no longer depends on Requires.jl, by the way.
what's the latest on unifying the TableTraits / Tables API?
David and I didn't get a chance to talk about it at JuliaCon, which we had planned.
The reality though, is that Tables.jl is already a superset of TableTraits.jl in terms of a pure table interface. i.e. if you implement the Tables.jl interface, you get TableTraits "for free" by then defining IteratorInterfaceExtensions.getiterator(x::MyTable) = Tables.datavaluerows(x)
, so that's what I've been encouraging people to do, as it gives you the best of both worlds, especially given the current reliance on DataValues of TableTraits, whereas the rest of the ecosystem just relies on the builtin missing
, i.e. normally it's more natural and simpler to implement Tables.jl than jump through DataValue-hoops for TableTraits. If someone spells out how exactly they want to use Tables.jl in this package, I'm happy to put a quick PR together to show the basics, the nice thing is it's usually not more than a 10-15 minute exercise.
Yeah, I think if Gadfly would simply use either Tables.rows
or Tables.columns
, it would work with all sources that exist today (i.e. both Tables.jl and TableTraits.jl sources), and I would also think that any unification of table interfaces would not break a package that simply used one of those two functions.
I'm very much in favor of replacing our lazy loading of DataFrames into Gadfly via Requires with the standard table interface. Ideally we would then remove our dependency on Requires completely afterwords.
we also use Requires for TerminalExtensions
Sorry wanted to find out the status of this issue
nothing has changed in Gadfly. what's the latest with a unified table interface?
I guess it's still distinct as we have TableTraits.jl
and Tables.jl
. There has been no unification as to the best of my knowledge but Tables.jl
no longer has a dependency on Requires
. @quinnj am i correct?
I think David's comment from last year is still accurate and the best way forward:
Yeah, I think if Gadfly would simply use either Tables.rows or Tables.columns
Tables.jl will automatically work with any TableTraits.jl source, and do the DataValue
unwrapping automatically. Tables.jl has also cleaned up it's dependencies a lot over the last 1.5 years, and doesn't use Requires.jl anymore.
So all-in-all, I think if someone is interested in doing the work here, they could write a single generic table plotting interface that just iterated over rows from Tables.rows(x)
and a single dependency on Tables.jl will do the job.
IterableTables.jl used to have Gadfly integration pre julia 1.0. What that enabled was that one could pass any tabular source, not just a
DataFrame
to Gadfly'splot
and things would just work. That included support for lots of different table types (TypedTables, IndexedTables, JuliaDB etc.), file IO (CSV, Excel, Feather, lots of stats files) and Query results. That support unfortunately didn't survive the transition to julia 1.0. I just looked into reenabling it, but for various reasons that would be more complicated with the recent changes to Gadfly.But, the whole idea of having the integration live in IterableTables.jl was a pretty ugly hack in any case, and so I'm wondering whether it would make more sense to just make Gadfly.jl natively support TableTraits.jl/Tables.jl out-of-the box. Is that something the maintainers of Gadfly would consider?
Broadly speaking I believe the strategy in such a case would be to completely remove the
DataFrames
integration code here in the package, and instead just support any TableTraits/Tables source (which includesDataFrame
, so no functionality would be lost). Just as an example, VegaLite.jl has used that strategy for more than a year now, without a single hiccup or problem and great integration with lots of different data sources.I think this also has the potential to greatly help with load times of the package, at least medium term. I think a consensus is emerging that Requires.jl increases load times of packages quite a bit. In the scenario I'm suggesting, Requires.jl would no longer be needed at all for the data source handling. Instead, Gadfly.jl would take a normal dependency on a very small, fast loading table interface package (which I'm 99% certain would load much faster than Requires.jl). Of course at that point there is still the
TerminalExtensions
use-case for Requires.jl, but I'm fairly confident that one might be able to get rid of this as well, at least medium term (with e.g. https://github.com/JuliaLang/julia/pull/30922). So I think there is a path that would allow Gadfly.jl to drop the Requires.jl dependency, and that should help with package load times.There are currently two strategies for getting this kind of integration going:
I should also say that I think @quinnj and I might be able to further shrink/unify the TableTraits.jl/Tables.jl story at juliacon, so hopefully those kind of choices will disappear in the near future.
My gut feeling is that maybe the best strategy would be to start a PR that adds Tables.jl integration, but then hold off merging until Tables.jl gets rid of the Requires.jl dependency.