Closed aplavin closed 4 months ago
This exact code and versions work for me, I suspect you are using an old Julia version.
Dataframes support in FlexiJoins is subject to change, as noted in the docs. They aren't the focus of flexijoins and specifically marked as such for now -- the support is kinda rudimentary as I never use DF.jl myself, if someone is interested they are free to suggest integration improvements. Then it will likely be considered stable API.
For two months already, way before the actual release of 0.1.30, there was even a note in the docs on the upcoming version change: v0.1.30+ only supports dataframes on julia 1.9+.
In GitLab by @ericphanson on Jun 5, 2023, 18:03
For two months already, way before the actual release of 0.1.30, there was even a note in the docs on the upcoming version change: v0.1.30+ only supports dataframes on julia 1.9+.
The way to indicate that is to bump the compat requirement in the Project.toml. Your Project.toml indicates compatibility with Julia 1.7+: https://gitlab.com/aplavin/FlexiJoins.jl/-/blob/master/Project.toml#L36. I would recommend bumping the compability in a new patch release and yanking v0.1.30 from General, or retroactively adding the compat bound to the registry.
Sure, I would've made a breaking version or bumped compat, if that integration was a part of the guaranteed public interface. Meanwhile, it's specifically described as experimental and subject to change. It still works without any changes as long as one uses the latest Julia. Actual stable functionality (everything aside from DF.jl integration) continues to work on Julia 1.8 (tests pass).
In GitLab by @ericphanson on Jun 6, 2023, 07:23
Ah, I see. It is frustrating to me that FlexiJoins doesn’t support the Tables interface like most other packages.
Is there a simple way to convert a Tables.jl table to a supported type without dependencies? It sounds like a vector of NamedTuples or NamedTuple of vectors won’t work, since they don’t support simultaneous row and column indexing.
FlexiJoins supports anything following Julia collection interface: regular arrays (of namedtuples or arbitrary objects), a wide variety of abstractarrays, tuples, and so on. Support for stuff like arbitrary iterators or dictionaries is incomplete as of yet, but also in scope.
Neither "Tables.jl
-tables" nor "Julian collections" include the other category, so there was this choice of which of them to support first-class. For me, the answer is clearly collections, they are fundamentally more general than flat tables. As usual, I recommend simply using tables that follow Julian collection interface, there are many. And maybe advocate for adding collection interface to tables that don't support it (:
Support for Tables.jl
can be added to FlexiJoins
, but only if it doesn't affect collection handling. See https://gitlab.com/aplavin/FlexiJoins.jl/-/blob/master/ext/DataFramesExt.jl for how conversion from dataframes is handled for now.
In GitLab by @ericphanson on Jun 6, 2023, 08:34
We’ve had this discussion before but I still don’t know what is not-“flat”
about a collection. It sounds just like what Tables.jl calls a “row table”.
In which case, you could support any Tables.jl table by calling
Tables.rows
on it (or, since iterable support is incomplete,
Tables.rowtable
to get a materialized vector of NamedTuples).
I still don’t know what is not-“flat” about a collection
For example, specifically for flexijoins, it's a fundamental part of the design that after J = innerjoin((;A, B))
, J[1].A
is an element from A
, and J.A
is a collection of elements from A
. This is possible and feels natural when both A,B,J
are collections, but not so much with flat tables.
you could support any Tables.jl table by calling
Tables.rows
I could, if Tables.jl
-tables were a specific julia type. Otherwise, an object can be a collection and implement Tables.jl
interface, but these two meanings can be different. And I really think it's important to keep base Julian interface (collection-based) always consistent.
So, always calling Tables.rows
doesn't seem possible.
Also, the question is what the result should be. Currently, joining DataFrames returns a DataFrame with flattened columns: J.x
, J.x_1
, J.y
instead of J.A.x
, J.B.x
, J.B.y
. I guess that's what DF.jl users expect, but it requires special handling compared to regular julian collection join.
As v0.1.30 only brings minor improvements, and I myself only use julia 1.9 anyway, I think this version can be easily yanked and the newer v0.1.31 will be compatible with julia 1.9+. https://github.com/JuliaRegistries/General/pull/85716
In GitLab by @ericphanson on Jun 5, 2023, 14:17
using th example from https://github.com/beacon-biosignals/DataFrameIntervals.jl/issues/24#issue-1406829150, namely,
I get
using FlexiJoins 0.1.30, FlexiMaps 0.1.14. This works on v0.1.29 (same FlexiMaps version).