beacon-biosignals / Legolas.jl

Tables.jl-friendly mechanisms for constructing, reading, writing, and validating Arrow tables against extensible, versioned, user-specified schemas.
Other
24 stars 2 forks source link

`Legolas.Row` type aliases impede ergonomic deprecation cycles #49

Closed jrevels closed 1 year ago

jrevels commented 2 years ago

I've had this on the backburner to address for a long time, but am only actually confronting it now because there's an impending downstream breaking schema change I'd like to land soonish.

Part of the benefit of Legolas.jl is supposed to be that a single package version can actually contain/support definitions for multiple versions of the same schema simultaneously. However, the package also currently encourages users to define type aliases for Legolas.Row in a manner that I think ends up at odds with the aforementioned benefit.

Let's say I'm the author of a schema foo@1:

const Foo = @row("foo@1", x::Int)

Now let's say I want to introduce a breaking schema change, like forcing x > 0. One way to do this is:

const Foo = @row("foo@1", x::Int)

const Foo2 = @row("foo@2", x::Int = x > 0 ? x : error("bad"))

This is annoying as a package maintainer, because you are forced to eventually do one of the following as your package evolves:

  1. Don't ever deprecate Foo as the alias for foo@1 row construction. Introduce the alias Foo1 for Foo, and then always have Foo2, Foo3, etc. for each new schema version. Try to teach your users to always use an integer-appended alias at their callsites. In this case, you end up wishing you would've just started with Foo1 in the first place.

  2. Develop some deprecation strategy by which Foo always "eventually" means foo@<latest> (I use the term "eventually" because it might generally take multiple package version bumps to achieve a single schema version bump, depending on your approach). The simplest approach here is to just tie breaking schema changes to breaking package changes, but that feels like it defeats Legolas' purported ability to enable packages to seamlessly support multiple schema version simultaneously - an ability which is supposed to ease deprecation cycles, not complicate them. Maybe for some special classes of breaking changes it's possible to deprecate things cleanly/automatically, esp. if you're willing to overload Legolas.Row construction yourself (which probably should be generally discouraged. In general, though, this option seems even more annoying than option 1 to me.

I'd like a better option here, but I think we might have to change Legolas itself to achieve it.

One proposal: What if we took option 1's approach and baked it into Legolas from the get-go as the "supported best practice"? For example, we could drop @row(schema_name, fields...) in favor of @constructor(schema_name, type_name):

# current way
const MyRow = @row("my-schema@1", x::Int)

# proposed way
@schema("my-schema@1", x::Int)
@constructor("my-schema", MyRow)

In this example, @constructor would generate a definition like:

const MyRow{n} = Legolas.Row{Legolas.Schema{Symbol("my-schema"),n}}

...and we'd define stuff as necessary to ensure that:

IMO this would result in a world that is similar to Option 1 but more ergonomic/structured for everybody involved.

jrevels commented 2 years ago

I guess we should decouple the constructor binding from the schema definition, like so:

@constructor("foo", Foo) # takes schema name and intended type name
@schema("foo@1", #= fields... =#)
@schema("foo@2", #= fields... =#)
@schema("foo@2", #= fields... =#)

Otherwise we'd be leaving the door open for maintainers to do stuff like:

@schema("foo@1", Foo, #= fields... =#)
@schema("foo@2", Foo, #= fields... =#)
@schema("foo@3", NameThatIsNotFooLol, #= fields... =#)
@schema("foo@4", Foo, #= fields... =#)
@schema("bar@1", Foo, #= fields... =#) # basically breaks the stuff before it

Which would be wonky (though I don't see why a maintainer would ever actually do this)

jrevels commented 2 years ago

@kleinschmidt brought up a fun idea:

If we're going to do this, maybe we should go all the way and make the convenience constructor for the schema itself rather than the row type, and just expect folks to use Row when they want a Row.

So instead of:

Foo{3}(fields...)::Legolas.Row{Legolas.Schema{Symbol("foo"),3}}

we'd have:

Row(Foo{3}(), fields...)::Legolas.Row{Legolas.Schema{Symbol("foo"),3}}

I kinda like this since it's generally more useful to pass around Schema instances in real code vs. Row type aliases (ref https://github.com/beacon-biosignals/Legolas.jl/pull/29#discussion_r767237684)