Closed ericphanson closed 1 year ago
first off, I agree with refactoring this so that Legolas' implementated behavior matches this PR's title
some comments just to clarify some other points
I think users should not be relying on column order in general (since that's much more fragile than using the column names)
Even moreso, this is officially/intentionally the case, though maybe should be documented better in more places; a set of columns does not need to be in any particular order to be considered to be compliant with a given Legolas schema. The most formal definition for schema compliance is here, I guess
ad-hoc schema extensions to formal schema extensions without rewriting existing tabular data
(I think I understand what you're getting at here but it's possible I could be missing your point, if so apologies 😁 )
Since Legolas schema fields are unordered, you shouldn't necessarily "need to" rewrite existing tabular data. Of course, in practice, since they always happen to get shuffled around as an implementation detail via Legolas.Row
, it means it's more annoying to efficiently implement/utilize ad hoc extensions
and actually I think we should actually "undocument" this and explicitly render it an implementation detail: https://github.com/beacon-biosignals/Legolas.jl/blob/3f0a29b3b166a504368eb39f4ba31587f7493c08/examples/tour.jl#L85
Ah ok, thanks for clarifying! I did not realize the column-ordering was completely an implementation detail. In that case this issue is definitely not urgent (I had thought it was breaking and important to get in before usage was widespread).
In e.g. https://github.com/beacon-biosignals/Legolas.jl/blob/9b0839c085a76a0256a1f690a166f5c91b157946/test/runtests.jl#L103-L105
I think the order of columns would be more preferable as
x
,y
,z
. Why? Since where thex
column ends up is different for different extensions of the parent schema which looks strange to me. I think users should not be relying on column order in general (since that's much more fragile than using the column names), but things likeshow
methods definitely should use the column order, and I think it makes extension schemas feel fairly different than just adding extra custom columns, whereas they should instead be a formalization of adding extra custom columns.E.g. Onda annotations have a
recording
,id
, andspan
column. If we ad-hoc added avalue
column, we'd get(`recording`, `id`, `span`, `value`)
. If we ended up using this a lot and decided to formalize it into avalue-annotations
schema or something, then it will get moved to the front,(`value`, `recording`, `id`, `span`)
, which I think is non-optimal.This also matters if we want to promote ad-hoc schema extensions to formal schema extensions without rewriting existing tabular data, just updating the metadata.