Open rnayebi21 opened 2 months ago
To clarify: the result can be seen as a mixed-compacted archive because there is no 20240803 version row for time_value 20240801, despite the values XA and YA already being recorded for both signals. So there's a consistency question as to what a compact versus an uncompact epi-archive implies. Does compact imply fully compact? Does non-compact imply that all possible versions are listed? Should a compactify = FALSE merge return a fully uncompact archive?
Archive DTs currently always represent updates; these are like our DB tables, where rows could represent diffs or could represent re-recording the same value for various reasons. We initially had this more explicitly highlighted via naming, e.g. updates_dt or something like that. If the user wants to represent a removal, they currently have to manually put in a revision to NA and/or add a column tracking whether an associated signal value has been removed.
ryantibs(?) and I were discussing the possibility of making two different constructors: e.g., epi_archive_from_snapshots [which would insert some sort of "deletion" updates where appropriate] vs. epi_archive_from_{diffs/updates}, and we could maybe also add similar extraction functions to sort of reverse these two operations. [Though we can't fully reverse compactification with the current format, since if there's a version completely identical to the last one, we omit any reference to it. This allows for a simpler archive format but might have motivated the perhaps-not-so-great version cadence inference stuff in epix_slide_ref_time_values_default.]
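For anyone following along, here is a minimal sketch of why compactification can't be fully reversed. It is plain Python, not epiprocess code: the helpers `compactify` and `snapshot_as_of` are hypothetical names, the toy rows are made up and are not the example from this thread, and the "drop a row if it repeats the previous version's value" rule is my simplified reading of compactification.

```python
from itertools import groupby

# Each update row is (time_value, version, value). Made-up toy data.
updates = [
    ("2024-08-01", "2024-08-01", 10),
    ("2024-08-01", "2024-08-02", 10),  # re-records the same value
    ("2024-08-01", "2024-08-03", 12),  # an actual revision
    ("2024-08-02", "2024-08-02", 7),
    ("2024-08-02", "2024-08-03", 7),   # re-records the same value
]


def compactify(rows):
    """Drop rows that repeat the previous version's value for the same time_value."""
    kept = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        first = True
        previous = None
        for row in group:
            # Always keep the first version; afterwards keep only real changes.
            if first or row[2] != previous:
                kept.append(row)
            previous = row[2]
            first = False
    return kept


def snapshot_as_of(rows, version):
    """Latest known value per time_value using only rows with version <= `version`."""
    latest = {}
    for time_value, row_version, value in sorted(rows, key=lambda r: (r[0], r[1])):
        if row_version <= version:  # ISO date strings sort chronologically
            latest[time_value] = value
    return latest


compact = compactify(updates)
print(compact)
```

Under these assumptions, every snapshot reconstructed from `compact` matches the one reconstructed from `updates`, but nothing in `compact` records that the dropped versions were ever reported, so the original update rows can't be recovered from it.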
So in the current design:
I was talking with @dshemetov about an example I came up with for epix_merge(); see below:
Here, with compactify = FALSE, we were wondering whether there should be another version, 08-03-2024, for time value 08-01-2024, in which both signals are null. However, if that were the case, then each time value would have a row for every version date between the first and last versions, which could scale up very quickly. So I'm unsure whether we would want to do this.
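To put rough numbers on the scaling worry, here is a back-of-the-envelope sketch in plain Python. It assumes daily time values and daily versions, that time value t first appears in version t, and that a fully uncompact archive would list every version from t through the last day. Those assumptions and both function names are mine, not anything epiprocess defines.

```python
def fully_expanded_rows(n_days):
    """Row count if time value t (1..n_days) gets one row per daily version t..n_days."""
    return sum(n_days - t + 1 for t in range(1, n_days + 1))


def compact_rows_without_revisions(n_days):
    """Row count if each time value is recorded once and never revised."""
    return n_days


for n_days in (30, 365):
    print(n_days, fully_expanded_rows(n_days), compact_rows_without_revisions(n_days))
```

Under these assumptions the fully expanded count is n(n+1)/2 per signal and location, so it grows quadratically in the number of days while the compact count grows linearly.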