Right now bundle just versions "regular" data, actual rows in normal tables. Tracking schema rows (meta.table, meta.column, meta.schema, meta.function, meta.type, etc.) is the next frontier. Past experiments have have shown it's possible to at least commit meta rows and then check them out, but certainly not with any level of robustness or fault tolerance.
Making this change necessitates some clarifications of the information model:
New Columns
In the current model, a commit is a set of:
rows added
rows deleted
changed fields
When a row is added, it gets a rowset_row_field for every column in the row's table, basically whatever is present in the db at the time.
Now, let's say the user does a alter table public.x add column y. What happens?
a) Untracked Meta Row
If meta.column is being tracked in bundle.trackable_nontable_relation, (currently a global, which is wrong, #287), then a new meta.column row will appear in untracked_rows, with row_id (meta, column, id, meta.column_id('public','x','y')).
Same with a column's other potential rows in meta, meta.constraint_unique, meta.constraint_check, meta.foreign_key, etc.
Constraints will have a pg_depend-level dependency on the column (I believe) and the column will have a dependency on its table.
b) New Fields on Old Rows
After column creation, a new field appears on each row in the table. What are these?? Are they "untracked_fields"? (Not a thing). Are they just another field change, from void to existence? It is a different category of change, that does not fit into the above model.
On commit, I expected that fields in new columns would just propagate into rowset_row_field, but that is not what is happening. The stage_row_field view just pulls fields from the previous commit, and doesn't detect new columns on old rows. It does however use meta.relation_column for new rows, so those rows do get a field for the new column.
stage_row_field is supposed to contain what rowset_row_field would look like if a commit was done right now. Commit actually just pulls from this view to create the next commit. But when a column is added, what should the next commit look like?
if this bundle is not tracking its own schema, pretty clearly the field should just included for each row in the next commit
if it is tracking its own schema, if the column is created but not staged, its fields should not be included?
c) Field-level Versioning vs Row-level Versioning
This distinction is at the heart of all the questions above.
It's tempting to try to pursue "field-level" tracking, and add a new dimension to the information model for "untracked_field" and "field_added". I don't believe this is impossible, but it's two steps (leaps) forward. Ideally, multiple bundles could version in an extremely granular way, version control a single field, or a single column. One bundle defines a table and some columns, but another bundle adds a column to that table and tracks its fields. This is beyond what even git does AFAICT; the git equivalent would be multiple git repositories operating in the same directory, version controlling different files simultaneously. This is the Semantic Web model, it's all just triples, a commit is a collection of meta.field_ids, period.
The simpler path is to, for this next step at least, just say that a commit saves whatever fields are present on the row at commit time. In head_db_stage, we can at least detect and display that a row has new fields, or fields deleted.
It's going to be important to keep an eye on the distinction between a field whose value has been set to NULL, and a field whose column has been deleted.
Some new functions get_commit_fields(commit_id), get_commit_rows(commit_id), get_db_fields(commit_id) and get_db_rows(commit_id) are much more axiomatic and precise (and fast) functions for addressing this space. The get_commit_* functions return exactly which rows and fields were in the previous commit, disregarding the working copy entirely. get_db_rows() returns rows from the supplied commit, along with whether or not each row exists in the db. get_db_fields() returns all fields present in the database on the supplied commit's rows; rows from the supplied commit that have been deleted are not present.
These functions are the foundation for refactoring the entire space with an eye on schema changes.
Column Renames
In terms of meta, they're a deleted row and an added row in meta.column. Maybe we can do something smarter. One cool idea is to detect them by seeing if the oid is the same.
Deleted Columns
Again, if we're doing field-level versioning, this is a new category of change, a field_delete. If not, just include whatever columns are present in the db at commit-time, in which case this is easy.
Right now bundle just versions "regular" data, actual rows in normal tables. Tracking schema rows (
meta.table
,meta.column
,meta.schema
,meta.function
,meta.type
, etc.) is the next frontier. Past experiments have have shown it's possible to at least commit meta rows and then check them out, but certainly not with any level of robustness or fault tolerance.Making this change necessitates some clarifications of the information model:
New Columns
In the current model, a commit is a set of:
When a row is added, it gets a
rowset_row_field
for every column in the row's table, basically whatever is present in the db at the time.Now, let's say the user does a
alter table public.x add column y
. What happens?a) Untracked Meta Row
meta.column
is being tracked inbundle.trackable_nontable_relation
, (currently a global, which is wrong, #287), then a new meta.column row will appear inuntracked_rows
, with row_id (meta, column, id, meta.column_id('public','x','y')).meta.constraint_unique
,meta.constraint_check
,meta.foreign_key
, etc.Constraints will have a pg_depend-level dependency on the column (I believe) and the column will have a dependency on its table.
b) New Fields on Old Rows
After column creation, a new field appears on each row in the table. What are these?? Are they "untracked_fields"? (Not a thing). Are they just another field change, from void to existence? It is a different category of change, that does not fit into the above model.
meta.relation_column
for new rows, so those rows do get a field for the new column.stage_row_field
is supposed to contain whatrowset_row_field
would look like if a commit was done right now. Commit actually just pulls from this view to create the next commit. But when a column is added, what should the next commit look like?c) Field-level Versioning vs Row-level Versioning
This distinction is at the heart of all the questions above.
It's tempting to try to pursue "field-level" tracking, and add a new dimension to the information model for "untracked_field" and "field_added". I don't believe this is impossible, but it's two steps (leaps) forward. Ideally, multiple bundles could version in an extremely granular way, version control a single field, or a single column. One bundle defines a table and some columns, but another bundle adds a column to that table and tracks its fields. This is beyond what even git does AFAICT; the git equivalent would be multiple git repositories operating in the same directory, version controlling different files simultaneously. This is the Semantic Web model, it's all just triples, a commit is a collection of
meta.field_id
s, period.The simpler path is to, for this next step at least, just say that a commit saves whatever fields are present on the row at commit time. In head_db_stage, we can at least detect and display that a row has new fields, or fields deleted.
It's going to be important to keep an eye on the distinction between a field whose value has been set to NULL, and a field whose column has been deleted.
Some new functions get_commit_fields(commit_id), get_commit_rows(commit_id), get_db_fields(commit_id) and get_db_rows(commit_id) are much more axiomatic and precise (and fast) functions for addressing this space. The
get_commit_*
functions return exactly which rows and fields were in the previous commit, disregarding the working copy entirely.get_db_rows()
returns rows from the supplied commit, along with whether or not each row exists in the db.get_db_fields()
returns all fields present in the database on the supplied commit's rows; rows from the supplied commit that have been deleted are not present.These functions are the foundation for refactoring the entire space with an eye on schema changes.
Column Renames
In terms of meta, they're a deleted row and an added row in meta.column. Maybe we can do something smarter. One cool idea is to detect them by seeing if the
oid
is the same.Deleted Columns
Again, if we're doing field-level versioning, this is a new category of change, a
field_delete
. If not, just include whatever columns are present in the db at commit-time, in which case this is easy.Object Dependencies
See #283.