goldmansachs / obevo

Obevo is a database deployment tool that handles enterprise scale schemas and complexity
Apache License 2.0
236 stars 56 forks source link

Dependant Object Recreation #162

Open IvanAtanasov opened 6 years ago

IvanAtanasov commented 6 years ago

Hi,

We are trialling the use of Obevo and wondering how it handles changes to tables that have dependant objects, such as foreign keys or views. From some experimentation I can't make it work in a way that would be suitable.

e.g.

my_view1.sql

CREATE VIEW my_view AS SELECT col1 FROM common.my_table
GO

my_table.sql

//// CHANGE name=change1
CREATE TABLE my_table (
  col1  INT2 NOT NULL,
  col2 INT2 NOT NULL
)
GO

//// CHANGE name=change2
ALTER TABLE my_table DROP COLUMN col2
GO

I would expect this to manage dropping and recreating my_view but instead the deployment fails.

I can see a similar problem existing with foreign keys too. What is the suggested way of dealing with changes like this?

Thanks, Ivan

shantstepanian commented 6 years ago

Hi Ivan, As of today, we do not have a hook that will retrigger the view creation via a table change, though it could be possible to add in the future.

But to prioritize on our side, I'd like to get more info on your use case. What DBMS type are you dealing with, and what is the exact error that you are getting?

Fyi, we have a couple ways to work around similar issues, though they are not directly applicable to your use case (only for DB2):

In addition, not all DBMS types have this issue; and in practice, a table change could require changing the view definition anyway (e.g. adding a new table column and exposing it in the view if you were not using "select *"), so you may ultimately not need to deal with this.

IvanAtanasov commented 6 years ago

Thanks for getting back to me @shantstepanian

I am trying to do this using a Redshift database. In redshift dropping any column in a table that is referenced within a view will fail because of the existence of dependant objects. You can add 'CASCADE' to make it succeed but this results in the views being dropped. Note that this is any column and not only columns in the view, therefore there will be cases where we want to remove a column and leave the view as is.

Another question I have is, what is the solution for times when a drop and recreate is required for a table that had foreign keys linking to it?

Commonly we do this to change column encodings or change column types but persist the original column order. Redshift alter statements are fairly limited so things like this require creating the table again. I suppose we could add the foreign keys back through other changes but this doesn't seem a nice way of handling it, also there would be duplicate foreign key definitions afterwards.

Thanks, Ivan

shantstepanian commented 6 years ago

Thanks for the details on the table/view use case. I will play around with this in Redshift this week and get back to you then with an estimate on when we can support this. I agree that using CASCADE on its own won't be ideal unless we have a way to add the columns back.

Your foreign key use case would end up being similar to the table/view use case, except that we don't currently have a way to automatically replay foreign keys in the way that you suggest. We do have the FK toggle in the //// CHANGE section to define foreign keys, which we currently use for our static data functionality, but also is useful in case a use case like yours came along. Implementing this would come after implementing this for views.

So to recap your foreign key use case:

//// CHANGE RECREATE name=mychange
CREATE TABLE TableC ...

To work around this w/ the current tooling - you are correct in that you'd need to add this via other changes and it is not ideal. We have the same qualm with table indexes - both those are modeled as incremental changes. Note that we handle triggers in a "rerunnable" fashion as they are separately maintained. I will give this requirement a think as well to see how we can maintain this.

IvanAtanasov commented 6 years ago

@shantstepanian Thanks for looking into this!

The use case you described is exactly what I'm thinking. So the proposed solution would be: To add some flag that marks a change as requiring dependencies to be recreated. This could apply to recreating a table or dropping a column. The drop changes would have to include a 'CASCADE' in order to drop all the dependant objects, ready for them to be created again by Obevo.

Something like that sounds like it would work for us.

shantstepanian commented 6 years ago

Hi Ivan, I took a look into the Redshift use case for handling recompilation of views.

I believe you can work around this by using late binding views, i.e. creating the view using "WITH NO SCHEMA BINDING". I saw it mentioned in the drop syntax and the create syntax. I don't know the full practical implementation; there would be a small delay on the first access of the view after changing the table, but I assume subsequent queries on the view should be fine (I only have a small sample size to test with).

You can try out the following example:

-- set the schema
SET search_path TO dbdeploy01;

-- drop the objects to prep the test
drop view my_view;
drop view my_view_star;
drop view my_view_col2;
drop table my_table;

-- create and populate the table
CREATE TABLE my_table ( col1  INT2 NOT NULL, col2 INT2 NOT NULL );
insert into my_table values (1, 2);

-- create the view w/ explicit col1 references
CREATE VIEW my_view AS SELECT col1 FROM dbdeploy01.my_table with no schema binding;
-- create the view w/ select *
CREATE VIEW my_view_star AS SELECT * FROM dbdeploy01.my_table with no schema binding;
-- create the view referring col2 as well
CREATE VIEW my_view_col2 AS SELECT col1, col2 FROM dbdeploy01.my_table with no schema binding;

-- note that the view creation needs explicit mention of the schema, as indicated in Redshift docs

-- all selects should work fine
SELECT * FROM my_view;
SELECT * FROM my_view_star;  -- shows col1 and col2
SELECT * FROM my_view_col2;

-- now drop the column
ALTER TABLE my_table DROP COLUMN col2;

-- this select works fine, only shows col1
SELECT * FROM my_view;
-- this select works fine, only shows col1
SELECT * FROM my_view_star;
-- this will fail
SELECT * FROM my_view_col2;

When it comes to maintaining your view file, the question is how best to represent the physical schema, esp. if your physical schema changes across environments. In that case, you can use the built-in ${<logicalSchema>_schemaSuffixed} token for your schema, as described here. Example below:

CREATE VIEW my_view AS
SELECT col1 FROM ${myschema_schemaSuffixed}my_table
with no schema binding;

Note that the token includes the period suffix at the end, e.g. it translates to "dbdeploy01."

Given that workaround for the views, between the two use cases you raised: A) handling view recreation after changing a table B) handling recreation of a table w/ foreign key dependencies, I would prioritize B) (the foreign key use case) on my end, as it doesn't have a nice workaround, esp. compared to A) for the views.

Let me know what you think.

shantstepanian commented 6 years ago

Hi @IvanAtanasov , fyi - we will start work on the foreign key use case this week. Hopefully will have something for you to try by end of this month

shantstepanian commented 6 years ago

Update - we need to do some refactoring to be able to make this change. The refactoring has taken longer than anticipated. Work is still in progress; will send update by end of September

shantstepanian commented 4 years ago

Additional use case - rule creation and bindings in Sybase

shantstepanian commented 4 years ago

The pre-requisite refactoring has been done on this use case, but now the main development is needed

Things to consider for this implementation: