nelsonic opened this issue 7 years ago
@des-des could this be useful:
OK, this exists for SQL Server: it finds the difference between two dbs and builds a migration script. This is the kind of thing we are interested in, but for PostgreSQL. https://opendbdiff.codeplex.com/
OK, this is some JS diffing two dbs: https://github.com/gimenete/dbdiff/blob/master/dbdiff.js.
It works for PostgreSQL!
The title here is postgres-schema-migration-checker. To me this suggests that we are building a CI tool to check whether a schema migration will work.
I.e. given a migration script, can we apply it to the database without breaking anything? This seems different from my impression of our discussion, but makes more sense.
Given this outlook, I see the possible steps as:
Attempt to apply the migration; if it works, we are okay to apply the migration to production.
To outline what I think @nelsonic was suggesting (the whole migration process).
In this process, I am not sure we can depend on the reliability of transactions that happen during the copy in step 8.
Oxford Abstracts has many migration scripts. Each one is run on server startup (I assume this happens whenever new code is deployed). These migration scripts need to work even if they have been run before, as they get applied many times to a production db.
As the number of migration scripts grows, this process is becoming hard to manage. The db schema becomes more confusing: to understand the current schema, the effect of many migration scripts must be taken into account.
I think another problem here is testing time. Building a test db is taking a long time, as all the migration scripts need to be applied.
@Conorc1000 @roryc89 Do you feel my description of your problem is accurate? Is there anything you can add?
@des-des I think this is a good description of the problem. (Yes we currently run our migrations each time we deploy to heroku).
@des-des I agree that this an accurate description of our most pressing schema migration issues. Thanks for helping out! :)
@nelsonic It seems to me that normally you would apply a schema migration to a db, rather than build a script that moves data between two dbs with different schemas. I think this might be what is confusing me.
This is also related to my previous question: during the actual deployment step, how can we safely keep the client/server live while the copy is in progress? E.g. if we send a POST to both the live db and the db being created, can we be sure that the data will not get added twice?
@iteles Do you have the photos?
@des-des So sorry, I remember uploading them and checking the preview to make sure they were there but must have forgotten to hit 'Update comment' after that 😭
I've updated the top comment with them now. Apologies again.
Hi @des-des yes, "normally" a migration would be applied to the DB. What we need to know is if the migration schema will "break" any existing data in the Database.
More "advanced" or "mature" schema migrators like Active Record will attempt to do this for you.
We need a similar process for our hand-written SQL table definitions.
It could be that there is already a tool out there that looks at two versions of a database with slightly different schemas, e.g. changing a date field's type from `varchar` to `date`, which then tests if the existing data will be converted without corruption/loss.
Our idea was to investigate the option of using an existing schema migration script/tool, or, if we don't find one, to write one that does exactly what we need.
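To make the `varchar` → `date` example concrete, here is a minimal sketch of a pre-migration check: scan the existing values and report any that a cast to `date` would reject. This uses Python's `datetime.strptime` as a stand-in for Postgres's cast, and the single date format is an assumption (real data may mix formats):

```python
from datetime import datetime

def unconvertible(values, fmt="%Y-%m-%d"):
    """Return the varchar values that would NOT survive a cast to date.
    fmt is an assumed format; a real check would mirror Postgres's rules."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except (ValueError, TypeError):
            bad.append(v)
    return bad

# Usage: the check flags rows the migration would corrupt or lose.
existing = ["2017-01-31", "not-a-date", "2017-02-29"]  # 2017 is not a leap year
print(unconvertible(existing))  # ['not-a-date', '2017-02-29']
```

Running this before applying the migration gives the "will existing data convert without loss?" answer the paragraph above asks for.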
If we need to clarify the requirements further, I'm happy to do so. I agree that knowing exactly what we are trying to achieve before we set out on the journey will save huge amounts of time and frustration, so thanks for asking questions and please keep them coming! 👍
Also, to be clear: if we can make this project/work reasonably generic, i.e. it can be used to test whether data in any existing Postgres database will be affected by a schema migration (a change in the structure of tables), we will be able to re-use this for any project that has a PostgreSQL DB. That includes many of our existing projects and several future ones, and crucially there are tens of thousands of projects where this would be useful, because Postgres is not "going away" any time soon... https://github.com/dwyl/how-to-choose-a-database/issues/4
@nelsonic awesome. That makes sense! Will try to put something together soon!
OK, so I see there being three parts here:
This is the focus of most of the tools @nelsonic has linked to. The main idea is to ensure migration scripts are only applied once: the target db holds a table of previously applied migrations. @Conorc1000 @roryc89 This provides a partial solution to your problem. Deploying a similar tool would help you, since your migration scripts would no longer need to guard against being applied twice. You could be confident that, for any db, the migrations would be applied only once and in order.
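The "apply each migration once, in order" idea can be sketched as follows. This is a simulation, not a real runner: the in-memory `applied` set stands in for the tracking table (e.g. a `schema_migrations` table in Postgres), and `run` stands in for executing SQL:

```python
# Sketch: run each pending migration exactly once, in order.
# `applied` simulates the db's table of previously run migrations.

def run_pending(migrations, applied, run):
    """migrations: ordered list of (name, script);
    applied: set of names already run; run: executes one script."""
    for name, script in migrations:
        if name in applied:
            continue  # guard: never apply the same migration twice
        run(script)
        applied.add(name)
    return applied

# Usage: a second "deploy" is a no-op for migrations already applied.
log = []
applied = set()
migs = [("001_create_users", "CREATE TABLE users ..."),
        ("002_add_email", "ALTER TABLE users ...")]
run_pending(migs, applied, log.append)
run_pending(migs, applied, log.append)  # nothing re-runs
print(len(log))  # 2
```

With a scheme like this the individual scripts no longer need their own "have I run before?" guards.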
Given a pull request with a change of schema, can we automate the creation of a migration script? Yes, we can use a tool like dbdiff. Q: What is the UX when this fails? Q: This tool does not provide a rollback; is this a problem? Q: How can we ensure that
GetSchemaFromLiveDb(ApplyMigration(dbN-1, createMigrationFromDiff(schemaN-1, schemaN)))
and
GetSchemaFromLiveDb(createDbFromSchema(schemaN))
are the same? Q: Where are the migration scripts kept? Does the CI tool commit them to the repo? I think we just run the creation tool locally.
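The equivalence question above can be modelled in miniature: applying the generated migration to the old schema should yield exactly the new schema created from scratch. Representing schemas as `{table: {column: sql_type}}` dicts and handling only added columns are simplifying assumptions; a real check would introspect `pg_catalog` on both databases:

```python
# Toy model: migrate(old, diff(old, new)) should equal new.

def create_migration_from_diff(old, new):
    """Very naive diff: list the columns to add, per table."""
    steps = []
    for table, cols in new.items():
        for col, typ in cols.items():
            if col not in old.get(table, {}):
                steps.append((table, col, typ))
    return steps

def apply_migration(schema, steps):
    out = {t: dict(cols) for t, cols in schema.items()}
    for table, col, typ in steps:
        out.setdefault(table, {})[col] = typ
    return out

schema_old = {"users": {"id": "serial", "name": "text"}}
schema_new = {"users": {"id": "serial", "name": "text", "email": "text"}}
migrated = apply_migration(
    schema_old, create_migration_from_diff(schema_old, schema_new))
print(migrated == schema_new)  # True
```

A CI tool could assert this equality on every PR that changes the schema.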
To me, although it may be tangential to the problem at hand, this project needs a way of being confident that migrations are not losing data before it leaves beta.
This is the problem Nelson's drawings are describing. We can break this problem into two parts
@des-des yes, this would be a good approach. 👍
@nelsonic could you be more specific?
@des-des the steps you have described don't appear logical to me. The UX when the migration step fails is: exit with a non-zero status and terminal output consisting of the reason why the script failed.
Migration schemas would be automatically created on the developer's machine by the script, i.e. on localhost, and then preferably they should be included in the commit/PR so that we can version-control them.
As for having access to production data, we can simulate valid records for both schemas based on the data types for the columns. Then it should be straightforward to attempt to insert a row that is valid for one schema into the same table(s) in the revised schema and check if it worked. 👍
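A minimal sketch of that "simulate valid records" idea: generate a sample row from the old schema's column types, then check whether each value is still valid under the new schema's types. The type names, sample values, and validity rules here are all assumptions; a real tool would actually `INSERT` into a throwaway Postgres database:

```python
from datetime import datetime

# Assumed sample value per (simplified) SQL type.
SAMPLES = {"text": "hello", "integer": 42, "varchar": "2017-01-31"}

def valid_for(value, sql_type):
    """Rough stand-in for 'would this value survive in a column of sql_type?'"""
    if sql_type == "integer":
        return isinstance(value, int)
    if sql_type in ("text", "varchar"):
        return isinstance(value, str)
    if sql_type == "date":  # e.g. varchar -> date: the value must parse
        try:
            datetime.strptime(str(value), "%Y-%m-%d")
            return True
        except ValueError:
            return False
    return False

old = {"joined": "varchar", "name": "text"}
new = {"joined": "date", "name": "text"}
row = {col: SAMPLES[typ] for col, typ in old.items()}
problems = [c for c in row if not valid_for(row[c], new[c])]
print(problems)  # [] -- this sample row survives the schema change
```

The interesting failures are the ones `problems` is non-empty for, e.g. a `varchar` sample that doesn't parse as a date.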
Sorry to have left this so long. FAC1N happened and kinda took over for a couple of weeks then updating back here got lost in my todo list.
About a month ago I spoke to the OA team; I'll quickly summarise the outcome of that conversation. OA have actually gone ahead and solved part of their problem. Their DB now holds a table of migrations that have been run. This means that, for any instance of their db, they can make sure migrations are run once, in order.
Anyway, I think we need to step back a little and think about this problem in the context of dwyl's new stack.
@roryc89 @Conorc1000 @naazy Is this still required for your project?
Postgres is still part of our stack, so this functionality may still be required.
@iteles
@roryc89 @Conorc1000 @naazy and if the answer is yes, can you be explicit about what is needed?
@iteles and @des-des As our main DB pain point, the running of migrations multiple times, has been addressed, I don't think we require a data migration tool in our project at the moment, although we may need it in the future.
@roryc89 thanks for replying and confirming. 👍 if that's the case and given that we are moving most of our other projects to "PETE" which has really good schema migrations, should we put this project on the "back burner" for now? 💭
Is it possible to write a Python script to migrate production db data to a staging db in PostgreSQL?
@khairnarTK it's definitely possible, Django has migrations: https://docs.djangoproject.com/en/2.2/topics/migrations/
@iteles Please add the photos you took of the sketches from today into this description of the issue (or send them to me!) thanks! ⭐️