vparpoil opened this issue 5 years ago (status: Open)
Is there any comment on this from the Amplify team? Or suggested steps for migrating DB information (are Data Pipeline or custom CSV functions our only options?)
A migrations mechanism could also help with GSI update issues.
Not sure if this helps anyone but I created a process for running migrations via an npm run command:
const common = require('./common.js');
const AWS = require('aws-sdk');

const migrations = [
  // ensure migrations are in date order (oldest at the top)
  require('./migrations/20200201-lea-180'),
  require('./migrations/20200210-lea-184')
];

global.fetch = require('node-fetch');

// NOTE: the original snippet does not show where environmentName comes from;
// it is assumed here to hold the Amplify environment name (e.g. 'dev' or 'prod')
const environmentName = process.env.ENV || 'dev';

(async () => {
  AWS.config.update({ region: 'eu-west-2' });

  // if we have no CI args then use the local creds
  if (process.argv.length === 2) {
    AWS.config.credentials = new AWS.SharedIniFileCredentials({ profile: 'PROFILE NAME' });
  } else {
    // if running in CI, use the credentials passed as arguments
    AWS.config.credentials = {
      accessKeyId: process.argv[2],
      secretAccessKey: process.argv[3]
    };
  }

  const dbConnection = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

  try {
    // Make sure there is a migrations table
    console.log('Getting migration table');
    let migrationTableName = await common.findTable(dbConnection, 'Migration-' + environmentName, null, true, true);

    // If it doesn't exist, create it
    if (!migrationTableName) {
      console.log('Migration table not found...creating');
      migrationTableName = await createMigrationTable(dbConnection, 'Migration-' + environmentName);
      console.log('Migration created');
    }

    // Get all migrations that have been run
    const previousMigrationsRaw = await common.getAllItems(dbConnection, migrationTableName);
    const previousMigrations = previousMigrationsRaw.map((migration) => migration.migrationName.S);

    const successfulMigrations = [];
    let rollBack = false;

    for (const migration of migrations) {
      // Do I run the migration?
      if (previousMigrations.some((m) => m === migration.name)) {
        console.log('Already ran migration: ' + migration.name);
      } else {
        console.log('Running migration: ' + migration.name);
        // Try to run the migration
        try {
          await migration.up(dbConnection, environmentName);
          successfulMigrations.unshift(migration);
          console.log('Successfully ran: ', migration.name);
        } catch (e) {
          console.error('Up Error: ', migration.name, e);
          console.error('Breaking out of migration loop');
          // Push the failed migration so we can run the down
          successfulMigrations.unshift(migration);
          rollBack = true;
          break;
        }
      }
    }

    // Was there an error? If so, run all the downs
    if (rollBack) {
      console.error('Attempting to revert ' + successfulMigrations.length + ' migrations');
      for (const migration of successfulMigrations) {
        console.error('Attempting to revert ' + migration.name);
        try {
          // Need to down all
          await migration.down(dbConnection, environmentName);
        } catch (e) {
          console.error('Down Error: ', migration.name, e);
        }
      }
    } else {
      // Save migration completion
      console.log('Saving migrations to server', successfulMigrations);
      for (const migration of successfulMigrations) {
        await common.putItem(dbConnection, migrationTableName, {
          'migrationName': {
            S: migration.name
          },
          'migrationDate': {
            S: new Date().toISOString()
          }
        });
      }
    }
  } catch (e) {
    throw (e);
  }
})();

async function createMigrationTable (dbConnection, tableName) {
  const params = {
    AttributeDefinitions: [
      { AttributeName: 'migrationName', AttributeType: 'S' },
      { AttributeName: 'migrationDate', AttributeType: 'S' }
    ],
    KeySchema: [
      { AttributeName: 'migrationName', KeyType: 'HASH' },
      { AttributeName: 'migrationDate', KeyType: 'RANGE' }
    ],
    TableName: tableName,
    BillingMode: 'PAY_PER_REQUEST'
  };

  // Call DynamoDB to create the table
  await dbConnection.createTable(params).promise();
  return tableName;
}
Not the cleanest code, but now I just have a folder of JS files that each export a name, an up function, and a down function, which talk to DynamoDB directly as in the docs: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/GettingStarted.JavaScript.html
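For anyone curious, here is a minimal sketch of what one of those migration files could look like. The file name, table name, and backfilled attribute are made up for illustration, and pagination is omitted; the runner above only relies on the exported name, up, and down.

// migrations/20200301-example-backfill.js -- hypothetical example migration
module.exports = {
  name: '20200301-example-backfill',

  // Forward step: backfill a new attribute on every item (pagination omitted for brevity)
  up: async (dbConnection, environmentName) => {
    const tableName = 'Todo-' + environmentName; // assumed table naming convention
    const scan = await dbConnection.scan({ TableName: tableName }).promise();
    for (const item of scan.Items) {
      await dbConnection.updateItem({
        TableName: tableName,
        Key: { id: item.id },
        UpdateExpression: 'SET #s = :s',
        ExpressionAttributeNames: { '#s': 'status' },
        ExpressionAttributeValues: { ':s': { S: 'ACTIVE' } }
      }).promise();
    }
  },

  // Rollback step: remove the attribute again so the runner can revert on failure
  down: async (dbConnection, environmentName) => {
    const tableName = 'Todo-' + environmentName;
    const scan = await dbConnection.scan({ TableName: tableName }).promise();
    for (const item of scan.Items) {
      await dbConnection.updateItem({
        TableName: tableName,
        Key: { id: item.id },
        UpdateExpression: 'REMOVE #s',
        ExpressionAttributeNames: { '#s': 'status' }
      }).promise();
    }
  }
};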
Really?? No comment on this? I don't understand how you're supposed to make any changes once you have live data and users in your app, other than completely ejecting Amplify and managing your stacks et al. yourself - which isn't a completely unreasonable idea, but I have not seen any mention of this being a purely development-stage tool.
It's really a surprise that no Amplify team member provides any useful information for this request. This is a MUST-HAVE feature for a data-related solution.
It seems data model evolution and data migration in Amplify have been completely forgotten.
I've switched to using Postgraphile with graphile-migrate for my backend; once you get the hang of writing your schema (playing around with graphile-starter helped a lot) it's really very nice. Forward-only migrations seem to be working well for me, and a real relational database means I can offload most of the work from the client to the server - a core premise of GraphQL is supposed to be eliminating client data processing, as the client gets the data in exactly the format it wants. I still use Amplify to manage my Auth and S3, and for that purpose it works very well.
I have started to invest in the platform, but an 18-month-old issue like this, with no official comment, doesn't convince me that I would be able to manage a serious production application using Amplify/AppSync.
Not by any means a scalable/robust migration system for a team, but FWIW I have been using an AWS::CloudFormation::CustomResource with a setupVersion parameter and a setup lambda function:
"Version": {
"Ref": "setupVersion"
},
"ServiceToken": {
"Ref": "function..."
}
Then I make idempotent changes in the lambda whenever the version changes. This works OK for DynamoDB and the like, since you can't make substantial changes anyway, but it wouldn't be great for SQL changes.
My approach has been the same as @cdunn. To elaborate a little, here are some more implementation details:
I have created a lambda called MigrationService. In the resources section of the template, I have the following custom resource:
"CustomMigrationService": {
"DependsOn": [
"AmplifyResourcesPolicy",
...
],
"Type": "Custom::MigrationService",
"Properties": {
"ServiceToken": {
"Fn::GetAtt": [
"LambdaFunction",
"Arn"
]
},
"TriggerVersion": 5
}
}
The most important thing in this custom resource is the TriggerVersion. If it is incremented, then the lambda will be executed upon deployment. So if you deployed with version 1, then made changes to your code and redeployed without incrementing the TriggerVersion, your lambda will not be executed.
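For reference, whatever you set in TriggerVersion is passed to the lambda in the custom-resource event via ResourceProperties, so the handler can log or act on it if needed. A minimal illustration (not part of the original comment):

exports.handler = async (event) => {
  // CloudFormation forwards the custom resource's Properties on Create/Update/Delete requests
  const triggerVersion = event.ResourceProperties && event.ResourceProperties.TriggerVersion;
  console.log(event.RequestType + ' with TriggerVersion ' + triggerVersion);
  // ... run migrations, then signal success/failure back to CloudFormation (see the skeleton below) ...
};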
Be sure to give the lambda the access it needs to perform all the migrations. I have done that by editing the AmplifyResourcesPolicy section and adding statements to AmplifyResourcesPolicy > Properties > PolicyDocument > Statement. E.g.:
{
"Effect": "Allow",
"Action": [
"cognito-idp:AddCustomAttributes",
"cognito-idp:AdminAddUserToGroup",
"cognito-idp:ListUsers"
],
"Resource": [
{
"Fn::Join": [
"",
[
"arn:aws:cognito-idp:",
{
"Ref": "AWS::Region"
},
":",
{
"Ref": "AWS::AccountId"
},
":userpool/",
{
"Ref": "authcognitoUserPoolId"
}
]
]
}
]
},
or
{
"Effect": "Allow",
"Action": [
"dynamodb:Get*",
"dynamodb:BatchGetItem",
"dynamodb:List*",
"dynamodb:Describe*",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:Update*",
"dynamodb:RestoreTable*"
],
"Resource": [
{
"Ref": "storageddbBlogArn"
},
{
"Fn::Join": [
"/",
[
{
"Ref": "storageddbBlogArn"
},
"index/*"
]
]
}
]
}
Next up, the handler of the lambda needs to account for the creation of the custom resource. Here's the skeleton of my code:
exports.handler = async (event) => {
const cfnCR = require('cfn-custom-resource');
const physicalResourceId = "physicalResourceId-MigrationService-112233"
const { sendSuccess, sendFailure } = cfnCR;
if (event.RequestType === "Delete") {
const result = await sendSuccess(physicalResourceId, {}, event);
return result;
}
try {
// your code here
const result = await sendSuccess(physicalResourceId, {}, event);
return result;
} catch (err) {
// your code here
const result = await sendFailure(err, event);
return result;
}
};
Probably the most important thing here is to handle the Delete event. Your lambda will also be executed when your stack is rolled back, so if the stack is rolling back because the lambda errored out during deployment, calling it again during rollback without responding to the Delete event will leave CloudFormation hanging.
Lastly, I've implemented versioning so I do not rerun migration scripts. (Keeping scripts idempotent and re-runnable is always a great idea; however, it can get expensive if you have a long list of migration scripts, so skipping the ones that have already executed comes in handy. If you only have a few re-runnable scripts you can potentially skip this.)
In my case, I have 3 environments, so I store the latest deployed version number in a DynamoDB table. When the lambda is triggered it will pull the latest deployed version number for that environment and will then load and run the migration scripts that have a higher version.
My migration scripts folder structure is: migrationScripts/{component}/{version}.js
(I have separated the project into a few components that could be deployed independently but you might not need that)
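A rough sketch of how that version-gated loading might look inside the lambda; the version-table key shape, the MIGRATION_VERSION_TABLE variable, and the run(env) export are assumptions for illustration, not details from the comment above.

// Illustrative only -- table layout and script interface are assumed.
const fs = require('fs');
const path = require('path');
const AWS = require('aws-sdk');

const ddb = new AWS.DynamoDB.DocumentClient();
const VERSION_TABLE = process.env.MIGRATION_VERSION_TABLE; // assumed: one item per component/env

async function runPendingMigrations(component, env) {
  // 1. Read the latest deployed version for this component and environment
  const { Item } = await ddb.get({ TableName: VERSION_TABLE, Key: { component, env } }).promise();
  const lastVersion = Item ? Item.version : 0;

  // 2. Load every script under migrationScripts/<component> with a higher version number
  const dir = path.join(__dirname, 'migrationScripts', component);
  const pending = fs.readdirSync(dir)
    .map((file) => ({ version: parseInt(path.basename(file, '.js'), 10), file: path.join(dir, file) }))
    .filter((m) => m.version > lastVersion)
    .sort((a, b) => a.version - b.version);

  // 3. Run them in order, recording the version after each successful script
  for (const m of pending) {
    await require(m.file).run(env); // assumed: each script exports a run(env) function
    await ddb.put({ TableName: VERSION_TABLE, Item: { component, env, version: m.version } }).promise();
  }
}

module.exports = { runPendingMigrations };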
It would have been nice if there was a built-in feature to help with the migration but the good news is that this approach works (given adequate access) for any AWS resource change and not only data.
@dabit3 any official statement on this? Is amplify a dev tool only? Please make it clear in the docs that amplify is not suitable for production apps. Many people spend a lot of time on this only to find out that most basic features are missing. Plus, no official statement for more than a year 👎
Bumping this
Yeah, this is critical.
Yeah, I've been searching everywhere for an understandable way to do this.
we're also having an issue with this... any direction in the official docs would be appreciated
I would really like to understand what the Amplify team's recommendation is on this... what are best practices, etc.
@dabit3 any official statement on this? Is amplify a dev tool only? Please make it clear in the docs that amplify is not suitable for production apps. Many people spend a lot of time on this only to find out that most basic features are missing. Plus, no official statement for more than a year 👎
Totally agree with you. It's easy to set up projects from scratch, but in the long term, when changes are needed, we end up in hell. Amplify hides a lot of implementation details, and it lacks production-grade features.
Glad I ran into this early in my evaluation. It'd have been catastrophic to hit a wall like this in production.
This is also an issue for me. One key requirement is to have rollback support. Our dev team uses multiple independent environments and we often push other branches during code reviews, then push another branch, effectively removing previously added resources.
No response to this for so long really sucks. @josefaidt I see you've added this to your project board recently... perhaps a quick reply to at least give us some info would be nice?
Hey - wanted to drop a note in from the Amplify team. We're looking into some data / schema migration workflows right now, though because this space is really large, we won't address every single use case initially. Soon, we'll launch a mechanism to explicitly opt in to breaking changes during push. After that we'll look into more sophisticated migration workflows.
Question to the community: would this feature already be valuable if we enforced that data migrations are only allowed when the schemas between the environments are exactly the same?
One of our core design challenges right now is to provide a smooth migration experience when it's not so obvious. For example, renamed models or fields, changed field types and nullability all within one "deployment step".
I wonder if there could be some schema markup to help with this, where you make use of temporary @was or @isNow directives:
type Dog @model {
  id: ID!
  name: String!
  breed: String!
  favoriteToy: String!
}
type Animal @model @was("Dog") {
  id: ID!
  name: String!
  type: String! @isNow("Dog")
  breed: String!
  favoriteObject: String! @was("favoriteToy")
}
@isNow basically fills in the field with a value (maybe it could be hooked up to a lambda, or simple logic). @was basically renames the object or field.
Both of these would only apply when the field didn't already exist, so the migration only happens the first time it is encountered, and after all environments are migrated you can safely remove them.
Hey - wanted to drop a note in from the Amplify team. We're looking into some data / schema migration workflows right now, though because this space is really large, we won't address every single use case initially. Soon, we'll launch a mechanism to explicitly opt in to breaking changes during push. After that we'll look into more sophisticated migration workflows.
Question to the community: would this feature already be valuable if we enforced that data migrations are only allowed when the schemas between the environments are exactly the same?
One of our core design challenges right now is to provide a smooth migration experience when it's not so obvious. For example, renamed models or fields, changed field types and nullability all within one "deployment step".
One of my use cases is that I need to make a change to the schema that involves a breaking change to the data that is already in the tables. For instance, a field that was previously not required becomes required, and we need to backfill some data into existing records so AppSync doesn't complain.
What I am looking for is the capability to execute a series of migration scripts during or after the amplify deployment, where the scripts have an 'up' and a 'down' capability in case of rollback. The ideal solution would keep track of which scripts have been executed, execute the 'up' method during migration events, and have some way of rolling back a migration and triggering the 'down' event in the event that the deploy fails for some reason.
Ideally Amplify would provide the infrastructure and scaffolding for this, and all I would need to do would be to run an amplify command to create a new migration script and then fill in the details of the up and down.
@renebrandel This is also related to aws-amplify/amplify-category-api#180
On top of the actual implementation that you might undertake (I hope so), and given that lots of people are implementing their own custom approach, I think it would also be very useful to provide guidance and feedback on the best route for implementing a custom approach.
Some tricky aspects of schema updates / migrations:
I'll share some of our notes; it's just a draft:
In general, what about adding an entry in the amplify docs about data migration, mentioning the plans for implementation and alternative best practices for custom approaches?
Please keep us posted about your implementation schedule.
@renebrandel is this being worked on in some form or fashion still? If so could you possibly link a branch?
@renebrandel
Any update on this? Or can someone point me to a best-practices guide on this issue? I can't find anything in the docs, and I often run into issues after schema updates (e.g. simply creating a non-nullable field that doesn't currently exist in a DB table).
I'm hoping to start testing my application with live users and I'm certain migrations are necessary for that.
Am I going to have to write my own custom migration mechanism or has the team got something in the works?
Hi @Taylor-S For your particular use case, you should be able to use the @default directive on your new field: https://docs.amplify.aws/cli/graphql/data-modeling/#assign-default-values-for-fields
But migration use cases are obviously much larger than just that. We're currently working on a @mapsTo directive that allows you to rename an existing field/model to a new name.
@renebrandel , Awesome! That directive will definitely help me out. Obviously I'm very new to graphql and amplify. :) Glad to hear the team has something in the works. I'll keep an eye out for the update. Thanks for the quick reply
Hi everyone, while we haven't yet addressed all of the concerns mentioned in this thread, we are excited to announce a new @mapsTo directive to help with certain scenarios. It is available in the latest version of the CLI (7.6.14) as a developer preview. To try it out, you don't need to do anything except start using the directive in your schema.
This directive can be used to rename a GraphQL type but retain the original table and data. Usage looks like:
type Article @model @mapsTo(name: "Blog") {
id: ID!
title: String!
}
Where "Blog" is the original name of the "Article" type that contains data you want to retain. For more details, check out the docs PR here: https://github.com/aws-amplify/docs/pull/3890/files
That's not a solution when you need to populate your tables and then start working on the app.
Any updates on it?
@renebrandel, @alphonse92 - have you considered some sort of bulk import/export tool for data migrations?
I understand the difficulty in developing a mature, universal migration framework, however...
If we could extract large swaths of data from Dynamo, say into an RDBMS, we could run amplify push to execute a destructive schema update and then load the data back in. This is definitely the old-school way to process a migration, but it can be used in 100% of cases. It would surely plug the gap in Amplify's migration capabilities in a hurry.
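As a rough sketch of that old-school route, a dump/restore could be scripted with the AWS SDK along the following lines; the table names, file paths, and transform hook are placeholders, and retries of unprocessed batch items are deliberately left out.

// Hypothetical dump/restore helpers around a destructive amplify push.
const fs = require('fs');
const AWS = require('aws-sdk');

const ddb = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-2' }); // example region

// Scan the whole table (following pagination) and write the items to a JSON file
async function exportTable(tableName, outFile) {
  const items = [];
  let ExclusiveStartKey;
  do {
    const page = await ddb.scan({ TableName: tableName, ExclusiveStartKey }).promise();
    items.push(...page.Items);
    ExclusiveStartKey = page.LastEvaluatedKey;
  } while (ExclusiveStartKey);
  fs.writeFileSync(outFile, JSON.stringify(items));
}

// Read the dump back, apply an optional transform, and write in batches of 25
async function importTable(tableName, inFile, transform = (item) => item) {
  const items = JSON.parse(fs.readFileSync(inFile, 'utf8')).map(transform);
  for (let i = 0; i < items.length; i += 25) {
    await ddb.batchWrite({
      RequestItems: {
        [tableName]: items.slice(i, i + 25).map((Item) => ({ PutRequest: { Item } }))
      }
    }).promise(); // UnprocessedItems are not retried in this sketch
  }
}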
I found this while looking for a solution to this problem and it works well.
https://github.com/technogise/dynamo-data-migrations
It is a CLI for creating and managing migrations. It has up and down commands to execute migrations and roll them back, and it keeps track of what has been applied and of the migration order.
There are 2 caveats for this tool:
• user:Stack tags
• the up and down migration function

5 years and still no updates? Bumping this. Please have a look at this!
Is your feature request related to a problem? Please describe.
When developing our app, we use 3 environments: dev, preproduction, production. Often there is:
• a need to alter the schemas to add required fields => after push, existing data have this field set to null
• a need to add a new data schema that should be populated at first (i.e. app parameters) => after push the DynamoDB table is empty
A feature seems to be missing in the Amplify CLI to migrate the databases so we can achieve a seamless push to new environments.
Describe the solution you'd like
It would be great to have the ability to describe migrations of data in the amplify folder so that the migrations are executed upon push.
Describe alternatives you've considered
• Using the DynamoDB interface to input the data by hand => difficult if there is a lot of data
• Using a custom external script to trigger mutations with the data needed to modify or input => sometimes you want to disable mutations on this particular schema (i.e. for a list of Countries) so you cannot do this easily. This also requires more boilerplate code.
• Using a custom script with the AWS JS SDK => seems the way to go for now (see the sketch below)
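To illustrate that last option, a seeding script for something like the Countries example could be as simple as the following; the table name, environment suffix, and data file are placeholders, not something this issue specifies.

// Hypothetical seed script run after amplify push.
const AWS = require('aws-sdk');
const countries = require('./seed-data/countries.json'); // e.g. [{ id: 'FR', name: 'France' }, ...]

const ddb = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-2' }); // example region
const TABLE = 'Country-' + (process.env.ENV || 'dev'); // assumed table naming convention

(async () => {
  for (const country of countries) {
    // put is idempotent here: re-running the script simply overwrites the same items
    await ddb.put({ TableName: TABLE, Item: country }).promise();
  }
  console.log('Seeded ' + countries.length + ' countries into ' + TABLE);
})();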
Additional context
Some great things exist in other frameworks; I will only link some I have used: for Meteor, for Laravel. I think version numbering is a must-have for such functionality.
If you have other alternatives, please comment here; I would be happy to test other solutions.