vparpoil opened this issue 5 years ago (status: Open)
Is there any comment on this from the Amplify team? Or suggested steps for migrating DB information (are Data Pipeline or custom CSV functions our only options?)
A migrations mechanism could also help with GSI update issues.
Not sure if this helps anyone but I created a process for running migrations via an npm run command:
const common = require('./common.js');
const AWS = require('aws-sdk');

const migrations = [
  // ensure migrations are in date order (oldest at the top)
  require('./migrations/20200201-lea-180'),
  require('./migrations/20200210-lea-184')
];

global.fetch = require('node-fetch');

// NOTE: the original snippet does not show where environmentName comes from;
// it is assumed here to hold the Amplify environment name (e.g. 'dev' or 'prod')
const environmentName = process.env.ENV || 'dev';

(async () => {
  AWS.config.update({ region: 'eu-west-2' });

  // if we have no CI args then use the local creds
  if (process.argv.length === 2) {
    AWS.config.credentials = new AWS.SharedIniFileCredentials({ profile: 'PROFILE NAME' });
  } else {
    // if running in CI, use the credentials passed as arguments
    AWS.config.credentials = {
      accessKeyId: process.argv[2],
      secretAccessKey: process.argv[3]
    };
  }

  const dbConnection = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

  try {
    // Make sure there is a migrations table
    console.log('Getting migration table');
    let migrationTableName = await common.findTable(dbConnection, 'Migration-' + environmentName, null, true, true);

    // If it doesn't exist, create it
    if (!migrationTableName) {
      console.log('Migration table not found...creating');
      migrationTableName = await createMigrationTable(dbConnection, 'Migration-' + environmentName);
      console.log('Migration created');
    }

    // Get all migrations that have been run
    const previousMigrationsRaw = await common.getAllItems(dbConnection, migrationTableName);
    const previousMigrations = previousMigrationsRaw.map((migration) => migration.migrationName.S);

    const successfulMigrations = [];
    let rollBack = false;

    for (const migration of migrations) {
      // Do I run the migration?
      if (previousMigrations.some((m) => m === migration.name)) {
        console.log('Already ran migration: ' + migration.name);
      } else {
        console.log('Running migration: ' + migration.name);
        // Try to run the migration
        try {
          await migration.up(dbConnection, environmentName);
          successfulMigrations.unshift(migration);
          console.log('Successfully ran: ', migration.name);
        } catch (e) {
          console.error('Up Error: ', migration.name, e);
          console.error('Breaking out of migration loop');
          // Push the failed migration so we can run the down
          successfulMigrations.unshift(migration);
          rollBack = true;
          break;
        }
      }
    }

    // Was there an error? If so, run all the downs
    if (rollBack) {
      console.error('Attempting to revert ' + successfulMigrations.length + ' migrations');
      for (const migration of successfulMigrations) {
        console.error('Attempting to revert ' + migration.name);
        try {
          // Need to down all
          await migration.down(dbConnection, environmentName);
        } catch (e) {
          console.error('Down Error: ', migration.name, e);
        }
      }
    } else {
      // Save migration completion
      console.log('Saving migrations to server', successfulMigrations);
      for (const migration of successfulMigrations) {
        await common.putItem(dbConnection, migrationTableName, {
          'migrationName': {
            S: migration.name
          },
          'migrationDate': {
            S: new Date().toISOString()
          }
        });
      }
    }
  } catch (e) {
    throw (e);
  }
})();

async function createMigrationTable (dbConnection, tableName) {
  const params = {
    AttributeDefinitions: [
      { AttributeName: 'migrationName', AttributeType: 'S' },
      { AttributeName: 'migrationDate', AttributeType: 'S' }
    ],
    KeySchema: [
      { AttributeName: 'migrationName', KeyType: 'HASH' },
      { AttributeName: 'migrationDate', KeyType: 'RANGE' }
    ],
    TableName: tableName,
    BillingMode: 'PAY_PER_REQUEST'
  };

  // Call DynamoDB to create the table
  await dbConnection.createTable(params).promise();
  return tableName;
}
Not the cleanest code, but now I just have a folder of JS files that each export a name, an up function, and a down function, which talk to DynamoDB directly as in the docs: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/GettingStarted.JavaScript.html
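For anyone curious, here is a minimal sketch of what one of those migration files could look like. The file name, table name, and backfilled attribute are made up for illustration, and pagination is omitted; the runner above only relies on the exported name, up, and down.

// migrations/20200301-example-backfill.js -- hypothetical example migration
module.exports = {
  name: '20200301-example-backfill',

  // Forward step: backfill a new attribute on every item (pagination omitted for brevity)
  up: async (dbConnection, environmentName) => {
    const tableName = 'Todo-' + environmentName; // assumed table naming convention
    const scan = await dbConnection.scan({ TableName: tableName }).promise();
    for (const item of scan.Items) {
      await dbConnection.updateItem({
        TableName: tableName,
        Key: { id: item.id },
        UpdateExpression: 'SET #s = :s',
        ExpressionAttributeNames: { '#s': 'status' },
        ExpressionAttributeValues: { ':s': { S: 'ACTIVE' } }
      }).promise();
    }
  },

  // Rollback step: remove the attribute again so the runner can revert on failure
  down: async (dbConnection, environmentName) => {
    const tableName = 'Todo-' + environmentName;
    const scan = await dbConnection.scan({ TableName: tableName }).promise();
    for (const item of scan.Items) {
      await dbConnection.updateItem({
        TableName: tableName,
        Key: { id: item.id },
        UpdateExpression: 'REMOVE #s',
        ExpressionAttributeNames: { '#s': 'status' }
      }).promise();
    }
  }
};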
Really?? No comment on this? I don't understand how you're supposed to make any changes once you have live data and users in your app, other than completely ejecting Amplify and managing your stacks et al. yourself - which isn't a completely unreasonable idea, but I have not seen any mention of this being a purely development-stage tool.
It's really a surprise that no Amplify team member provides any useful information for this request. This is a MUST-HAVE feature for a data-related solution.
It seems data model evolution and data migration in Amplify have been completely forgotten.
I've switched to using Postgraphile with graphile-migrate for my backend; once you get the hang of writing your schema (playing around with graphile-starter helped a lot) it's really very nice. Forward-only migrations seem to be working well for me, and a real relational database means I can offload most of the work from the client to the server - a core premise of GraphQL is supposed to be eliminating client data processing, as the client gets the data in exactly the format it wants. I still use Amplify to manage my Auth and S3, and for that purpose it works very well.
I have started to invest in the platform, but an 18-month-old issue like this, with no official comment, doesn't convince me that I would be able to manage a serious production application using Amplify/AppSync.
Not by any means a scalable/robust migration system for a team, but FWIW I have been using an AWS::CloudFormation::CustomResource with a setupVersion parameter and a setup lambda function:
"Version": {
"Ref": "setupVersion"
},
"ServiceToken": {
"Ref": "function..."
}
Then I make idempotent changes in the lambda whenever the version changes. This works OK for DynamoDB and the like, since you can't make substantial changes anyway, but it wouldn't be great for SQL changes.
My approach has been the same as @cdunn. To elaborate a little, here are some more implementation details:
I have created a lambda called MigrationService. In the resources section of the template, I have the following custom resource:
"CustomMigrationService": {
"DependsOn": [
"AmplifyResourcesPolicy",
...
],
"Type": "Custom::MigrationService",
"Properties": {
"ServiceToken": {
"Fn::GetAtt": [
"LambdaFunction",
"Arn"
]
},
"TriggerVersion": 5
}
}
The most important thing in this custom resource is the TriggerVersion. If it is incremented, then the lambda will be executed upon deployment. So if you deployed with version 1, then made changes to your code and redeployed without incrementing the TriggerVersion, your lambda will not be executed.
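For reference, whatever you set in TriggerVersion is passed to the lambda in the custom-resource event via ResourceProperties, so the handler can log or act on it if needed. A minimal illustration (not part of the original comment):

exports.handler = async (event) => {
  // CloudFormation forwards the custom resource's Properties on Create/Update/Delete requests
  const triggerVersion = event.ResourceProperties && event.ResourceProperties.TriggerVersion;
  console.log(event.RequestType + ' with TriggerVersion ' + triggerVersion);
  // ... run migrations, then signal success/failure back to CloudFormation (see the skeleton below) ...
};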
Be sure to give the lambda the access it needs to perform all the migrations. I have done that by editing the AmplifyResourcesPolicy section and adding statements to AmplifyResourcesPolicy > Properties > PolicyDocument > Statement. E.g.:
{
"Effect": "Allow",
"Action": [
"cognito-idp:AddCustomAttributes",
"cognito-idp:AdminAddUserToGroup",
"cognito-idp:ListUsers"
],
"Resource": [
{
"Fn::Join": [
"",
[
"arn:aws:cognito-idp:",
{
"Ref": "AWS::Region"
},
":",
{
"Ref": "AWS::AccountId"
},
":userpool/",
{
"Ref": "authcognitoUserPoolId"
}
]
]
}
]
},
or
{
"Effect": "Allow",
"Action": [
"dynamodb:Get*",
"dynamodb:BatchGetItem",
"dynamodb:List*",
"dynamodb:Describe*",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:Update*",
"dynamodb:RestoreTable*"
],
"Resource": [
{
"Ref": "storageddbBlogArn"
},
{
"Fn::Join": [
"/",
[
{
"Ref": "storageddbBlogArn"
},
"index/*"
]
]
}
]
}
Next up, the handler of the lambda needs to account for the creation of the custom resource. Here's the skeleton of my code:
exports.handler = async (event) => {
const cfnCR = require('cfn-custom-resource');
const physicalResourceId = "physicalResourceId-MigrationService-112233"
const { sendSuccess, sendFailure } = cfnCR;
if (event.RequestType === "Delete") {
const result = await sendSuccess(physicalResourceId, {}, event);
return result;
}
try {
// your code here
const result = await sendSuccess(physicalResourceId, {}, event);
return result;
} catch (err) {
// your code here
const result = await sendFailure(err, event);
return result;
}
};
Probably the most important thing here is to handle the Delete event. Your lambda will also be executed when your stack is rolled back, so if the stack is rolling back because the lambda errored out during deployment, calling it again during rollback without responding to the Delete event will leave CloudFormation hanging.
Lastly, I've implemented versioning so I do not rerun migration scripts. (Keeping scripts idempotent and re-runnable is always a great idea; however, it can get expensive if you have a long list of migration scripts, so skipping the ones that have already executed comes in handy. If you only have a few re-runnable scripts you can potentially skip this.)
In my case, I have 3 environments, so I store the latest deployed version number in a DynamoDB table. When the lambda is triggered it will pull the latest deployed version number for that environment and will then load and run the migration scripts that have a higher version.
My migration scripts folder structure is: migrationScripts/{component}/{version}.js
(I have separated the project into a few components that could be deployed independently but you might not need that)
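A rough sketch of how that version-gated loading might look inside the lambda; the version-table key shape, the MIGRATION_VERSION_TABLE variable, and the run(env) export are assumptions for illustration, not details from the comment above.

// Illustrative only -- table layout and script interface are assumed.
const fs = require('fs');
const path = require('path');
const AWS = require('aws-sdk');

const ddb = new AWS.DynamoDB.DocumentClient();
const VERSION_TABLE = process.env.MIGRATION_VERSION_TABLE; // assumed: one item per component/env

async function runPendingMigrations(component, env) {
  // 1. Read the latest deployed version for this component and environment
  const { Item } = await ddb.get({ TableName: VERSION_TABLE, Key: { component, env } }).promise();
  const lastVersion = Item ? Item.version : 0;

  // 2. Load every script under migrationScripts/<component> with a higher version number
  const dir = path.join(__dirname, 'migrationScripts', component);
  const pending = fs.readdirSync(dir)
    .map((file) => ({ version: parseInt(path.basename(file, '.js'), 10), file: path.join(dir, file) }))
    .filter((m) => m.version > lastVersion)
    .sort((a, b) => a.version - b.version);

  // 3. Run them in order, recording the version after each successful script
  for (const m of pending) {
    await require(m.file).run(env); // assumed: each script exports a run(env) function
    await ddb.put({ TableName: VERSION_TABLE, Item: { component, env, version: m.version } }).promise();
  }
}

module.exports = { runPendingMigrations };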
It would have been nice if there was a built-in feature to help with the migration but the good news is that this approach works (given adequate access) for any AWS resource change and not only data.
@dabit3 any official statement on this? Is amplify a dev tool only? Please make it clear in the docs that amplify is not suitable for production apps. Many people spend a lot of time on this only to find out that most basic features are missing. Plus, no official statement for more than a year 👎
Bumping this
Yeah, this is critical.
Yeah, I've been searching everywhere for an understandable way to do this.
we're also having an issue with this... any direction in the official docs would be appreciated
I would really like to understand what the Amplify team's recommendation is on this... what are best practices, etc.
@dabit3 any official statement on this? Is amplify a dev tool only? Please make it clear in the docs that amplify is not suitable for production apps. Many people spend a lot of time on this only to find out that most basic features are missing. Plus, no official statement for more than a year 👎
Totally agree with you. It's easy to set up projects from scratch, but in the long term, when changes are needed, we end up in hell. Amplify hides a lot of implementation details, and it lacks production-grade features.
Glad I ran into this early in my evaluation. It'd have been catastrophic to hit a wall like this in production.
This is also an issue for me. One key requirement is to have rollback support. Our dev team uses multiple independent environments and we often push other branches during code reviews, then push another branch, effectively removing previously added resources.
No response to this for so long really sucks. @josefaidt I see you've added this to your project board recently... perhaps a quick reply to at least give us some info would be nice?
Hey - wanted to drop a note in from the Amplify team. We're looking into some data / schema migration workflows right now, though because this space is really large, we won't address every single use case initially. Soon, we'll launch a mechanism to explicitly opt in to breaking changes during push. After that we'll look into more sophisticated migration workflows.
Question to the community: would this feature already be valuable if we enforced that data migrations are only allowed when the schemas between the environments are exactly the same?
One of our core design challenges right now is to provide a smooth migration experience when it's not so obvious. For example, renamed models or fields, changed field types and nullability all within one "deployment step".
I wonder if there could be some schema markup to help with this, where you make use of temporary @was or @isNow directives:
type Dog @model {
  id: ID!
  name: String!
  breed: String!
  favoriteToy: String!
}
type Animal @model @was("Dog") {
  id: ID!
  name: String!
  type: String! @isNow("Dog")
  breed: String!
  favoriteObject: String! @was("favoriteToy")
}
@isNow basically fills in the field with a value (maybe it could be hooked up to a lambda, or simple logic). @was basically renames the object or field.
Both of these would only apply when the field didn't already exist, so the migration only happens the first time it is encountered, and after all environments are migrated you can safely remove them.
Hey - wanted to drop a note in from the Amplify team. We're looking into some data / schema migration workflows right now, though because this space is really large, we won't address every single use case initially. Soon, we'll launch a mechanism to explicitly opt in to breaking changes during push. After that we'll look into more sophisticated migration workflows.
Question to the community: would this feature already be valuable if we enforced that data migrations are only allowed when the schemas between the environments are exactly the same?
One of our core design challenges right now is to provide a smooth migration experience when it's not so obvious. For example, renamed models or fields, changed field types and nullability all within one "deployment step".
One of my use cases is that I need to make a change to the schema that involves a breaking change to the data that is already in the tables. For instance, a field that was previously not required becomes required, and we need to backfill some data into existing records so AppSync doesn't complain.
What I am looking for is the capability to execute a series of migration scripts during or after the amplify deployment, where the scripts have an 'up' and a 'down' capability in case of rollback. The ideal solution would keep track of which scripts have been executed, execute the 'up' method during migration events, and have some way of rolling back a migration and triggering the 'down' event in the event that the deploy fails for some reason.
Ideally Amplify would provide the infrastructure and scaffolding for this, and all I would need to do would be to run an amplify command to create a new migration script and then fill in the details of the up and down.
@renebrandel This is also related to aws-amplify/amplify-category-api#180
On top of the actual implementation that you might undertake (I hope so), and given that lots of people are implementing their own custom approach, I think it would also be very useful to provide guidance and feedback on the best route for implementing a custom approach.
Some tricky aspects of schema updates / migrations:
I'll share some of our notes; it's just a draft:
In general, what about adding an entry in the amplify docs about data migration, mentioning the plans for implementation and alternative best practices for custom approaches?
Please keep us posted about your implementation schedule.
@renebrandel is this being worked on in some form or fashion still? If so could you possibly link a branch?
@renebrandel
Any update on this? Or can someone point me to a best-practices guide on this issue? I can't find anything in the docs, and I often run into issues after schema updates (e.g. simply creating a non-nullable field that doesn't currently exist in a DB table).
I'm hoping to start testing my application with live users and I'm certain migrations are necessary for that.
Am I going to have to write my own custom migration mechanism or has the team got something in the works?
Hi @Taylor-S For your particular use case, you should be able to use the @default directive on your new field: https://docs.amplify.aws/cli/graphql/data-modeling/#assign-default-values-for-fields
But migration use cases are obviously much larger than just that. We're currently working on a @mapsTo directive that allows you to rename an existing field/model to a new name.
@renebrandel , Awesome! That directive will definitely help me out. Obviously I'm very new to graphql and amplify. :) Glad to hear the team has something in the works. I'll keep an eye out for the update. Thanks for the quick reply
Hi everyone, while we haven't yet addressed all of the concerns mentioned in this thread, we are excited to announce a new @mapsTo directive to help with certain scenarios. It is available in the latest version of the CLI (7.6.14) as a developer preview. To try it out, you don't need to do anything except start using the directive in your schema.
This directive can be used to rename a GraphQL type but retain the original table and data. Usage looks like:
type Article @model @mapsTo(name: "Blog") {
id: ID!
title: String!
}
Where "Blog" is the original name of the "Article" type that contains data you want to retain. For more details, check out the docs PR here: https://github.com/aws-amplify/docs/pull/3890/files
That's not a solution when you need to populate your tables and then start working on the app.
Any updates on it?
@renebrandel, @alphonse92 - have you considered some sort of bulk import/export tool for data migrations?
I understand the difficulty in developing a mature, universal migration framework, however...
If we could extract large swaths of data from Dynamo, say into an RDBMS, we could run amplify push to execute a destructive schema update and then load the data back in. This is definitely the old-school way to process a migration, but it can be used in 100% of cases. It would surely plug the gap in Amplify's migration capabilities in a hurry.
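As a rough sketch of that old-school route, a dump/restore could be scripted with the AWS SDK along the following lines; the table names, file paths, and transform hook are placeholders, and retries of unprocessed batch items are deliberately left out.

// Hypothetical dump/restore helpers around a destructive amplify push.
const fs = require('fs');
const AWS = require('aws-sdk');

const ddb = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-2' }); // example region

// Scan the whole table (following pagination) and write the items to a JSON file
async function exportTable(tableName, outFile) {
  const items = [];
  let ExclusiveStartKey;
  do {
    const page = await ddb.scan({ TableName: tableName, ExclusiveStartKey }).promise();
    items.push(...page.Items);
    ExclusiveStartKey = page.LastEvaluatedKey;
  } while (ExclusiveStartKey);
  fs.writeFileSync(outFile, JSON.stringify(items));
}

// Read the dump back, apply an optional transform, and write in batches of 25
async function importTable(tableName, inFile, transform = (item) => item) {
  const items = JSON.parse(fs.readFileSync(inFile, 'utf8')).map(transform);
  for (let i = 0; i < items.length; i += 25) {
    await ddb.batchWrite({
      RequestItems: {
        [tableName]: items.slice(i, i + 25).map((Item) => ({ PutRequest: { Item } }))
      }
    }).promise(); // UnprocessedItems are not retried in this sketch
  }
}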
I found this while looking for a solution to this problem and it works well.
https://github.com/technogise/dynamo-data-migrations
It is a CLI for creating and managing migrations. It has up and down commands to execute migrations and roll them back, and it keeps track of what has been applied and of the migration order.
There are 2 caveats for this tool:
• user:Stack tags
• the up and down migration function

5 years and still no updates? Bumping this. Please have a look at this!
Is your feature request related to a problem? Please describe.
When developing our app, we use 3 environments: dev, preproduction, production. Often there is:
• a need to alter the schemas to add required fields => after push, existing data have this field set to null
• a need to add a new data schema that should be populated at first (i.e. app parameters) => after push the DynamoDB table is empty
A feature seems to be missing in the Amplify CLI to migrate the databases so we can achieve a seamless push to new environments.
Describe the solution you'd like
It would be great to have the ability to describe migrations of data in the amplify folder so that the migrations are executed upon push.
Describe alternatives you've considered
• Using the DynamoDB interface to input the data by hand => difficult if there is a lot of data
• Using a custom external script to trigger mutations with the data needed to modify or input => sometimes you want to disable mutations on this particular schema (i.e. for a list of Countries) so you cannot do this easily. This also requires more boilerplate code.
• Using a custom script with the AWS JS SDK => seems the way to go for now (see the sketch below)
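To illustrate that last option, a seeding script for something like the Countries example could be as simple as the following; the table name, environment suffix, and data file are placeholders, not something this issue specifies.

// Hypothetical seed script run after amplify push.
const AWS = require('aws-sdk');
const countries = require('./seed-data/countries.json'); // e.g. [{ id: 'FR', name: 'France' }, ...]

const ddb = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-2' }); // example region
const TABLE = 'Country-' + (process.env.ENV || 'dev'); // assumed table naming convention

(async () => {
  for (const country of countries) {
    // put is idempotent here: re-running the script simply overwrites the same items
    await ddb.put({ TableName: TABLE, Item: country }).promise();
  }
  console.log('Seeded ' + countries.length + ' countries into ' + TABLE);
})();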
Additional context
Some great things exist in other frameworks; I will only link some I have used: for Meteor, for Laravel. I think version numbering is a must-have for such functionality.
If you have other alternatives, please comment here; I would be happy to test other solutions.