ahansson89 opened 5 years ago
I use an S3 bucket with a trigger to process CSV files. There is a specific folder structure to keep all updates in order. This gives me a crude way to audit updates: each update is just a new file that appends or removes information. The same Lambda is hooked up as a custom resource, so that when a dev creates a new environment it is seeded with our base set of CSV files. I also pump all information through AppSync rather than into DynamoDB directly; the Lambda uses the generated types to prevent accidental breaking changes. This is only done with data that changes once a year (political boundary information).
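For illustration, here is a minimal sketch of that dual-purpose Lambda idea: one handler that reacts to S3 events and also answers CloudFormation custom-resource calls when a new environment is created. This is a reconstruction, not RossWilliams' actual code; processCsv, BASE_CSV_KEYS, and the SEED_BUCKET variable are hypothetical.

const https = require("https");
const url = require("url");

// Hypothetical: parse one CSV object and push its rows through AppSync
async function processCsv(bucket, key) { /* ... */ }

// Hypothetical: the base set of CSV files used to bootstrap a new environment
const BASE_CSV_KEYS = ["boundaries/base.csv"];

exports.handler = async (event, context) => {
  if (event.Records && event.Records[0].s3) {
    // Normal path: a CSV file landed in the bucket
    for (const record of event.Records) {
      await processCsv(record.s3.bucket.name, record.s3.object.key);
    }
    return;
  }
  if (event.RequestType) {
    // Custom-resource path: a dev is creating a new environment
    try {
      if (event.RequestType === "Create") {
        for (const key of BASE_CSV_KEYS) {
          await processCsv(process.env.SEED_BUCKET, key);
        }
      }
      await sendCfnResponse(event, context, "SUCCESS");
    } catch (err) {
      await sendCfnResponse(event, context, "FAILED");
    }
  }
};

// CloudFormation requires a custom resource to PUT its result to ResponseURL
function sendCfnResponse(event, context, status) {
  const body = JSON.stringify({
    Status: status,
    Reason: "See CloudWatch log stream: " + context.logStreamName,
    PhysicalResourceId: context.logStreamName,
    StackId: event.StackId,
    RequestId: event.RequestId,
    LogicalResourceId: event.LogicalResourceId,
  });
  const { hostname, path } = url.parse(event.ResponseURL);
  return new Promise((resolve, reject) => {
    const req = https.request(
      { hostname, path, method: "PUT", headers: { "content-length": body.length } },
      resolve
    );
    req.on("error", reject);
    req.end(body);
  });
}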
Interesting, @RossWilliams. I assume that in that case you have one main dev environment and not an environment per developer?
Yes, thinking about it, I probably want to use AppSync directly. How do you handle authentication to AppSync in that Lambda function: username and password, or roles? I don't want to give IAM and Lambda full write access to everything in AppSync. I guess I could pass a Cognito username and password when triggering the seed.js file from the command line, but that wouldn't work in a Lambda function without me hardcoding credentials or storing them in environment variables...
We have an environment per developer and per backend feature. We store the same CSVs in source control so that new environments get bootstrapped with proper boundary data via a custom resource. We also use this to bootstrap test data.
We use SSM to store service account credentials, so no hard-coded passwords are needed. Another custom resource sets up the service account and stores the password in SSM. My models include auth rules for the service account. (This was set up before the new auth rules existed.)
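Here is a rough sketch of how that SSM lookup might look inside the seeding Lambda (my sketch, not the commenter's code; the parameter names under /myapp are made up):

const AWS = require("aws-sdk");
const { Auth } = require("aws-amplify");

const ssm = new AWS.SSM();

// Fetch the service account credentials the custom resource stored in SSM
async function getServiceAccountCredentials() {
  const [user, pass] = await Promise.all([
    ssm.getParameter({ Name: "/myapp/seed-username" }).promise(),
    // The password is a SecureString, so it must be decrypted on read
    ssm.getParameter({ Name: "/myapp/seed-password", WithDecryption: true }).promise(),
  ]);
  return { username: user.Parameter.Value, password: pass.Parameter.Value };
}

// Sign in as the service account before running mutations against AppSync
async function signInServiceAccount() {
  const { username, password } = await getServiceAccountCredentials();
  await Auth.signIn(username, password);
}

The Lambda's execution role then only needs ssm:GetParameter on those two parameters, plus whatever the model's auth rules grant the service account.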
I dug into this topic a few months ago and didn't find anything exciting, so I'm seeding my DynamoDB tables using a custom CLI and JSON files. If you are interested, I detailed that in a post: https://medium.com/@christophe.bougere/aws-amplify-beyond-the-quickstart-c389f8e44c92#45d8 However, I would love to have this capability natively in the Amplify CLI, with a data model closer to the GraphQL schema than to the DynamoDB tables (in my solution, 1 JSON file = 1 DynamoDB table), because it can be cumbersome to create all the relationships across many files on large datasets.
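To make the 1-file-per-table layout concrete, imagine two hypothetical seed files (illustrative only, not taken from the linked post):

// seed/User.json
[
  { "id": "user-1", "name": "Ada" }
]

// seed/Post.json -- authorId must match an id in User.json by hand
[
  { "id": "post-1", "title": "Hello", "authorId": "user-1" }
]

Every cross-table reference like authorId has to be kept consistent manually across files, which is exactly what a schema-aware seeder could handle for you.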
I am marking this as a feature request for Amplify CLI to support seeding data natively as we don't support this right now.
I did this with a function I wrote in Node that I run locally. It's a BatchWrite to DynamoDB.
const AWS = require("aws-sdk");

AWS.config.update({
  region: "us-west-2"
});

// Pulls creds from my local AWS credentials file
const credentials = new AWS.SharedIniFileCredentials({ profile: "MY_COOL_PROFILE" });
AWS.config.credentials = credentials;

const docClient = new AWS.DynamoDB.DocumentClient();

// In my case SOURCE is a provided JSON doc that I loop through, transforming
// and normalizing some data. This is real stripped down for clarity.
const SOURCE = require("./source.json");

const itemsToPut = [];
for (let i = 0; i < SOURCE.length; i++) {
  itemsToPut.push({
    PutRequest: {
      Item: { title: SOURCE[i].title }
    }
  });
  // Or delete items instead....
  // itemsToPut.push({
  //   DeleteRequest: {
  //     Key: { id: SOURCE[i].id }
  //   }
  // });
}

// DynamoDB has a limit of 25 items per batch request...
// Chunk it out into arrays of 25 items apiece
const almostReady = chunkArray(itemsToPut, 25);

almostReady.forEach((items) => {
  const batchWriteParams = {
    RequestItems: {
      "TABLE_TO_PUT_ITEMS_IN": items
    }
  };
  // Note: under throttling, batchWrite can return UnprocessedItems,
  // which would need to be retried
  docClient.batchWrite(batchWriteParams, function (err, data) {
    if (err) console.log(err);
    else console.log(JSON.stringify(data), new Date().toISOString());
  });
});

// Break the array into chunks
function chunkArray(myArray, chunkSize) {
  const results = [];
  while (myArray.length) {
    results.push(myArray.splice(0, chunkSize));
  }
  return results;
}
It takes me under a minute to process my data (the JSON file is ~130 MB, so we use a stream) and import ~25,000 items. These items do not have connections. FYI: if you have your @model set up with the @searchable directive, it'll also stream into Elasticsearch, but that stream takes a while (~15-30 min) to fully catch up.
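The streaming part isn't shown above, so here is a minimal sketch of how it might look with the stream-json package (my assumption of the approach, not the commenter's actual code; flush is a hypothetical helper):

const fs = require("fs");
const { chain } = require("stream-chain");
const { parser } = require("stream-json");
const { streamArray } = require("stream-json/streamers/StreamArray");

// Stream the top-level JSON array one element at a time instead of
// loading all ~130 MB into memory at once
const pipeline = chain([
  fs.createReadStream("./source.json"),
  parser(),
  streamArray(),
]);

function flush(items) {
  // Hypothetical: wrap items in batchWriteParams and call
  // docClient.batchWrite(...) exactly as in the snippet above
}

let batch = [];
pipeline.on("data", ({ value }) => {
  batch.push({ PutRequest: { Item: { title: value.title } } });
  if (batch.length === 25) {
    flush(batch);
    batch = [];
  }
});
pipeline.on("end", () => {
  if (batch.length) flush(batch);
});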
I'm currently using a Node.js script that calls the API class of the aws-amplify SDK. The script reads my JSON file and seeds DynamoDB through the AppSync API. Code example:
// AWS Amplify required imports
const {
  Amplify,
  Auth,
  API,
  graphqlOperation
} = require('aws-amplify');

// Credentials used to authenticate to the API with Cognito (if you're using Cognito)
const authconfig = require('./auth_config.json');

// AWS configuration - I recommend you use the aws-exports file
const awsconfig = require('../../src/aws-exports');

// My GraphQL mutations
const {
  createTag
} = require('../../src/graphql/mutations');

// Additional lib to generate UUIDs
const uuid = require('uuid');

// My JSON with the local data that will be seeded into DynamoDB
const Tags = require('../../src/data/Tag.json');

// Repeat this function for all tables, or implement other logic to make this
// more dynamic (in my case, I don't need to)
async function seedTags() {
  console.log('Inserting records into the Tags table...');
  await Promise.all(Tags.map(async (item) => {
    const inputData = {
      id: uuid.v4(),
      name: item.name
    };
    console.log(`- Inserting tag: ${item.name}`);
    try {
      // This is the core of the function; here the Amplify SDK does the "magic"
      await API.graphql(graphqlOperation(createTag, {
        input: inputData,
      }));
    } catch (error) {
      console.error(`${JSON.stringify(inputData)}: \n${error}`);
    }
  }));
}

// Initialize table seeding
async function initialize() {
  try {
    // Configure AWS and authenticate with Cognito (if required)
    Amplify.configure(awsconfig);
    await Auth.signIn(authconfig.username, authconfig.password);
    console.log('Seeding database...');
  } catch (error) {
    console.error('Authentication failed!', error);
    return;
  }
  // Await the mutations so the success message doesn't print too early
  await seedTags();
  console.info('Data inserted successfully!');
}

initialize();
If you'll interact with modules that use import and export, you must install @babel/node as a dev dependency and edit your babel.config.js to:
module.exports = {
presets: ['@babel/preset-env']
};
Then you can execute the script with the command babel-node --presets env scripts/seeder/seed.js, and configure a script inside package.json, for example: "db:seed": "babel-node --presets env scripts/seeder/seed.js" (run with npm run db:seed).
@MiqueiasGFernandes any chance that you ran into this error?
Error: Amplify has not been configured correctly.
It seems like when I run the script, Amplify.configure hasn't taken effect (even though it is being run). The error also suggests other causes, like multiple conflicting versions of Amplify packages, but none of that gives me a sense of direction on what's going on.
The script I wrote can be found here: https://gist.github.com/duranmla/d41ad59f0e54bf1681fd922b485ba677 but it is essentially the same as yours, I would say. Also, I am running the script using ts-node, but I don't think that would cause this error:
npx ts-node -O '{"module": "commonjs"}' ./scripts/generate-data
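One thing worth checking (an assumption on my part, not confirmed in this thread): aws-exports.js is an ES module with a default export, so when it is loaded from a CommonJS/ts-node context, require() can hand back { default: {...} } instead of the config object itself, and Amplify.configure silently receives the wrong shape. A defensive sketch:

const { Amplify } = require('aws-amplify');

// aws-exports uses "export default awsconfig"; under CommonJS the config
// may arrive wrapped as { default: ... }, so unwrap it before configuring
const awsexports = require('../../src/aws-exports');
Amplify.configure(awsexports.default || awsexports);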
To help solve this issue for our own project, @mehow-juras and I created an Amplify plugin (beta) to seed our local and remote DBs using our Amplify GraphQL API. We're looking for improvements and feedback, so feel free to check it out:
NPM Package: https://www.npmjs.com/package/amplify-graphql-seed-plugin
Source: https://github.com/awslabs/amplify-graphql-seed-plugin
It allows you to define your seed data in code (so it can be checked into VCS as well), and you can, for example, create multiple seed-data entries with a single line of code using a library like Faker.
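For illustration only, generating many entries with Faker might look like this (generic @faker-js/faker usage with a made-up Tag shape, not necessarily the plugin's actual seed-file format):

const { faker } = require('@faker-js/faker');

// Generate 50 fake Tag inputs in one expression
const tags = Array.from({ length: 50 }, () => ({
  id: faker.string.uuid(),
  name: faker.commerce.productAdjective(),
}));

console.log(tags.slice(0, 3)); // peek at a few generated entries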
Any updates here? This seems like a feature that needs to ship from Amplify itself, and this request has been open since 2019!
Would love for this to be added in so that we can run "amplify mock api" and generate seed data to use off the bat.
Which Category is your question related to? Amplify
What AWS Services are you utilizing? DynamoDB, AppSync
Provide additional details e.g. code snippets
I am currently thinking about how I should seed my DynamoDB tables with entries and how best to structure this code. I essentially want to seed a couple of hundred entries and don't want other devs to manually add entries one by one if we can structure this with code. How are others doing it?
What I am thinking about right now is just to write a seed.js Node function and load up the tables directly in DynamoDB with an iteration through my entries. I wonder if there is a better, more "Amplify-native" way to do this.
Thanks