aerogear / graphback

Graphback - Out of the box GraphQL server and client
https://graphback.dev
Apache License 2.0
409 stars 73 forks source link

Deprecate graphback db #1114

Closed wtrocki closed 4 years ago

wtrocki commented 4 years ago

Idea here is to deprecate/remove graphback db command while still keeping in code migrations for development purposes

craicoverflow commented 4 years ago

A concern I have with adding the migration execution into the template code is this effectively means the user has to configure and account for a separate production and development scenario in their code.

Yes, they could wrap the migration execution in if (process.env.NODE_ENV !== 'production') so that migrations do not happen when the application is deployed somewhere.

Another downside to having it execute in the template is we are creating work for the user to inspect the output ops or operations which show what changes have been made to the database. if migrations happened from the graphback package we could display the operations in a much more digestible format.

So what about moving the migrations into graphback?

const graphbackSchema = graphback({
  schema: mySchema,
  provider: myProvider,
  migrations: {
    enabled: process.env.NODE_ENV !== 'production',
    options: {
       ...
    }
  }
})

One single configuration object is more maintainable and probably less confusing.

wtrocki commented 4 years ago

Would you see admin executing graphback db on production databases? I think we need to see how we can map scalars first. I will prefer migrations to be executed in the plain sight of code that I control - eventually they could become separate to graphback with some specific filter to include graphback specifics.

craicoverflow commented 4 years ago

No I wouldn't but that is the point - having it in the template code means all it takes is a mistaken invert of checking process.env.NODE_ENV and running it in production would execute a migration.

Otherwise developers would need to comment in or out the migration block in their own code between production and development.

wtrocki commented 4 years ago

By the end of the day it is template that people control and make decision based upon. Our community templates are different than datasync template where this needs to be robust etc. I still see bringing it towards API as less choice or more chance to end up with the issues

craicoverflow commented 4 years ago

So the primary (only?) reason to remove graphback db is for custom scalar mapping, but isn't this also easily achievable by mapping these in the config?

Scalar mapping in code:

migrate(config, schema, {
  scalarMap: (field, scalarType, annotations) => {
    if (scalarType && scalarType.name === 'Timestamp') {
      return {
        type: 'timestamp',
        // useTz, precision
        args: [true, undefined],
      }
    }

    return null
  }
})

Simple scalar mapping can be achieved in a static config:

dbmigrations:
  ...
  scalar-map:
    name: 'Timestamp'
      type: 'timestamp'
      args:
        - true
        - undefined

This is a straightforward direct mapping between a GraphQLScalar and database column type.

However it could be trickier to map scalars on non-simple conditions such as remapping based on annotations and the field name like below.

migrate(config, schema, {
  scalarMap: (field, scalarType, annotations) => {
    if (field.name === 'id' || annotations.type === 'uuid') {
      return {
        type: 'uuid',
        args: [],
      }
    }

    return null
  }
})
wtrocki commented 4 years ago

I would guess that graphback db brings no value as it requires manual intervention for situation where we literally need database to be aligned in startup. Imagine so many issues for people that forgot to do database migrations and figure out that later during development or queries.

For development, I would expect my db to be migrated every time app starts as you need to restart application to get the results.

Intial idea of db command comes from systems that had container based approach that do not have any physical resolvers mounted (it can live reload basically) so it made sense there.

As for config - yes - this is what graphiql does now - it allows to define custom scalars in graphql-config, however, there is more to that like strategies and working with the code is much easier to not make mistakes than config. Especially that some mistakes here can cost your (hopefully) development data.

Finally graphback db gaved people too much freedom over the what you migrate. You can configure some remote server and migrate it from local machine (WE DO NOT WANT PEOPLE TO EVEN TRY IT) this should be always connected with the context of the application development and gave you more or less chance to screw up env you do not own

There are no production features in current migrations (you can rename field and since we do not have previous schema we cannot validate and tell you that you probably going to drop data unless using @db.rename directive. With Graphback db we have this story about graphback updates your database - not really telling where this should be used.

Having the same in the code is not going to change that but at least the result will be local to the running server.

Finally usefulness - graphback db cannot be used for openshift like environment as db is usually not exposed. What people need is app-level migration anyway (if they decide to use that for Test/QE server etc.)

Migration engine can be useful for us for scanning database and getting data-driven approach that will fit nicely as some helper to other projects like Teiid

craicoverflow commented 4 years ago

You can configure some remote server and migrate it from local machine (WE DO NOT WANT PEOPLE TO EVEN TRY IT)

The exact same thing can happen when migrations happen from the codebase.

I am just playing devil's advocate here so we can explore options, but personally I am in favour of dropping graphback db.

My main question is whether graphql-migrations should be a direct dependency of the template or should migrations be executed from the graphback package (see https://github.com/aerogear/graphback/issues/1114#issuecomment-625204720) so that we can produce easy to digest logs and provide proper error handling etc.

I will prefer migrations to be executed in the plain sight of code that I control

At the end of the day the migrations would still be executed in the sight of the code you control, as it would be opt-in by extending the graphback options configuration with a migrations section. It would just feel much more like a single unified experience.

How it looks now:

const runtime = graphback({
  provider: myProvider
})

await migrateDB(db, runtime.schema, {...});

How it could be:

const runtime = makeExecutableGraphbackSchema({
  schema: buildSchemaFromFiles(...),
  provider: myProvider,
  migrateDB: {
    config,
    options: {
       ...
    }
  }
})

We don't have to decide this now, it is a big decision. Interested to hear @darahayes and @ssd71 opinions?

wtrocki commented 4 years ago

I do think that having migration running from a template is fine. This boils to the fact of how we going to review this with runtime implementation improvements - if there will be a need for them.

EDIT: Db layer seems to be separated from runtime in typical application layers.

ssd71 commented 4 years ago

My main question is whether graphql-migrations should be a direct dependency of the template or should migrations be executed from the graphback package

Certainly, having template directly depend on graphql-migrations does seem a bit awkward as it wouldn't be responsible for anything in a production environment if it is deployed there, but there is also case for accidentally deploying with migration opt-ins still present in the code, so I guess we would need some safeguards against that, such as to require some degree of manual intervention if somebody launches app with migrations opted in. But that might work against devs when doing dev work.

darahayes commented 4 years ago

In my experience, database migrations are always an entirely separated task from actual application process. In my mind something like graphback db or an entirely separate script that calls migrateDb() is the best way to do things in a real life scenario.

This is the only guaranteed way that your database migrations will work across every environment in a way that's reliable. Also, applications that follow best security practices often use a non admin database user that does even not have permissions to modify tables (create, drop, adding columns etc), so migrating at startup would fail.

You'd normally run the migration as a pre upgrade step (environments like k8s tend to have features that allow this) and then the deployment continues if it the migration is successful. Ideally you do this on a pre-prod/staging environment first to know that the migration is actually going to be successful.

For development you just have a develop script in your package.json npm run migrate && npm run start.

I see a lot of arguments to the tune of 'what if someone accidentally runs migrations on the wrong environment...' Ultimately, no matter which approach you take, somebody or some process will have to make a decision to run a migration at some point so it is literally impossible to reduce that risk to 0.

Tl;dr strong -1 on having the runtime/application execute the migrations. They should be a separate task that a person or some automation chooses to run against a particular environment.

darahayes commented 4 years ago

Also, If Graphback is truly data source independent then bundling the database migration stuff into the core/runtime doesn't really make sense.

craicoverflow commented 4 years ago

@darahayes I agree with everything you said here. The problem with this is that the schema does not exist until the runtime application starts since it is created in-memory and can add extra fields (such as version, timestamps etc).

This is a problem with how Graphback exposes the final schema.

The only way I see this working is:

  1. Generate the final schema which is output to a file.
  2. Run migrations from generated schema
  3. Run runtime application which only generates resolvers in-memory.
darahayes commented 4 years ago

@craicoverflow couldn't the migration script import the model and also generate the schema in memory and execute the migrations based on that?

wtrocki commented 4 years ago

We talking about production migrations here which are not our target. We do not log database changes and effectively do not have rollback phase.

Our db migrations are to... create database structure and manage it during development time. Rolling to production requires different tools and even then people using something like dbmigrations or flyway are executing changes in runtime. Bringing some cases where some folks do not have permissions - that is even reassuring us why this should be separate and definitely in code.

Additionally new CRUD is going to bring requirements that will be hard to do in cmd - for example migration mechanism needs to be aware of the container components etc. datasync etc. This will mean that for datasync or other modifications we will need different flavour or CMD or something even more crazier.

wtrocki commented 4 years ago

As follow up. If we want we could build a script that will execute our migrations in our production but that will be an incredibly bad thing to do. The best way is to simply track the db changes between the releases and apply them as SQL or Mongo Map-reduce outside Graphback. We are not out of the box firebase like thing that can be deployed and driven by only schema changes

craicoverflow commented 4 years ago

couldn't the migration script import the model and also generate the schema in memory and execute the migrations based on that?

the model and generated schema custom types end up being different because plugins dynamically add fields based on metadata. So User.version might exist in the final schema but not the model and using the model for migrations means that a version column would not be created during migration.

wtrocki commented 4 years ago

Schema driven database is very very rare use case that we give people to experience graphback and build their app - it is community use case but it has nothing to do how Graphback will be used in production. Usually, you have some data there and you want to expose it thru GraphQL - we do not have:

So people will need to manually build a schema that will match their DB and hoping for knex or Mongo to be able to fetch that data. That is the real production use case that we should be talking about.

Db migrations are a nice to feature that is easy to demo and cool for building POC's when there is no data available (which is rare)

EDIT: kinda went offtopic here to provide context.

wtrocki commented 4 years ago

version column would not be created during migration.

Additionally this is kinda risky business. Your intern forgot to add offix plugin to the new version and suddenly your database can lose all version information. This type of migration can be done only on development time on some local machine. We have filtering for non destructive changes and probably going to need another one but there is still chance for something to not go right due to human error