bloom-housing / bloom

component-based web framework for affordable housing management
Apache License 2.0

Choose an ORM for node-based services #227

Closed bk3c closed 4 years ago

bk3c commented 4 years ago

Assuming we continue down the path of node-based reference implementations for the core services, the next major decision we'll need to make is an ORM for database operations. Postgres is the presumptive database, perhaps with an HA variant in the future, but not on the immediate horizon.

In addition to good Postgres support, we also want enough TypeScript support that we won't need to cast every result set or otherwise write too much boilerplate.

Options to consider include:

- Sequelize
- Knex
- TypeORM
- Prisma 2

I think we're open to other options, as well, so long as they meet our core needs and have a fighting chance at long-term sustainability.

There's also still the option of choosing something like NestJS as a more full-featured backend framework, which would then likely (though not conclusively) imply using their TypeORM integration. It's still not clear we need all of those features, though.

bk3c commented 4 years ago

Relevant NPMTrends view.

software-project commented 4 years ago

I had a brief look at all of the above solutions. Sequelize seems the most mature, but also the most 'old school'. Knex is more or less the same. Prisma 2 is in beta and, from what I found, not super stable yet. So my contender is TypeORM, and I'm going to prototype it tomorrow. Apollo popped up a couple of times as well. Have we shut the door on GraphQL, or is it worth testing too?

bk3c commented 4 years ago

@software-project I definitely think the door is still open for GraphQL, but I'd like to keep that as a separate decision from the ORM unless there's a compelling reason we need to make the decision together.

If you want to prototype the two together since that seems like better leverage for your time, I'm totally fine with that -- let's just try to keep the decisions and timelines separate on the other end (and comment on the GraphQL learnings in #77)

software-project commented 4 years ago

I think I will keep it separate for now, cheers.

jaredcwhite commented 4 years ago

Venturing forth a very uninformed opinion that Prisma 2 might deserve a closer look if for no other reason than it's the ORM of choice for Redwood.js, and the lead dev there (Tom Preston-Werner) is no slouch. So it might be improving at a pretty rapid clip going forward…

bencpeters commented 4 years ago

I think it's worth having a discussion about what level of abstraction is "ideal" for an ORM here.

A lot of ORMs start from definitions of types in code, then either automatically generate migrations, or rely on user-generated migrations to get the database to match those definitions. They then use the code-based object definitions to generate queries. In my experience, this usually leads to needing to learn a whole new DSL (the ORM), but because it's an imperfect mapping, it usually also requires working in SQL for complex queries, performance optimizations, and development (e.g. testing out queries in the DB console to figure out the data you need, then translating them into ORM calls). As apps grow, I've found the ORM can end up getting in the way more and more.

Another approach is to basically treat the database as the root source of truth - after all, that's where the actual data needs to be stored, it's going to be strongly enforcing types at the end of the day, and everything eventually runs through SQL, a language designed for CRUD operations and relational queries. In this view of the world, the ORM (or maybe "query helper" is a better term?) takes a bit of a back-seat role - it basically takes care of compiling/wrapping generated SQL statements to protect against injection, and ideally creating TypeScript mappings of returned database objects. However, most of the actual queries get written in SQL, and migrations to the schema are the record of database changes.

One library in this vein is pg-sql2, which basically just gives you templating tools to write safer SQL queries in JS:

```js
import * as sql from "pg-sql2";

const completeQuery = sql.query`select * from users`;

// Dynamic table name/column name/other identifier
const table = sql.identifier(userTableName);
sql.query`select * from ${table}`;

// Joining fragments together
const values = sql.join(valuesToInsert.map(val => sql.value(val)), ",");
sql.query`insert into table(col_name) values (${values})`;

// Trying to inject anything that's not "safe" raises an error
// (here `userId` is a plain string):
sql.query`select * from users where value = ${userId}`;
// Correct:
sql.query`select * from users where value = ${sql.value(userId)}`;

// Fragments can be composed:
const whereClause = sql.fragment`where id = ${sql.value(userId)}`;
sql.query`select * from users ${whereClause}`;

// Once you've built a query, compile it to text plus bound values:
const { text, values: boundValues } = sql.compile(query);
pgClient.query(text, boundValues);
```
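For intuition, here is a minimal dependency-free sketch (not pg-sql2's actual implementation, just an illustration) of what a tag like `sql.query` plus `sql.compile` boils down to: literal text is kept as-is, and each interpolated value becomes a numbered placeholder, so user input never lands in the SQL string itself:

```typescript
// Sketch of a tagged template that compiles to a parameterized query.
function compileSql(strings: TemplateStringsArray, ...values: unknown[]) {
  // Keep the literal text; replace each interpolation with $1, $2, ...
  let text = strings[0];
  for (let i = 0; i < values.length; i++) {
    text += `$${i + 1}` + strings[i + 1];
  }
  return { text, values };
}

const userId = "42; drop table users";
const { text, values } = compileSql`select * from users where id = ${userId}`;
// text is "select * from users where id = $1"; the malicious string travels
// out-of-band in values, where the driver binds it safely.
```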

If we could have a wrapper around this type of lightweight query-helper library that generated TypeScript type definitions from DB introspection and wrapped the raw query results, that might give us a lot of the value of an ORM with very little added complexity and a focus on the DB/SQL.

However it obviously involves writing raw SQL, so if that's something the team would rather avoid, that's fine too, I just wanted to bring it up as a discussion item.

If we did go with something along these lines ^, good "version control" for a database ends up being fairly important. These are some of the projects I've either used in the past or that seem pretty promising:

software-project commented 4 years ago

I don't have much backend experience in JS, but writing raw SQL in code feels like an early version of LINQ to SQL that I used in 2007 (and didn't like ;)). I had a brief look at Redwood.js and so far I love what I see, so I'm all for giving it a try as well. A proper JAMstack: monorepo included, React, an ORM, and more.

bencpeters commented 4 years ago

I just pushed a demo branch of implementations for Listings data using Knex (optionally with the Objection.js mapper).

This includes a sample migration to make tables for listings and units as well as custom SQL trigger functions to automatically generate a unique hash ID using Postgres to illustrate how to call raw DB functions (this could also be done in JS, I just wanted to show an example of calling raw SQL).
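As an aside, a minimal sketch of the JS-side alternative mentioned above (generating the unique hash ID in Node rather than in a Postgres trigger); the helper name and ID format here are my own illustration, not what the demo branch does:

```typescript
import { randomBytes } from "crypto";

// Hypothetical helper: produce a short random hex ID for a listing at
// insert time, instead of letting a Postgres trigger generate it.
function makeHashId(length = 8): string {
  return randomBytes(length).toString("hex").slice(0, length);
}

const listing = { hashId: makeHashId(), title: "Example Listing" };
```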

Objection basically just wraps the Knex query builder library and adds models, so this type of approach would let us construct queries in a fairly straightforward "sql-esque" way using Knex builders to generate SQL, or utilize Model definitions that provide a model level of abstraction. This file shows both ways of forming queries.

As far as TypeScript integration goes, Knex has some weak TypeScript support built in, but for anything other than trivial selects it's probably not super useful out of the box. More realistically, I think the path I took in this demo function is the way we'd handle it: keep a TypeScript Interface definition of a given DB model, and then cast the Knex results after loading. Note that this method isn't totally safe - there isn't a compile-time check on how well the Knex results actually line up with the type they're being cast to - so we'd likely want to implement a common utility function and/or unit test to ensure that mappings between DB columns and the defined TS type stayed valid.
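A sketch of what that common utility function might look like (the `Listing` shape and names are illustrative, not taken from the demo branch): a thin runtime check that a row actually has the expected columns before the cast:

```typescript
// Illustrative model interface; assumes snake_case columns were already
// aliased to camelCase by the query.
interface Listing {
  id: string;
  title: string;
  unitCount: number;
}

// Columns we expect a Listing row to contain.
const listingColumns: Array<keyof Listing> = ["id", "title", "unitCount"];

// Validate the row's columns at runtime, then cast to the interface type.
function castRow<T>(row: Record<string, unknown>, columns: Array<keyof T>): T {
  for (const col of columns) {
    if (!(col in row)) {
      throw new Error(`Row is missing expected column: ${String(col)}`);
    }
  }
  return row as unknown as T;
}

// Usage with Knex would look like:
//   const listings = (await knex("listings")).map(r => castRow<Listing>(r, listingColumns));
const listing = castRow<Listing>(
  { id: "abc123", title: "Test Listing", unitCount: 4 },
  listingColumns
);
```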

For Objection, the Model definitions are TypeScript classes, so the definitions are usable as Types out of the box. However, we couldn't use them as shared Interfaces around the app from a common repository the way we do now, so we'd likely want to keep the independent Interface definition, making the extensive model definition of somewhat limited use. We also don't have any compile-time matching to the DB state in this scenario, so again we'd probably want some kind of automated testing. Realistically, if we end up keeping a common Interface definition for use in different services and apps for many of these models anyway, this might lead to more duplication between the Models and the Interface definitions.

Anyway, take a look at the sample implementations! Excited to see TypeORM examples!

I also found a number of references to Mikro-orm when reading for this - perhaps worth a look as well?

pbn4 commented 4 years ago

> Venturing forth a very uninformed opinion that Prisma 2 might deserve a closer look if for no other reason than it's the ORM of choice for Redwood.js, and the lead dev there (Tom Preston-Werner) is no slouch. So it might be improving at a pretty rapid clip going forward…

The problem with Prisma that I see is this:

> Transactions are a commonly used feature in relational as well as non-relational databases and Prisma Client might support more transaction mechanisms in the future. Specifically, the following two use cases might be supported: Sending multiple operations in bulk. Enabling longer-running transactions where operations can depend on each other.

You can find it in the documentation.

Based on the discussions and what I found in the repository, I believe what we want to achieve is a separation of domain models from the persistence layer.

> For Objection, the Model definitions are TypeScript classes, so the definitions are usable as Types out of the box, although we couldn't use them as shared Interfaces around the app from a common repository the way we do now, so we'd likely want to keep the independent Interface definition, making the extensive model definition of somewhat limited use.

If I understood this correctly @bencpeters, you would like an easy way to use objects that implement the domain model interface with the persistence layer. Please take a look at the Data Mapper pattern and how it would look with TypeORM:

Entities (one file for simplicity):

```ts
import { Entity, PrimaryGeneratedColumn, Column } from "typeorm";

export interface User {
    id: number;
    firstName: string;
    lastName: string;
    age: number;
}

@Entity()
export class UserModel implements User {
    @PrimaryGeneratedColumn()
    id: number;

    @Column()
    firstName: string;

    @Column()
    lastName: string;

    @Column()
    age: number;
}
```

And here is how Data Mapper pattern makes use of repositories:

```ts
import "reflect-metadata";
import { createConnection } from "typeorm";
import { User, UserModel } from "./entity/User";

createConnection().then(async connection => {
    try {
        const user = {
            firstName: "first",
            lastName: "last",
            age: 20,
        } as User;
        const userModelRepository = connection.getRepository(UserModel);
        await userModelRepository.save(user);
    } catch (e) {
        console.log(e);
    }
}).catch(error => console.log(error));
```

Please notice how you can use the domain model (the User interface) with the repository. Repositories also make use of query builders, as described in the [typeorm docs](https://github.com/typeorm/typeorm/blob/master/docs/select-query-builder.md), so this closely resembles your approach with Knex, @bencpeters. Some interesting discussions about domain separation and TypeORM can be found here. Transactions are also possible this way, but I'm not pasting the code here since this post is already pretty long.

TypeORM also offers raw queries if you need them (I assume optimizing queries is actually a rare task to perform) and has built-in incremental schema updates. The only problem with TypeORM that I hit was long startup time when there are many migration files: https://github.com/typeorm/typeorm/issues/4136 - roughly a 0.5 s increase in startup time per migration file. I'm unsure whether we did something wrong in that past project or whether this is a generic problem.

Mikro-orm - I haven't had the opportunity to use it, but I just checked the docs and it looks promising. Entity references, and the fact that accessing an unloaded relation property throws an error instead of e.g. returning undefined, look like useful features. A nice introduction can be found here. My only concern is that there is a single author behind this ORM; if he decides to drop it at some point, we are left patching it ourselves.

jaredcwhite commented 4 years ago

@pbn4 My thoughts regarding Prisma transactions — sure, it would be nice to have more control over transactions, but there's really not a whole lot of data we're dealing with at the moment nor is it updated frequently. So, for example, updating several Units and a Listing at the same time probably doesn't need to be in the same transaction as long as we order things in a reasonable way (i.e., update Units first, then the Listing…so if there's an error re: the Units it would stop the Listing update from going through).

I also like the ActiveRecord patterns that Prisma provides, which should allow for a pretty easy transition from the current JSON data service. I haven't had a chance to look too closely at it yet or as compared with TypeORM, but the examples in the docs look pretty neat.

jaredcwhite commented 4 years ago

@bencpeters Sorry, I sort of lost track of the thread here, didn't see your Knex example before. That looks pretty nice as a way of doing almost-but-not-quite-bare-metal queries. My impression is that this would be useful in cases where more obscure reporting/search/dashboard-type queries are needed. I'm not sure it's the way to go with the more standard CRUD-type operations we'll mainly be doing at the start?

pbn4 commented 4 years ago

@jaredcwhite OK, I see that Prisma could actually allow us to avoid duplicating model definitions if we were to export its generated types, correct?

Also, I just figured out that separating the entity definition from the interface is doable with TypeORM:

```ts
import { EntitySchema } from "typeorm";

export interface User {
    id: number;
    firstName: string | null;
    lastName: string | null;
    age: number | null;
}

export const UserEntity = new EntitySchema<User>({
    name: "users",
    columns: {
        id: {
            type: Number,
            primary: true,
            generated: true,
        },
        firstName: {
            type: String,
            nullable: true,
            name: "first_name",
        },
        lastName: {
            type: String,
            nullable: true,
            name: "last_name",
        },
        age: {
            type: Number,
            nullable: true,
            name: "age",
        },
    },
});
```

EntitySchema will throw type errors if a specified type doesn't match the schema's definition. This way the interface is always aligned with the persistence layer, and yet they are separated (the interface is free of decorators, etc.). What do you think?

jaredcwhite commented 4 years ago

@pbn4 I think so…the idea is you just work within the schema definition format used by Prisma, and the TS interface and other client implementation details are generated from that.

I really would like to ensure we don't have to maintain field definitions in two places…that seems like asking for trouble from a maintenance perspective.
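For what it's worth, a dependency-free sketch of the general single-source-of-truth technique (not Prisma's codegen, just an illustration with made-up names): derive the TS type from one `const` schema object with a mapped type, so field definitions live in exactly one place:

```typescript
// One schema object holds both the column mapping and the type tags.
const userSchema = {
  id: { type: "number", column: "id" },
  firstName: { type: "string", column: "first_name" },
  age: { type: "number", column: "age" },
} as const;

// Map the schema's "type" tags onto real TS types.
type FieldType<T> = T extends "number" ? number : T extends "string" ? string : never;
type User = { [K in keyof typeof userSchema]: FieldType<(typeof userSchema)[K]["type"]> };

// User is now { id: number; firstName: string; age: number }. Changing the
// schema object automatically changes the derived type.
const u: User = { id: 1, firstName: "Jane", age: 30 };
```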

software-project commented 4 years ago

OK, I've tested Prisma a bit; here is the code: https://github.com/bloom-housing/bloom/pull/254 My general impression working with it (docs, how easy it was to understand, and the number of times I got angry when something wasn't working) is much better than working with TypeORM. Issues that I have with Prisma:

Creating entities was much better; I could create a whole listing with its dependencies in one transaction. So I would pick TypeORM: although it will require a little duplication between the model definition and the interface, it is flexible enough to design the DB schema as it should be.

bk3c commented 4 years ago

We had a conversation about this during today's standup, and we have consensus around TypeORM as the solution with the fewest drawbacks and widest support.