deepkit / deepkit-framework

A new full-featured and high-performance TypeScript framework
https://deepkit.io/
MIT License
3.18k stars 121 forks source link

[orm] support for more complex sql queries #152

Closed timvandam closed 10 months ago

timvandam commented 2 years ago

Some things are hard to do in the orm

1. Breaking up large queries In raw SQL, I would use WITH to break up large queries. I don't necessarily think that a way to be able to use WITH is crucial. Instead, being able to re-use queries would allow for identical functionality (as far as I know, except for WITH RECURSIVE)

A practical example where we want to find employees that get paid above average, but perform below average:

enum Performance {
  UNKNOWN = 0,
  POOR = 1,
  LOW = 2,
  BELOW_AVERAGE = 3,
  AVERAGE = 4,
  ABOVE_AVERAGE = 5,
  HIGH = 6,
  EXCELLENT = 7,
}

@entity.name('employee')
class Employee {
  @t id: number = 0;
  @t.backReference() contract?: Contract;
  @t.enum(Performance) performance: Performance = Performance.UNKNOWN;

  constructor(@t public name: string) {}
}

enum Job {
  JUNIOR_SOFTWARE_ENGINEER,
  SENIOR_SOFTWARE_ENGINEER,
  HR_MANAGER,
  HIRING_MANAGER,
}

@entity.name('contract')
class Contract {
  @t id: number = 0;

  constructor(
    @t.reference() public employee: Employee,
    @t public salary: number,
    @t.enum(Job) public job: Job,
  ) {}
}

In SQL you could do this:

WITH average_job_salary AS (
    SELECT job, AVG(salary) AS salary
    FROM contract
    GROUP BY job
),
average_job_performance AS (
    SELECT c.job, AVG(e.performance) AS performance
    FROM employee e INNER JOIN contract c ON (c.employee_id = e.id)
    WHERE e.performance > 0
    GROUP BY c.job
),
SELECT e.name
FROM employee e
  INNER JOIN contract c ON (c.employee_id = e.id)
  INNER JOIN average_job_performance avg_jp ON (c.job = avg_jp.job)
  INNER JOIN average_job_salary avg_js ON (c.job = avg_js.job)
WHERE e.performance < avg_jp.performance
AND c.salary > avg_js.salary

Proposed solution

const averageJobSalary = db.query(Contract).select('job').withAverage('salary', 'salary');
const averageJobPerformance = db
  .query(Contract)
  .select('job')
  .useInnerJoin('employee')
  .withAverage('performance', 'performance') // this aggregation does not work yet, see #2
  .filter({ performance: { $gt: 0 } })
  .end()
  .groupBy('job');

const result = db
  .query(Employee)
  .select('name')
  .innerJoin('contract')
  /**
   * This is an idea of how joins with query objects could work.
   * The point is that you provide two arguments: a query and a filter factory which is used to create the ON clause
   * The argument the filter factory gets is an object which can be used to reference fields of whatever object the query returns
   * 
   * An alternative for a filter factory could be strings in order to use USING instead of ON.
   */
  .innerJoin(averageJobPerformance, (averageJobPerformance) => ({
    performance: { $lt: averageJobPerformance.performance },
  }))
  .innerJoin(averageJobSalary, (averageJobSalary) => ({
    contract: { salary: { $gt: averageJobSalary.salary } },
  }))
  .find();

2. Adding aggregate columns to SELECT inside of a join (useJoin/useInnerJoin) See the 2nd query (averageJobPerformance) of the previous example. The withAverage('performance', 'performance') line does not actually add a AVG(performance) AS performance to the query yet

3. Locking reads E.g. SELECT FOR UPDATE and SELECT FOR SHARE in postgres, mysql

4. Create queries on subqueries Take the WITH query from before, say we very often use the table resulting from the average_job_salary query. In the database we could create a view to make it reuseable.

In code, almost the same functionality can be achieved by allowing queries to created on subqueries

const AverageJobSalary = db.query(Contract).select('job').withAverage('salary', 'salary');
const result = db.query(AverageJobSalary).filter({ salary: { $gt: 1500 } });

Which would result in the query:

SELECT *
FROM (
    SELECT job, AVG(salary) AS salary
    FROM contract
    GROUP BY job
) x
WHERE x.salary > 1500

This is of course a very simple example where you might as well just have used a query lift, but for more complex subqueries which you often use as if they are entities I think this would be valuable

5. A way to find one entity (instead of one row) Currently, findOne will find at most one row. This is troublesome in situation where you want to load one instance of some entity, and load all its references.

E.g:

@entity.name('book')
class Book {
  @t.primary.autoIncrement id: number = 0;
  @t.reference() bookShelf?: BookShelf;
}

@entity.name('bookshelf')
class Bookshelf {
  @t.primary.autoIncrement id: number = 0;
  @t.array(() => Book).backReference() books?: Book[]
}
Say we have the following database: book id bookShelf
0 0
1 0
2 0
bookshelf id
0

Now observe how findOne works

const shelf = await database.query(Bookshelf).filter({ id: 0 }).innerJoinWith('books').findOne();
shelf; // Bookshelf { id: 0, books: [ Book { id: 0, ... } ] }

Since at most one row is fetched, at most one joined row is returned. This makes it troublesome to get at most one entity, and all entities related to that entity.

A new findOneEntity method that limits the amount of instances of the queried entity to 1, instead of limiting the amount of rows would resolve this:

const shelf = await database.query(Bookshelf).filter({ id: 0 }).innerJoinWith('books').findOneEntity();
shelf; // Bookshelf { id: 0, books: [ Book { id: 0, ... }, Book { id: 1, ... }, Book { id: 2, ... } ] }

6. Log Tables A log table is a table that tracks the changes happening to some data. Each row will descibe some action: eg an insert/delete/update

7. Enums in database Currently enums only live in the code, not in the database. It could be nice to create these enums in the database too: https://www.postgresql.org/docs/9.1/datatype-enum.html

I will add more things to this issue if I think of more things

marcus-sa commented 2 years ago

I'd love to see support for log tables & row locking.

marcj commented 1 year ago

A LogPlugin to log table changes (insert, update, delete) has been added in https://github.com/deepkit/deepkit-framework/commit/d306ac5c88f74ee6436f6f69bf29c42e766cee7b

marcj commented 1 year ago

To have a better overview of this issue:


marcj commented 1 year ago

I think that we should not anything to the query API that SQL is capable of, that would be way too much. We can surely add here and there new features to make slightly more complex query working, but I also thought about adding better raw SQL query supported. Maybe something like

const rows = await database.sqlQuery<MyInterface[]>(`SELECT * FROM my_table`)

so that the result is automatically deserialised to MyInterface, even if MyInterface is not even an entity class. This would allow having type safety with arbitrary complex SQL queries.

marcj commented 10 months ago

this is implemented, see https://deepkit.io/documentation/orm/raw-access