djyde / cusdis

lightweight, privacy-friendly alternative to Disqus.
https://cusdis.com
GNU General Public License v3.0
2.61k stars 225 forks source link

Comments query optimization #18

Open djyde opened 3 years ago

djyde commented 3 years ago

Comment's database query (in service/comment.service.ts) is recursive, it would cause a performance issue when the data get bigger. Should find a better way.

Related:

https://news.ycombinator.com/item?id=26885342 https://cra.mr/2010/05/30/scaling-threaded-comments-on-django-at-disqus/

promer94 commented 3 years ago

Problem

The comments trees are currently stored in the form of the Adjacency List. Query the whole tree would be recursive which might cause performance issue in the future.

Possible solution

The most popular model for hierarchical data in relational database would be the Closure Table.

It uses an extra table to store every path from each node to each of its descendants. Query children would be more efficient.

But the prisma does not implement such model currently. If we implement this by ourself, it may have some conflicts in the future.

I suggest we could have 'poor-man' version of Closure Table.

model Comment {
 /** ... */
 ancestorId String?
}

Simply adding a field pointing to the ancestor would solve the current query and pagination problems. By doing this, finding the root comments and all the children would only use two query in general.

Things need to change

  1. Datebase

    • Run new migration
    • Update the all comments with new ancestorId (might need a script for self-host users)
  2. Code

    • Deprecate current comment query method and add new comments query method

FYI: https://www.slideshare.net/billkarwin/models-for-hierarchical-data