Re-evaluate DB - Githubissues

Saghen commented 2 weeks ago

Using both Mongo and MeiliSearch definitely isn't ideal. We should either pick a search indexer or postgres, but the choice isn't obvious.

Data fits naturally into a relational model author -> work -> edition <- series
Need full text search
Servarr team familiar with Postgres
Higher throughput could be achieved through denormalization
- Q: are we preemptively optimizing a performance issue that doesn't exist? could caching alleviate the majority of load here? need benchmarks

blampe's $0.02:

it's a relational db with fts bolted on, and we just need a search index here. the lookups by isbn are just themselves just search queries for an exact field:value match. once you need to tweak ranking and deal with scaling stuff horizontally pg just doesn't have the knobs you want.

you also really, really don't want to be joining anything in the query path, so it makes me nervous to stick a relational db there. the query should return as much denormalized data as possible. right now it returns almost nothing, so the client needs to go back and fetch metadata for individual books, which is horribly slow and a ton of unnecessary load on the server.

there's always a tradeoff between index time and query time, and pg is great because it strikes a balance between those two. but with search, you want to spend significantly more time indexing in order to keep query time as small as possible. the hard part here is actually getting the indexing right to serve up all the denormalized data. practically anything can serve up the data, it's less important.

kieran's $0.02:

I would argue that this data is inherently relational. Returning all books belonging to an author, returning all the genres a book belongs to, essentially everything except the search endpoint. I understand that Meili is great at denormalizing all of that and serving it up all the same, but I would question if the additional complexity can actually be justified by the expected throughput.

Meili (and ES) are much, much better at FTS if you spend the time tuning them correctly. But I would question what fraction of app operations are actually dependent on FTS vs working with the concrete IDs that postgres excels at. second to that, it's clear the readarr team has competencies in postgres so I would argue there's value in sticking with what they're familiar with so ongoing maintenance is simplified

thezak48 commented 2 weeks ago

Currently postgres is what readarr meta uses, and well it is also what readarr itself can use if you enable it

Saghen commented 2 weeks ago

As discussed on the discord, postgres it is!

Saghen / open-library-proxy

Re-evaluate DB #3