CardanoSolutions / kupo

🐹 Fast, lightweight & configurable chain-index for Cardano.
https://cardanosolutions.github.io/kupo/
Mozilla Public License 2.0

Question: moving away from sqlite to a more performant DB? #146

Closed by klntsky 10 months ago

klntsky commented 10 months ago

What is your idea? Provide a use case.

In light of #131, which we also observed in production, I really want to ask a question: why is SQLite used as the database backend for Kupo?

Do I understand it correctly that the need to rebuild the index after rollbacks is due to SQLite being used? We are considering forking Kupo with the goal of replacing SQLite with PostgreSQL; what do you think?

xray-robot commented 10 months ago

You probably need to read this thread for a follow-up discussion: https://github.com/CardanoSolutions/kupo/discussions/110

KtorZ commented 10 months ago

@klntsky: why is sqlite used as database backend for Kupo?

Because it works, it's embedded and it performs very well. So Kupo exists as a single black box with no dependencies. It's simple to install, simple to use and has very decent performance. Also, Kupo has always been meant as a backend service for single dapps, not necessarily to power an entire multi-dapp infrastructure. The idea is that it's a suitable choice to embed alongside your dapp backend, but it's clearly not meant for servicing an API query layer for millions of users.

@klntsky: Do I understand it correctly that the need to rebuild the index after rollbacks is due to sqlite being used?

I am not sure where this is coming from. Do you mean the index as in 'SQLite table indexes' or as in 'Kupo'? If the former, then it's true that until 2.5.0 Kupo used to create some indexes on-the-fly for rollbacks, but this was shown to cause a lot of unnecessary I/O operations; indexes are now created only once, during the initial start-up.
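Concretely, the one-time start-up pattern described above looks something like this (a minimal sketch using Python's `sqlite3` module; the table and index names are illustrative, not Kupo's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in production
conn.execute("CREATE TABLE IF NOT EXISTS inputs (address TEXT, slot_no INTEGER)")

# Creating the index once at start-up (idempotently) avoids the repeated
# drop/re-create I/O that per-rollback index management would incur.
conn.execute("CREATE INDEX IF NOT EXISTS inputs_by_address ON inputs(address)")
conn.commit()
```

Since `CREATE INDEX IF NOT EXISTS` is idempotent, restarting the process leaves an existing index untouched rather than rebuilding it.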

@klntsky: considering forking kupo with the goal of replacing sqlite with postgres

Not much interest. One of the design goals of Kupo is to be simple. I've flirted with the idea of supporting multiple backends but, in my experience, it has never really worked well for projects. It adds complexity without much benefit. If you need something where you can pick your own backend, I'd recommend looking into solutions like Oura. Kupo is opinionated, and that's a conscious choice.

Also, I am not sure why so many people hate SQLite and see it as a "poor choice" for a backend. In particular, we have run benchmarks against PostgreSQL (cardano-db-sync's model) for a variety of queries, and Kupo with SQLite was one or two orders of magnitude faster in pretty much all cases.

One solution that I do want to explore, however, is DuckDB, which is also a simple, embedded database engine, yet a columnar one. For the type of queries performed by Kupo, this could work fantastically well.

biofreeze commented 10 months ago

The idea is that it's a suitable choice to embed alongside your dapp backend, but clearly not meant for servicing an API query layer for millions of users.

@KtorZ I was thinking of using Fly.io to spin up VMs very quickly. I figure I can download an almost-100%-synced Kupo snapshot that I've created and start servicing queries pretty quickly. That way I should be able to use Fly.io to horizontally scale Kupo queries, and by using snapshots I should be able to start accepting queries within only a few minutes.

Thoughts? 😃

biofreeze commented 10 months ago

Also @KtorZ, any thoughts about allowing a remote connection string instead of a workdir? I've been exploring https://turso.tech which, I believe, uses SQLite under the hood 👍

uhbif19 commented 10 months ago

I've flirted with the idea of supporting multi-backends but in my experience, it has never really worked well for projects.

Why? SQL has a well-known standard, and some subset of it is supported by all major DB engines. Do you use any SQLite-specific features?
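For illustration, here are a couple of features that are SQLite-specific rather than standard SQL, and that have no direct PostgreSQL equivalent (a hypothetical sketch of what "SQLite-specific" can mean, not taken from Kupo's code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in production

# PRAGMAs are SQLite-specific tuning knobs with no ANSI-SQL counterpart.
conn.execute("PRAGMA journal_mode = WAL")   # write-ahead logging
conn.execute("PRAGMA synchronous = NORMAL")

# WITHOUT ROWID is a SQLite-only clustered-storage table layout.
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB) WITHOUT ROWID")
```

A port to PostgreSQL would have to translate such settings and layouts into different mechanisms, not just re-target the SQL queries.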