aspen-cloud / triplit

A full-stack, syncing database that runs on both server and client. Pluggable storage (indexeddb, sqlite, durable objects), syncs over websockets, and works with your favorite framework (React, Solid, Vue, Svelte).
https://triplit.dev
GNU Affero General Public License v3.0

`HttpClient` does not update fields of type `S.Optional(S.Set(S.String))` #196

Open MHOOO opened 2 months ago

MHOOO commented 2 months ago

I have a database that I want to migrate. Since I would rather not run the migration on every client, I have used the `HttpClient` in the backend to perform the table update. However, for some reason, the following function does not actually perform any update when `client` is an `HttpClient`:

                await client.update('videos', video.id, (entity) => {
                    // note: the string must be wrapped in an array, since the
                    // Set constructor iterates a bare string character by character
                    let tag_ids = new Set<string>(["some id"])
                    entity.tag_ids = tag_ids
                })

If I run the same code in the frontend using a `TriplitClient`, the update is performed successfully. In both cases the sync server is reached; for example, I can insert elements just fine. For now I'll try to run the migrations from a dedicated administration page on the frontend, so that multiple clients don't try to migrate the data at the same time.
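As an aside, `Set` fields are an easy place to trip over the `Set` constructor: it consumes any iterable, and a string iterates character by character, so passing a bare string produces a set of individual characters rather than a one-element set. A quick TypeScript illustration:

```typescript
// The Set constructor consumes any iterable; a string iterates
// character by character.
const chars = new Set<string>("some id");
console.log(chars.size); // 7 (the unique characters of "some id")

// Wrapping the string in an array yields a one-element set.
const tagIds = new Set<string>(["some id"]);
console.log(tagIds.size);           // 1
console.log(tagIds.has("some id")); // true
```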

This is with the newest Triplit libraries (I just ran `pnpx up "@triplit/*"` today).

pbohlman commented 2 months ago

Thanks for raising this. Are you initializing your `HttpClient` with your schema? Omitting it can be a source of undefined behavior with any client (including the `HttpClient`):

    const client = new HttpClient({serverUrl, token, schema});

MHOOO commented 2 months ago

Oh, I might have forgotten that. I would love to try, but my local DB became extremely slow after several failed migration attempts. Is there some way to clean up old (i.e. deleted) data from the tuple store?

pbohlman commented 2 months ago

Hmm, a couple of questions:

  1. When you say "local DB", do you mean an in-browser db (e.g. the TriplitClient's cache) or a server db, e.g. the SQLite file backing your deployed Triplit sync server?
  2. Are you self-hosting? If so, you might want to try upgrading to the latest server we released last night which introduced some big performance improvements.
MHOOO commented 2 months ago

It might be both:

  1. I'm seeing the slowness in the frontend (i.e. browser). Getting rows from an empty tags table takes 3 seconds...
  2. The backend / server DB using SQLite is 26 MB, even though I only have a couple hundred entries according to the web console. In it, I'm seeing a high number of total rows (Showing 0 of < 600), even though all of those are deleted. (Screenshot attached: Screenshot_20240817_192404)

Regarding the sync server: yes, I'm self-hosting, and I'm on 0.4.6 since I updated just today.

MHOOO commented 2 months ago

I ran some tests and indeed the in-browser DB comes to a crawl with too many updates. Is there some way to compact / snapshot the database to get rid of the history?

MHOOO commented 2 months ago

What I did now was to:

  1. Create a second triplit dev sync instance
  2. Query all tables from the old sync instance and copy them with a transaction to the new sync instance

This way I was able to reduce the number of tuples inside the SQLite DB from ~70k to ~16k; at the same time, the IndexedDB cache went from ~16k to ~900 entries. In any case, this has improved performance significantly. Another thing that has helped is to get rid of `.include` calls in as many places as possible. However, updating ~100 rows with new tags inside a single transaction, while querying at the same time (for existing elements), takes 80 seconds (I'm actually trying to update 250 rows, but that simply hung my browser). This appears to be due to query performance. Not sure if there is any fix here or if this is simply a limitation resulting from the tuple DB living inside a single table. Any ideas?
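The copy-based compaction described above can be sketched generically. The `SyncClient` interface below is a hypothetical stand-in for the relevant slice of a client API, not Triplit's real types; the point is that reading each table once from the old instance and writing only the live entities into a fresh instance leaves all the tombstoned history behind:

```typescript
// Hypothetical minimal client surface (the real Triplit client
// exposes a richer query/transaction API).
interface SyncClient {
  fetchAll(table: string): Promise<Map<string, Record<string, unknown>>>;
  insert(table: string, id: string, entity: Record<string, unknown>): Promise<void>;
}

// Copy every live entity from `source` to `target`, table by table.
// Deleted entities never appear in fetch results, so their tombstone
// tuples simply stay behind in the old store.
async function copyTables(
  source: SyncClient,
  target: SyncClient,
  tables: string[],
): Promise<number> {
  let copied = 0;
  for (const table of tables) {
    const rows = await source.fetchAll(table);
    for (const [id, entity] of rows) {
      await target.insert(table, id, entity);
      copied++;
    }
  }
  return copied;
}
```

In practice you would point `source` at the old sync instance and `target` at the new one, and run the copies inside transactions as described above.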

pbohlman commented 1 month ago

Thanks for the update.

Writing and then reading, repeatedly, inside a single transaction is going to lead to suboptimal performance (as currently implemented). This is because, inside a transaction, we defer indexing of the new tuples created by a write operation until either the transaction finishes or the next read occurs. The interleaved queries you mentioned are very likely slowed down by this behavior.
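That cost pattern can be caricatured with a toy store (an illustration of the deferred-indexing idea only, not Triplit's actual implementation): writes inside a transaction are buffered cheaply, and any read first flushes the buffer into the index. Alternating writes and reads then pays the flush cost on every read, while batching all writes pays it once at commit:

```typescript
// Toy transaction that defers indexing of buffered writes until
// the next read (or commit), mimicking the behavior described above.
class ToyTransaction {
  private index = new Map<string, number>();
  private pending: Array<[string, number]> = [];
  flushes = 0; // how many times buffered writes were indexed

  write(key: string, value: number): void {
    this.pending.push([key, value]); // cheap: just buffer
  }

  read(key: string): number | undefined {
    this.flush(); // reads must see pending writes, forcing indexing
    return this.index.get(key);
  }

  commit(): void {
    this.flush();
  }

  private flush(): void {
    if (this.pending.length === 0) return;
    for (const [k, v] of this.pending) this.index.set(k, v);
    this.pending = [];
    this.flushes++;
  }
}

// Interleaved: every read after a write triggers a flush.
const interleaved = new ToyTransaction();
for (let i = 0; i < 100; i++) {
  interleaved.write(`k${i}`, i);
  interleaved.read(`k${i}`);
}
interleaved.commit();

// Batched: all writes buffered first, one flush at commit.
const batched = new ToyTransaction();
for (let i = 0; i < 100; i++) batched.write(`k${i}`, i);
batched.commit();

console.log(interleaved.flushes, batched.flushes); // 100 1
```

Grouping writes together (or moving the existence checks ahead of the update loop) avoids most of the repeated indexing work.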

We've got a couple of things on our list this week that should hopefully address some of this. We're currently implementing a normalized cache that should speed up relational queries, and if you can share a reproduction of your slow transaction, we'll see what improvements we can find.