Poor initial fetch performance with IndexedDB storage backend

MHOOO commented 1 month ago

Hi there,

I'm using triplit to store around 400 rows of data, and I am seeing a client-side fetch time of around 2000ms when fetching for the first time with an IndexedDB backend. If I use 'memory' as the storage backend and simply pull the data from the server, I get around 250ms. Both client & triplit server run on the same device, and the triplit server is already spun up at the time that the requests hit it. However, I was thinking the local IndexedDB should still be faster. Is there some way to improve the performance?

Here's my schema for reference (I've already tried to remove all relations, but those don't seem to have any impact):

const videoCropOptions = S.Record({
    // crop from top (e.g. "10%")
    top: S.Optional(S.String()),
    // crop from bottom (e.g. "10%")
    bottom: S.Optional(S.String())
});
const videoOffsetOptions = S.Record({
    // offset from start in [s]
    start_offset: S.Optional(S.Number()),
    // offset from end in [s]
    end_offset: S.Optional(S.Number())
});
const videoOptions = S.Record({
    crop: videoCropOptions,
    offset: videoOffsetOptions
});
export const schema = {
    // Basic user information. No password / auth support yet
    users: {
        schema: S.Schema({
            id: S.Id(),
            email: S.String(),
            created_at: S.Date({ default: S.Default.now() })
        })
    },
    // Id used to mark duplicate videos
    video_duplicates: {
        schema: S.Schema({
            id: S.Id(),
            videos: S.RelationMany('videos', {
                // NOTE: Apparently $id is only supported on the right side of the equation
                where: [['video_duplicates_id', '=', '$id']]
            })
        })
    },
    loops: {
        schema: S.Schema({
            id: S.Id(),
            created_at: S.Date({ default: S.Default.now() }),

            // owner can update / change / delete
            owner_ids: S.Set(S.String()),
            owners: S.RelationMany('users', {
                where: [['id', 'in', '$owner_ids']]
            }),

            video_id: S.String(),
            video: S.RelationById('videos', '$video_id'),

            // order with respect to other loops on the same video
            order: S.Optional(S.Number()),
            // when not set, will generate one
            title: S.Optional(S.String()),
            // when it starts in [s]
            start: S.Number(),
            // when it ends in [s]
            end: S.Number()
        })
    },
    videos: {
        schema: S.Schema({
            id: S.Id(),
            title: S.String(),
            url: S.String(),
            // e.g. ["Kizomba" "lvl/intermediate"]
            tags: S.Set(S.String()),
            created_at: S.Date({ default: S.Default.now() }),

            // owner can update / change / delete
            owner_ids: S.Set(S.String()),
            owners: S.RelationMany('users', {
                where: [['id', 'in', '$owner_ids']]
            }),

            // duplicate videos share the same ID here
            video_duplicates_id: S.Optional(S.String()),
            video_duplicates: S.RelationOne('video_duplicates', {
                where: [['id', '=', '$video_duplicates_id']]
            }),

            loops: S.RelationMany('loops', {
                where: [['video_id', '=', '$id']]
            }),

            // options for video display
            options: videoOptions
        })
    }
} satisfies ClientSchema;

matlin commented 1 month ago

Which version of the Client are you using? This is definitely much slower than expected.

You can also try paginating the data (i.e. add a limit) in the meantime while we investigate your issue.

MHOOO commented 1 month ago

Adding .limit(1) does not impact the performance in any way. Matter of fact, I can query my users document, which only has a single user inside of it, and still get this behaviour. My dependencies are:

        "@triplit/cli": "^0.4.23",
        "@triplit/client": "^0.3.83",
        "@triplit/svelte": "^0.1.55",

I've measured the performance within chromium and I'm seeing a lot of idle time (see timings in the getVideos region): triplit_performance_indexeddb

There are many calls to IndexedDbTupleStorage.scan, but in-between in looks like nothing is happening at all. Could this be IndexedDB transactions slowing everything down?

matlin commented 1 month ago

Thanks for sharing all this context and digging in on your end. One last thing, do you mind sharing the query you're using and your client setup?

It shouldn't be an issue with IndexedDB transactions because the default indexeddb storage provider uses a memory cache so, once the data is loaded, reads should be about as fast as the memory storage provider.

For example, I just tested loading around 1000 entities and it finished in 45ms and with limit(1) it finished in 3ms.

MHOOO commented 1 month ago

The client setup is:

import { TriplitClient } from '@triplit/client';
import { browser } from '$app/environment';
import { schema } from '$lib/triplit/schema';

let _client: TriplitClient | undefined = undefined;

export const triplit = (domain: string | undefined) => {
    if (!domain) domain = import.meta.env.VITE_DOMAIN;
    console.time('Setup TriplitClient');
    _client = new TriplitClient({
        logLevel: 'debug',
        schema: schema,
        serverUrl: `https://${domain}:8087`,
        token: import.meta.env.VITE_TRIPLIT_TOKEN,
        autoConnect: browser,
        storage: browser ? 'indexeddb' : undefined
    });
    console.timeEnd('Setup TriplitClient');
    return _client;
};

and the query (w/o any limits):

    let client = triplit(domain);
    let options = { policy: 'local-first' };

    console.time('Fetch users');
    const user = await client.fetch(client.query('users', options).select(['id']).build());
    console.timeEnd('Fetch users');
    console.time('Fetch videos');
    const videos = await client.fetch(client.query('videos', options).build());
    console.timeEnd('Fetch videos');

I'll do some testing later today with a smaller schema. Maybe I can boil the problem down somehow. Thanks for sharing your insights so far!

matlin commented 1 month ago

Also feel free to share an export of your performance trace and I'll take a look. Best to keep screenshots turned of though otherwise the file ends up being huge.

MHOOO commented 1 month ago

I created a static HTML that reproduces the issue. It simply generates dummy data with above schema (400 rows) when first opening the page and then fetches data from the database on subsequent page refreshes. I'm seeing 1400ms on first fetch and 60ms thereafter. Appended the HTML + a performance graph of the first fetch

triplit-tests-export.tar.gz

MHOOO commented 1 month ago

Ran some experiments on the schema. I guess the initial decoding / rebuilding of the indexeddb data on my computer with 400 rows takes per schema field:

around 100ms for any simple (string, date, number) field
around 300ms for sets

Records are then a conjunction of these times. I'm using:

Google Chrome Version 126.0.6478.182
CPU: AMD Ryzen 7 5800H (8 cores)
64GB RAM

matlin commented 1 month ago

Thanks again for providing such a useful (and easy!) reproduction. I think the fundamental issue is that IndexedDB is pretty slow and apparently has performance issues on Windows -- on my M1 Macbook Pro, it's about twice as fast to run the initial query. However, I think this should be surmountable and I'll experiment with some things to cut down on that initial loading time.

For context, while the memory cache is being initialized from IndexedDB, it will still resolve queries just by accessing IndexedDB directly and I'm wondering if the simultaneous requests are bogging things down. I.e. it may very well be faster to just block until the memory cache is filled and then start serving queries rather than allowing those initial queries to create concurrent read transactions while the cache loads.

MHOOO commented 1 month ago

Hmm, I'm on linux (NixOS 24.05), but if it really is the performance of IndexedDB that would be quite problematic, since 400 rows / entries are really quite easy to get even as a single user. In my use case I get a very noticeable lag on my mobile phone when loading the db. I took a look at the existing storage implementations and most of them seem pretty straightforward. Do you think it would be easy to implement e.g. a https://github.com/rhashimoto/wa-sqlite backend to get around the issue or is this likely more of a problem in the overall design of the backend API (i.e. scan vs scanAll)?

matlin commented 1 month ago

I think creating an adapter for wa-sqlite should be pretty straightforward.

I'll also test out our non-cached IndexedDB storage adapter and see how it compares for that initial fetch. But just to confirm, your queries are fast after the initial load right?

MHOOO commented 1 month ago

Oh thats nice to hear. I wanted to try out integrating a wasm based module anyway, if it works out, I might try to write an adapter for it. Yes, after the initial load they are what I would expect (in the 50ms range)

MHOOO commented 1 month ago

Tried to hack sqlocal support into @triplit/tuple-database but I sort of got stuck getting vite to sucessfully build the local checkout of the dependency. After fixing a typo:

modified   src/storage/ExpoSQLiteStorage.ts
@@ -4,7 +4,7 @@ import {
    AsyncSQLiteExecutor,
    AsyncAdapterSQLiteStorage,
    AdapterSQLiteOptions,
-} from "./AdapterSQLiteStorage"
+} from "./AdapterSqliteStorage"
 import { AsyncTupleStorageApi } from "../database/async/asyncTypes"

I was able to build tuple-database, and have NPM use that via an override. However, when building the static HTML with vite, I get errors when trying to import from the build directory of @triplit/tuple-database. At this point, I fear my node / vite-fu is far too basic. Do you maybe have a simple setup that I might use?

matlin commented 1 month ago

We also added an optimization to the initialization process that should help your initial query situation. So might just check if the latest version of the client is faster for you.

wernst commented 1 month ago

@MHOOO Looking at our code for @triplit/tuple-database it looks like that import is correct. What version are you referencing? The latest should be 2.2.25

MHOOO commented 1 month ago

We also added an optimization to the initialization process that should help your initial query situation. So might just check if the latest version of the client is faster for you.

I can confirm that it is much faster now: It only takes ~1000ms to load all rows and if I limit the query to one entry, the initial fetch only takes 170ms, which is way better, since I can easily load additional rows afterwards. Thanks for the fix! I'll still try to get sqlocal running somehow. I've had minor success with simply editing the files inside the node_modules folder (hack, hack, hack ^^)

@MHOOO Looking at our code for @triplit/tuple-database it looks like that import is correct. What version are you referencing? The latest should be 2.2.25

That is exactly the version that I am using. I think the problem may be Windows vs Linux: on Linux file names are case-sensitive, whereas on Windows they are not:

MHOOO commented 1 month ago

Built a simple sqlocal storage backend. It does appear to be faster for fetching initially, however without a memory cache, subsequent fetches are slower than with the current indexedb impl. Also, inserts for some reason are really slow... not sure why. I can't really test it with my local tuple-database checkout, so here's just the adapter class if you're interested in having it:

"use strict"
const SQLocal = require("sqlocal").SQLocal

Object.defineProperty(exports, "__esModule", { value: true })
exports.SqlocalStorage = void 0
const codec_1 = require("../helpers/codec")
class SqlocalStorage {
    constructor(dbName) {
        this.dbName = dbName + ".sqlite3"
        const { sql, transaction, batch, destroy } = new SQLocal(this.dbName)
        console.log("Creating db: " + this.dbName)
        this.sql = sql
        // this.sql`drop table data`
        this.transaction = transaction
        this.batch = batch
        this.close = destroy
        // custom delimiter, since \x00 does not work with sqlocal
        this.encodeTuple = (tuple) => {
            return (0, codec_1.encodeTuple)(tuple, { delimiter: "\x02" })
        }
        this.decodeTuple = (str) => {
            return (0, codec_1.decodeTuple)(str, { delimiter: "\x02" })
        }
        this.scan = async (args = {}) => {
            // Bounds.
            let start = args.gte ? this.encodeTuple(args.gte) : undefined
            let startAfter = args.gt ? this.encodeTuple(args.gt) : undefined
            let end = args.lte ? this.encodeTuple(args.lte) : undefined
            let endBefore = args.lt ? this.encodeTuple(args.lt) : undefined
            const sqlArgs = [start, startAfter, end, endBefore, args?.limit].filter(
                Boolean
            )
            // console.log("args:", sqlArgs)
            const where = [
                start ? `key >= $start` : undefined,
                startAfter ? `key > $startAfter` : undefined,
                end ? `key <= $end` : undefined,
                endBefore ? `key < $endBefore` : undefined,
            ]
                .filter(Boolean)
                .join(" and ")
            let sqlQuery = `select * from data`
            if (where) {
                sqlQuery += " where "
                sqlQuery += where
            }
            sqlQuery += " order by key"
            if (args.reverse) {
                sqlQuery += " desc"
            }
            if (args.limit) {
                sqlQuery += ` limit $limit`
            }
            const results = await this.sql(sqlQuery, ...sqlArgs)
            return results.map(
                // @ts-ignore
                ({ key, value }) => ({
                    key: this.decodeTuple(key),
                    value: JSON.parse(value),
                })
            )
        }
        this.commit = async ({ set: inserts, remove: deletes }) => {
            // console.log("commit", inserts, deletes);
            await this.batch((sql) =>
                inserts
                    .map(
                        ({ key, value }) =>
                            sql`insert or replace into data values (${this.encodeTuple(key)}, ${JSON.stringify(value)})`
                    )
                    .concat(
                        deletes.map(
                            (tuple) =>
                                sql`delete from data where key = ${this.encodeTuple(tuple)}`
                        )
                    )
            )
        }
        this
            .sql`create table if not exists data ( key text primary key, value text)`
    }
}
exports.SqlocalStorage = SqlocalStorage
//# sourceMappingURL=../../src/storage/SqlocalStorage.js.map

matlin commented 1 month ago

I'm going to close this regarding initial fetching performance with IndexedDB. We'll continue optimizing the IndexedDB performance overall but you can re-open if you see any other issues crop up.

If you want to re-open an issue or create a pull request for your sqlite-wasm adapter, you can make one in our tuple-database repo.

aspen-cloud / triplit

Poor initial fetch performance with IndexedDB storage backend #182