famedly / matrix-dart-sdk

Matrix SDK written in pure Dart.
GNU Affero General Public License v3.0
55 stars 30 forks source link

Replace Hive database again #239

Closed famedly-bot closed 1 year ago

famedly-bot commented 2 years ago

In GitLab by @krille-chan on Nov 5, 2021, 10:16

The switch from Moor to Hive we did some months ago was together with the hope that we can get rid of database corruptions and very slow performance on web. In fact the SQflite developer said that he doesn't want to port his package to web because there is no good (and non-deprecated) sql web solution. Moor on the other hand was a very hacky solution with poor performance.

At this time Hive seemed to be the better option. It has a very easy to use API and a blazing fast performance. Unfortunately the performance on web is still not good enough!

After some research we came to the conclusion that it depends on how Hive uses IndexedDB as a backend. The API of IDB uses transactions everywhere while Hive does not know transactions at all. This leads to the problem that Hive on web uses one transaction per operation which explains the slow performance. Also Hive uses one database per box instead of one ObjectStore in a single database.

Also Hive seems to be not well maintained anymore. The last update is from Apr 19 and the developer wants to focus on a new database solution called Isar. But it is unclear if Isar can fix these IndexedDB problems. (Also the last commit on Isar was in May so the future is unclear as well...)

Hydrogen shows us that IDB can have a very impressive performance if it is used correctly. Unfortunately it seems not to be ṕossible to fix the Hive port for web because the APIs just do not fit. Therefore a switch to native IDB while using transactions in a smart way seems to be the fastest solution for web.

But what about mobile and desktop? Once we have developed a new pure IDB database for our SDK, we can use it with the idb_shim package on all other platforms as well. We can also use idb_sqflite for a sql based implementation, which makes it possible to use sqlcipher again for encryption. That's what I, from my researches, currently would recommend as the way to go.

idb_shim and idb_sqflite are packages made by the developer of sqflite. They are actively maintained since years and the chances that they have a bright future are much higher.

We now have performance benchmarks in our SDK start and I also would like to add some benchmarks to sync processing and room loading as well.

famedly-bot commented 2 years ago

In GitLab by @krille-chan on Nov 5, 2021, 11:13

Mapping Hive API to IndexedDB

Our first thought was to write a better IDB implementation of the Hive API. But while trying this I ran into these problems:

Create a new box

In Hive boxes are created using:

Hive.openBox('boxname'); // For normal boxes
Hive.openLazyBox('boxname'); // For lazy boxes

So boxes are created one after another. But in IDB you create ObjectStores normally all at once at the start:

// define the store name
const String storeName = "records";

// open the database
Database db = await idbFactory.open("my_records.db", version: 1,
    onUpgradeNeeded: (VersionChangeEvent event) {
  Database db = event.database;
  // create the store
  db.createObjectStore('storeName', autoIncrement: true);
  db.createObjectStore('storeName2', autoIncrement: true);
});

Hive has solved this by using a database per box and not an ObjectStore. But this looks very bad in the browser tab and probably has poor performance. It needs some time to create 16 databases while creating 16 ObjectStores in one database creation process is much faster.

Read data from box

In Hive you just use:

final value = await box.get('key');

While in IDB everything is inside of a transaction.

final txn = db.transaction(storeName, "readonly");
final store = txn.objectStore(storeName);
final  value = await store.getObject('key');
await txn.completed;

While it makes no difference in performance for reading one entry, it makes a HUGE difference when trying to read thousands of entries like we do on app start. However Hive seems to use some transactions in the background, dynamically created, otherwise it would be much more slower. (I experienced this while trying to write my own IDB store with simplified get and put methods). But it is still not optimal and can't be because we need to define such transactions in the SDK.

Write data in box

Similar to get data:

await box.put('key', value);

While in IDB everything is inside of a transaction.

final txn = db.transaction(storeName, "readwrite");
final store = txn.objectStore(storeName);
final key = await store.put({"some": "data"});
await txn.completed;

This seems to have the worst performance on web so I think that Hive uses a single transaction per PUT. We could improve our sync processing a lot my using a single transaction per sync.

famedly-bot commented 2 years ago

In GitLab by @krille-chan on Nov 5, 2021, 11:22

Overview of other database solutions

SQFlite

Moor

Hive

Isar

LocalStorage

ObjectBox

Sembast

IndexedDB

famedly-bot commented 2 years ago

In GitLab by @krille-chan on Nov 9, 2021, 13:25

Just to mention I have tested something:

It is possible to run multiple read actions in parallel so we can speed up read actions with Hive a little bit. This makes it possible to speed up the app start.

However it is not possible to speed up write actions in the same way. I have tried this in the web app:

          buttonElement(
            text: 'TEST',
            onClick: (_) async {
              final box = await Hive.openLazyBox('testbox');
              await box.clear();
              runBenchmarked('Hive speed test', () async {
                for (var i = 0; i < 100; i++) {
                  await box.put('key$i', {'foo': i});
                }
                return;
              }, 100);
            },
          ),
          buttonElement(
            text: 'TEST 2',
            onClick: (_) async {
              final box = await Hive.openLazyBox('testbox');
              await box.clear();
              runBenchmarked('Hive speed test 2', () async {
                final futures = <Future>[];
                for (var i = 0; i < 100; i++) {
                  futures.add(box.put('key$i', {'foo': i}));
                }
                await Future.wait(futures);
                return;
              }, 100);
            },
          ),

And there was no difference in the benchmarks. Both methods needed around 2 seconds which is way too slow.

famedly-bot commented 2 years ago

In GitLab by @krille-chan on Nov 16, 2021, 10:42

mentioned in commit 05329740224fe14fb7fb6fa3c8b950c7658846e7