dexie / Dexie.js

A Minimalistic Wrapper for IndexedDB
https://dexie.org
Apache License 2.0
11.68k stars 642 forks source link

Bulk query - performance report #1193

Open ladi-j opened 3 years ago

ladi-j commented 3 years ago

Hi there. I've noticed that you've requested real-life performance numbers, so I'd like to share my research when hunting for the problem of our slow queries.

DATABASE SETUP

1 Database loaded in browser, multiple tables with data items ranging from 10 items to 20000. WebPack compiling to ES2017 target. Our web app is built on React + Redux stack, using Sagas, hooks and async selectors. I can share more info if requested, but it's not very important as I've tested just the dexie performance itself (see below)

Selector tested on a table with 447 items. Item type:

export type RawCodingItem = {
  name: string,
  codes_to_meanings: CodesToMeanings, // [code: meaning]
  codes_to_concepts?: CodesToMeanings, // [code: concept]
  display?: DisplayHierarchy // Display order
}

HW & SW

USE CASE

Select multiple coding items via the primary key. We're selecting about 90% of coding items in our case.

CONTROLLED ENVIRONMENT

RESULTS

SINGLE RUN

(only one selector fired)

Promise.all(name => db.codings.where({name})) - 6836 ms -

const codingObject = await Promise.all(names.map(name => db.codings.where({name}).toArray()));
const filteredCodings = codingObject.flat();

bulkGet.toArray().filter() - 2318 ms -

const codingObject = await db.codings.bulkGet(names);
const filteredCodings = codingObject.filter(({name}) => names.includes(name));

Collection.where().anyOf() - 167 ms -

const filteredCodings = await db.codings
  .where("name")
  .anyOf(names)
  .toArray();

Collection.filter().toArray() - 140 ms -

const filteredCodings = await db.codings.filter(({name}) => names.includes(name)).toArray();

Collection.toArray().filter() - 92 ms -

const codingObject = await db.codings.toArray();
const filteredCodings = codingObject.filter(({name}) => names.includes(name));

CONCURRENT RUN

(2 instances of the same selector fired at the same time)

Promise.all(name => db.codings.where({name})) - 14580 / 10851 ms -

bulkGet.toArray().filter() - 5030 / 4262 ms -

Collection.where().anyOf() - 295 / 968 ms -

Collection.filter().toArray() - 263 / 879 ms -

Collection.toArray().filter() - 138 / 774 ms -

CONCLUSION

Getting all the items from dexie with toArray() and then filtering them with JS .filter function was the fastest in both single and double instance selector. Second fastest in both cases was the Collection.filter().toArray() method.

Interesting observation is that running two selectors on the same table at the same time causes huge performance problems, so this might be a good start for everyone struggling with UI freeze. Thanks to this I've actually discovered that we were doing exactly that by accident :D

ladi-j commented 3 years ago

Note: I will add one more performance report tommorow. We have a selector querying 20000 objects and then filtering them. Currently it takes ~500ms for DB query and another ~500ms for JS parsing.

Will look into this further

dfahlander commented 3 years ago

Thanks a lot @Ruskyj! It would be a great benefit if you could share a link to the source code also. I'd like to trace down the reasons of performance issues with Promise.all(name => db.codings.where({name})). In the long term I'd also like to have this test scriptable and make performance reports for both old and future releases.

ladi-j commented 3 years ago

Access to the source code won't be possible unfortunately, as this is a client's project behind NDA. I can share some extra snippets if needed (our Dexie -> Redux selectors for example).

ladi-j commented 3 years ago

@dfahlander I've run another set of tests with much bigger table. I've also tried to upgrade to Dexie 3.0.3, with no measurable effect on performance. If you're interested, I can prepare an example of our selector stack (so I won't break the NDA).

DEXIE Fields table run (19627)

Query for 19559 fields (99%), Querying by secondary key

Same app, same database, same environment and testing riles

Table item:

export type RawFieldItem = {
  // Field detail
  is_hierarchical: boolean,
  is_multi_select: boolean,
  is_primary_key: boolean,
  is_sparse_coding: boolean,
  name: string,
  primary_table_name: string,
  serial_type: FIELD_TYPE,
  title: string,
  type: FIELD_TYPE,

  // Optional keys
  coding_name?: string,
  concept?: string,
  description?: string,
  folder_path?: string[],
  linkout?: string,
  longitudinal_axis_type?: "primary" | "secondary"
  units?: string,
  mapping?: {
    column: string,
    column_sql_type: string,
    database_id: string,
    database_name: string,
    database_unique_name: string,
    table: string,
  },
  referenced_entity_field?: {
    entity: string,
    field: string,
    database_id: string,
    database_name: string,
    database_unique_name: string,
    table: string,
    column: string,
  }
  stats?: {
    min: number,
    max: number,
    avg: number,
    null_count: number,
  }
}

Results

Promise.all(name => db.codings.where({name})) - N/A ms -

bulkGet.toArray().filter() - N/A ms -

Collection.where().equals() - 423 ms -

Collection.filter().toArray() - 1614 ms -

Collection.toArray().filter() - 339 ms -

CONCLUSION

Similar results as with smaller table. Querying everything and then filtering it with vanilla JS is still the fastest. Interesting change was a big performance drop of Collection.filter method when dealing with bigger table.

dfahlander commented 3 years ago

Number for Collection.where().equals() surprice a bit. But it doesn't say so much without knowing how the code is executed, awaited and whether it's in a transaction or not, how large the queried indexes are, etc.

Thanks anyway for sharing the numbers! We will need to create a performance test for Dexie where we have the entire preconditionset - source and data, and continue from there.

ladi-j commented 3 years ago

@dfahlander I've tried to put our whole selector stack (from DB to React) in this example. Business logic was removed as it's not important for tests. All performance results were performed in the DB selector itself (see comments).

Let me know if this helps you, or I'm doing something really bad here :D

EXAMPLE STACK

// Database constructor
class DeepStateDB extends Dexie {
  public fields!: Dexie.Table<FieldItem, string>;
  public folders!: Dexie.Table<FolderHierachicalItem, DATASET_FIELD_TYPES>;
  public codings!: Dexie.Table<RawCodingItem, string>;
  public config!: Dexie.Table<ConfigItem, CONFIG_TYPES>;
  public entities!: Dexie.Table<EntityItem, string>;

  constructor(recordID) {
    super(recordID);
    const db = this;

    db.version(VERSIONS.DATABASE).stores({
      fields: "[entity+name], database, entity, name",
      folders: "type",
      codings: "name",
      config: "type",
      entities: "name",
    });

    this.fields = this.table("fields");
    this.folders = this.table("folders");
    this.codings = this.table("codings");
    this.config = this.table("config");
    this.entities = this.table("entities");
  }
}

// Database acessor
export const DeepStateRecord = (recordID: string): DeepStateDB => new DeepStateDB(recordID);

// Unviversal selector
async function createDBselector<S extends(...args: any[]) => any>(
  recordID: string | undefined,
  selectorFunc: S,
  ...args
): Promise<ReturnType<S>> {
  if (!recordID) throw new Error("Can't open a database connection without record-id!");

  const db = DeepStateRecord(recordID);
  const selectorResult = await selectorFunc(db, ...args);

  return selectorResult;
}

// Specific combined selector
export async function asyncSelectPrimaryPhenoField(
  db: DeepStateDB,
): Promise<FieldItem | undefined> {
  /* ---------------- PERFORMANCE WAS MEASURED HERE ---------------- */
  const primaryFieldPath = await asyncSelectPrimaryPhenoFieldPath(db);
  const primaryField = await dbSelectSingleField(db, primaryFieldPath);

  return primaryField;
}

// DB selector
export async function dbSelectSingleField(
  db: DeepStateDB,
  fieldPath: FieldID,
): Promise<FieldItem | undefined> {
  const singleField = await db.fields
    .where(["entity+name"])
    .equals(fieldPath)
    .first();

  return singleField;
}

/* ********* EXAMPLES OF USAGE ********* */
// Sagas
function* exampleSaga(recordID) {  
  const result = yield call(createDBselector, recordID, asyncSelectPrimaryPhenoField);
  return result;
}

// Classic async
async function exampleAsync(recordID) {
  const result = await createDBselector(recordID, asyncSelectPrimaryPhenoField);
  return result;
}

// Using custom fork of createSyncSelector from "async-selector-kit" with small changes to redux dispatch
export const [selectResult, selectResultLoading] = createAsyncSelectorWithDispatch(
  {
    async: async(recordID) => createDBselector(recordID, asyncSelectPrimaryPhenoField),
    id: "componentSelectorExample",
  },
  [
    selectRecordID,
  ],
);
dfahlander commented 3 years ago

Is a new Dexie instance created for every query? If so it could explain a longer response time due to the time it takes for the browser to open the database. It is recommended to have a single instance of Dexie in your app so that it only needs to open on page load. It would also explain why parallel queries would be slow as database opening can only be done one at a time.

ladi-j commented 3 years ago

I believe we only create one instance per page mount (as we define it once and then use that instance to access any database selector).

Is there an easy way how to test it/count the instances?

dfahlander commented 3 years ago

The number of instances can be counted using Dexie.connections.length. It's strange that a plain equals- request takes seconds and not milliseconds. In this fiddle we get a completely different performance, so something must be very different.

ladi-j commented 3 years ago

It's possible that our abstraction layer is creating some sort of bottleneck. I'll test the number of connections and report back.

Whole implementation is outlined in the "example" stack comment. Please let me know if you see any possible issues straight away.

muhamedkarajic commented 1 year ago

Any updates on this?

muhamedkarajic commented 1 year ago

I guess this issue is related.