dexie / Dexie.js

A Minimalistic Wrapper for IndexedDB
https://dexie.org
Apache License 2.0

Dexie appears to be unable to find Data in IndexedDB for one user #1807

Closed TravisBumgarner closed 6 months ago

TravisBumgarner commented 1 year ago

The code below is a simplified version of what is actually going on. It has been in production for several months. In the last 30 days, 1 user has experienced this issue, while 126 other users have used it successfully. The affected user is running the latest version of Chrome on a MacBook.

For context, I store videos in IndexedDB with the following shape. Metadata about a video is stored in IVideoMetadata, and the raw data that makes up the video is stored in IVideoData. When a video is created, a videoId ties the two tables together.

export interface IVideoData {
  videoId: string;
  index: number;
  blob: Blob;
  uploadingStatus: 'uploading' | 'pending' | 'uploaded';
}

export interface IVideoMetadata {
  title: string;
  videoId: string;
  uploadingStatus: 'uploading' | 'pending';
}
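
For reference, a Dexie declaration along these lines would support the queries below. The exact stores() definition, primary keys, and indexes are assumptions, since the actual db setup isn't included here.

// Sketch only: the real schema is not shown in this issue.
import Dexie, { Table } from 'dexie';

class VideoDB extends Dexie {
  // Assumes videoId is the primary key of videoMetadata, videoData uses a
  // compound [videoId+index] primary key, and uploadingStatus is indexed so
  // the where({ uploadingStatus }) queries below can use it.
  videoData!: Table<IVideoData, [string, number]>;
  videoMetadata!: Table<IVideoMetadata, string>;

  constructor() {
    super('VideoDB');
    this.version(1).stores({
      videoData: '[videoId+index], videoId, uploadingStatus',
      videoMetadata: 'videoId, uploadingStatus',
    });
  }
}

export const db = new VideoDB();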

I have built an uploader for uploading these videos; I have removed most of the code that isn't relevant. This code is initiated inside of a SharedWorker. It does the following:

  1. On startup, check if there are any videos that haven't been fully uploaded yet and enqueue them for uploading.
  2. Observe IndexedDB, wait for new video data to be populated there, and subscribe to this observer.
  3. When new video data is available, enqueue it for uploading.

export const getVideoDataThatCanBeUploaded = async () => {
  // toArray() is needed here so the results can be mapped below.
  const videoMetadata = await db.videoMetadata.where({ uploadingStatus: 'uploading' }).toArray();

  const videoData = (
    await Promise.all(
      videoMetadata.map(({ videoId }) => {
        return db.videoData.where({ uploadingStatus: 'pending', videoId }).toArray();
      }),
    )
  ).flat();
  return videoData;
};

export default class Uploader {
  private observable;

  constructor() {
    this.observable = liveQuery(async () => {
      return await getVideoDataThatCanBeUploaded();
    });

    this.observable.subscribe({
      next: async (videoData) => {
        this.enque(videoData);
      },
      error: (error) => console.log(error.message),
    });
  }

  private async enque(videoData: IVideoData[]) {
    await Promise.all(
      videoData.map(async ({ videoId }) => {
        const videoMetadata = await db.videoMetadata.where({ videoId }).first();
        if (!videoMetadata) {
          console.log(`Couldn't find specified videoMetadata key, ${videoId}`);
          return;
        }
        // Code removed
      }),
    );
  }
}

The issue for the user is that they get the error "Couldn't find specified videoMetadata key, abc123". However, when they open IndexedDB, they see the expected entries for videoId abc123 in both the videoData and videoMetadata tables. When they delete IndexedDB, uploading starts working again just fine. At some point, however, their IndexedDB gets back into a state where the error starts occurring, and the only way to fix it is to clear IndexedDB again.

dfahlander commented 1 year ago

One issue with the code is that the observable keeps producing things to enque on any type of change, so if more than a single item is in the queue, items might be sent to enque several times. I would guess this is normal behavior: the observable may produce the same results several times, so the same data is enqueued several times. For example, I suppose at some point you will be setting uploadingStatus to 'uploaded'. That write will trigger the observable again. Say you had two items in the queue but only the first was set to 'uploaded'; the second item will then be sent to enque again.

We could see the enque method as a consumer, and I think it needs to start a 'rw' transaction and not trust the observable to be accurate in any way, but recheck every status and not leave the transaction until the status has been updated. The observable would only function as a signal that makes the consumer eager. Things might happen outside transactions, so the consumer needs to redo some of the observable's work within the transaction and then update the status. I could try giving an example, as this would be a typical pattern for producer/consumer with Dexie, but I'm in a hurry for a meeting. Hope you find a solution based on this!
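
A rough sketch of that pattern, assuming the tables from the report above. uploadVideo() is a placeholder for the real upload request, which has to run outside the IndexedDB transaction because IndexedDB transactions auto-commit once no request is pending; only the status recheck and update happen inside the 'rw' transaction. It also assumes the metadata's uploadingStatus union includes 'uploaded', as in the example that follows.

// Sketch only: recheck state inside a 'rw' transaction instead of trusting
// the data the observable delivered.
declare function uploadVideo(videoId: string, data: IVideoData[]): Promise<void>; // placeholder

export async function processUploadSignal() {
  // Re-read what actually needs uploading right now.
  const metadata = await db.videoMetadata
    .where({ uploadingStatus: 'uploading' })
    .toArray();

  for (const { videoId } of metadata) {
    const data = await db.videoData.where({ videoId }).toArray();
    await uploadVideo(videoId, data); // network call, outside any transaction

    // Recheck inside the transaction before updating, so the status write is
    // consistent even if another run already finished this row.
    await db.transaction('rw', db.videoMetadata, async () => {
      const current = await db.videoMetadata.get(videoId);
      if (current?.uploadingStatus === 'uploading') {
        await db.videoMetadata.update(videoId, { uploadingStatus: 'uploaded' });
      }
    });
  }
}

The follow-up below ends up dropping the transaction and instead serializes all work through a single worker loop, which avoids the race in a simpler way.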

dfahlander commented 1 year ago

export default class Uploader {
  private subscription: Subscription;
  private isWorking = false;

  constructor() {
    const observable = liveQuery(() => db.videoMetadata.where({ uploadingStatus: 'uploading' }).count());
    this.subscription = observable.subscribe(numNewItems => {
      if (numNewItems > 0) {
        this.work().catch(error => console.error('Worker failed', error));
      }
    });
  }

  private async work() {
    // Let the worker only execute one run at a time. If it returns here, the
    // active run is already in the loop and will check for newly added items anyway.
    if (this.isWorking) return;
    this.isWorking = true;
    try {
      // Let the worker loop until the queue is empty:
      let videoMetadata = await db.videoMetadata.where({ uploadingStatus: 'uploading' }).toArray();
      while(videoMetadata.length > 0) {
        await Promise.all(
          videoMetadata.map(async ({ videoId }) => {
            const data = await db.videoData.where({ videoId }).toArray();
            await fetch(...); // Upload it
            // Only keep uploadingStatus on metadata, because updating a row
            // that holds a large blob is time consuming in IndexedDB.
            await db.videoMetadata.update(videoId, { uploadingStatus: 'uploaded' });
          }),
        );
        // Recheck statuses. Things might have arrived while we were busy uploading.
        videoMetadata = await db.videoMetadata.where({ uploadingStatus: 'uploading' }).toArray();
      }
    } finally {
      this.isWorking = false;
    }
  }

  stop() {
    this.subscription.unsubscribe();
  }
}

Well, no need for a transaction, but avoid race conditions by letting the worker run one at a time, and only keep uploadingStatus on the metadata. I've seen that IndexedDB is slow at updating rows that have a large blob property, so if the videoData rows are only ever added and never have their status updated, it will be more performant.

Also make sure, when enqueuing an upload, to first add the videoData and then the videoMetadata, so that the metadata write is what triggers the uploader service to pick it up.
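
For illustration, the producer side could then look something like this. It's only a sketch: enqueueUpload() and splitIntoChunks() are assumed names that aren't in the thread, and the multi-row layout just follows the index field on IVideoData.

// Sketch of the producer: write all videoData rows first, then the metadata
// row, so the metadata write is what wakes the uploader's liveQuery.
declare function splitIntoChunks(recording: Blob): Blob[]; // placeholder

export async function enqueueUpload(videoId: string, title: string, recording: Blob) {
  const chunks = splitIntoChunks(recording);

  // 1. Add the raw data first...
  await db.videoData.bulkAdd(
    chunks.map((blob, index) => ({
      videoId,
      index,
      blob,
      uploadingStatus: 'pending' as const,
    })),
  );

  // 2. ...then the metadata, whose 'uploading' status is what the uploader
  //    watches, so the data is already in place when the uploader wakes up.
  await db.videoMetadata.add({ videoId, title, uploadingStatus: 'uploading' });
}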

TravisBumgarner commented 1 year ago

Hey - just wanted to say thanks for your thoughtful reply. I need to put this task on the back burner while I focus on some other work, but I will circle back in a month or so when I have time to tackle these issues, and I'll give an update on whether your suggestions address the issue.

TravisBumgarner commented 6 months ago

Thanks again for maintaining such an awesome library!

Just wanted to circle back around to close this issue.

Your initial comment about things being queued multiple times was correct. I was ending up in a race condition where things were colliding with each other in bad ways. I've rewritten my code so that it no longer happens.

dfahlander commented 6 months ago

Thanks for circling back! It's great to hear that you found a way forward, and I'm glad my answer led you in the right direction. I suppose this type of question could be relevant to other people dealing with producers/consumers and blobs, so I added the question label to make it part of our Q/A.