BlinkID / blinkid-in-browser

BlinkID In-browser SDK for WebAssembly-enabled browsers.
https://microblink.com/blinkid

Bug: Blink ID v5.x.x and v6.x.x 'Out of memory' issues #108

Closed by jamesbarrett95 3 months ago

jamesbarrett95 commented 1 year ago

Problem statement:

Since upgrading to version 6.x.x (and also when testing on 5.x.x), our application alternates between the following errors after the BlinkIdMultiSideRecognizer and the video stream have been initialized successfully:

Error 1:

ERROR Error: Uncaught (in promise): RangeError: Failed to execute 'getImageData' on 'CanvasRenderingContext2D': Out of memory at ImageData creation
RangeError: Failed to execute 'getImageData' on 'CanvasRenderingContext2D': Out of memory at ImageData creation

Error 2:

Failed to execute 'postMessage' on 'Worker', Data cannot be cloned, out of memory 

Stack trace:

ERROR Error: Uncaught (in promise): DataCloneError: Failed to execute 'postMessage' on 'Worker': Data cannot be cloned, out of memory.
Error: Failed to execute 'postMessage' on 'Worker': Data cannot be cloned, out of memory.
    at WasmSDKWorker.postTransferrableMessage (blinkid-sdk.js:568:27)
    at blinkid-sdk.js:445:32
    at new ZoneAwarePromise (zone.js:1429:21)
    at RemoteRecognizerRunner.processImage (blinkid-sdk.js:435:16)
    at blinkid-sdk.js:2040:35
    at new ZoneAwarePromise (zone.js:1429:21)
    at VideoRecognizer.recognitionLoop (blinkid-sdk.js:2029:16)
    at blinkid-sdk.js:2046:34
    at timer (zone.js:2405:41)
    at _ZoneDelegate.invokeTask (zone.js:406:31)
    at resolvePromise (zone.js:1211:31)
    at zone.js:1118:17
    at zone.js:1134:33
    at asyncGeneratorStep (asyncToGenerator.js:6:1)
    at _throw (asyncToGenerator.js:29:1)
    at _ZoneDelegate.invoke (zone.js:372:26)
    at Object.onInvoke (core.mjs:26274:33)
    at _ZoneDelegate.invoke (zone.js:371:52)
    at Zone.run (zone.js:134:43)
    at zone.js:1275:36

Error 3:

Uncaught TypeError: Cannot read properties of null (reading 'action')

Stack trace: at context.onmessage (BlinkIDWasmSDK.worker.min.js:15:14117)

Device information to recreate the error

Mobile Device: Samsung A71
Browser: Chrome v112
Blink ID version: 6.0.1 and 5.20.0

Additional information:

Task Manager memory usage: [image]

Example of error 2 appearing on Blink ID, tested with Samsung A71 on Chrome: [image]

Example of error 3 appearing on Blink ID, tested with Samsung A71 on Chrome: [image]

eddieavd commented 1 year ago

Hi @jamesbarrett95, after extensive testing we found that the issue is caused by a bug in Chromium that leaks memory when ImageData is transferred to a worker. The bug has been reported to the Chromium team, and you can check the status here. We'll follow up with any relevant updates here as well.

jamesbarrett95 commented 1 year ago

Hi @eddieavd, thanks for the update. If you receive a resolution timeframe from the Chromium team, could you please share it here?

ckeyhd commented 1 year ago

It's true, we have seen the same behavior with the implementation in our project. We'll keep an eye on whatever gets reported. Thank you very much @jamesbarrett95 for the detailed error report and @eddieavd for the answer. I'll be following this as well 😎.

ivancuric commented 1 year ago

We have raised the issue via Twitter.

This issue seems to happen intermittently, and is caused by congestion in the postMessage channel.

As you can see in the repro sample here, there are multiple issues that can occur:

  1. The weirdest one is that sending the original ImageData instance via postMessage cuts the performance in half. Creating a new object that matches the shape of ImageData, with the Uint8ClampedArray referenced from the original one, massively improves the performance. The same happens if we construct a new ImageData from a predefined ArrayBuffer. Neither Chrome nor Safari likes having ImageData instances inside postMessage messages.
  2. The memory leak is caused by the GC not being able to run. This is directly tied to the workload on the messaging channel / worker instance. If you increase the throughput by increasing the size of the ArrayBuffer (e.g. the resolution in this case), or by using setInterval(() => {}, 0) instead of requestAnimationFrame, the issue is more likely to occur. It is drastically more likely if we don't wait for the worker to post a message back signalling that it's done working, as the channel becomes oversaturated and tasks keep queuing up. Note that waiting doesn't guarantee that the GC will run. Triggering GC manually from the dev tools will release the unused memory.
  3. Memory usage is not an issue by itself. We don't hold any handles to the objects, and the browser can run a GC once memory pressure becomes too high. The issue is that in some cases Chrome isn't able to trigger a GC in time, before the next ImageData needs to be constructed.
  4. The GC issue seems to be tied to the messaging channel itself, or to other Chrome internals, not to the worker instance. Creating a pool of workers and sending work to them in a round-robin fashion, giving each one time to be idle, will not fix the issue.
  5. The performance can be amazingly good, or terrible with memory leaks, depending on the state of the browser. Quitting and restarting the whole browser is sometimes necessary, both in Chrome and Safari.
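
Point 1 above can be illustrated with a short TypeScript sketch. It is not the SDK's actual code: `ImageDataLike`, `PlainFrame`, `FrameSink`, `toPlainFrame`, and `sendFrame` are hypothetical names, and listing the pixel buffer in a transfer list is an assumption that this thread does not confirm. The idea is to post a plain object that mirrors the shape of ImageData and references the same Uint8ClampedArray, rather than posting the ImageData instance itself.

```typescript
// Minimal structural stand-in for the browser's ImageData (hypothetical name).
interface ImageDataLike {
  readonly width: number;
  readonly height: number;
  readonly data: Uint8ClampedArray;
}

// Plain object with the same shape as ImageData (hypothetical name).
interface PlainFrame {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Anything with a Worker-style postMessage (hypothetical name).
interface FrameSink {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
}

// Copy only the fields; the pixel array is referenced, not duplicated.
function toPlainFrame(image: ImageDataLike): PlainFrame {
  return { width: image.width, height: image.height, data: image.data };
}

// Post the plain object instead of the ImageData instance.
// Assumption: the pixel buffer is listed as a transferable so it is moved, not cloned.
function sendFrame(sink: FrameSink, image: ImageDataLike): PlainFrame {
  const frame = toPlainFrame(image);
  sink.postMessage(frame, [frame.data.buffer as ArrayBuffer]);
  return frame;
}
```

In a browser, a Worker would play the role of `sink` and the result of `getImageData` would be passed as `image`. If the buffer is transferred as assumed here, it is detached on the sending side, so the frame should not be reused after the call.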

We are currently investigating other approaches to try and mitigate the issue.
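
Point 2 in the list above says the leak is much more likely when the sender does not wait for the worker to report that it is done. A minimal TypeScript sketch of that wait follows. `MessageSink`, `SingleFlightSender`, `trySend`, `markDone`, and `skippedFrames` are hypothetical names, not part of the BlinkID SDK, and skipping frames while the worker is busy is an assumed policy. As noted above, waiting eases the pressure on the channel but does not guarantee that the GC will run.

```typescript
// Anything with a Worker-style postMessage (hypothetical name).
interface MessageSink {
  postMessage(message: unknown): void;
}

// Allows at most one frame in flight; extra frames are skipped, not queued.
class SingleFlightSender {
  private readonly sink: MessageSink;
  private inFlight = false;
  private skipped = 0;

  constructor(sink: MessageSink) {
    this.sink = sink;
  }

  // Returns true if the frame was posted, false if the worker is still busy.
  trySend(frame: unknown): boolean {
    if (this.inFlight) {
      this.skipped += 1;
      return false;
    }
    this.inFlight = true;
    this.sink.postMessage(frame);
    return true;
  }

  // Call this when the worker posts its "done" message back.
  markDone(): void {
    this.inFlight = false;
  }

  // Number of frames skipped because the worker was busy.
  get skippedFrames(): number {
    return this.skipped;
  }
}
```

In a browser this could be wired up by calling `markDone()` from the worker's `onmessage` handler and `trySend()` from a `requestAnimationFrame` callback, so that frames arriving while the worker is busy are dropped instead of piling up in the channel.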

eddieavd commented 1 year ago

Hey y'all, just a quick follow-up: the Chromium team tagged our report as a duplicate of an existing issue. We're expecting updates from their end in the existing report, which you can find right here. We'll also post any major updates from the Chromium team in this issue.

jamesbarrett95 commented 1 year ago

Thanks @ivancuric @eddieavd.

In @ivancuric's post above, it's mentioned that the memory leaks also appear in Safari, which isn't a Chromium-based browser.

Has there been a bug raised to Safari's WebKit? Or is there an ongoing investigation to resolve the issue in BlinkID?

ivancuric commented 3 months ago

This should be resolved in BlinkID v6.5.1 and above.