aws / amazon-chime-sdk-js

A JavaScript client library for integrating multi-party communications powered by the Amazon Chime service.
Apache License 2.0

[Feature request] Binarization support for Background Filter mask #1973

Open maruware opened 2 years ago

maruware commented 2 years ago

Tell us about your request

What do you want us to build?

amazon-chime-sdk-js

Tell us about the problem you are trying to solve and why is it hard?

We are using Background Blur and Background Replacement. They are very useful, but because the mask becomes translucent depending on the likelihood, the real background shows through faintly and flickers.

How are you currently solving a problem?

We tried to solve it by applying a patch that binarizes the mask with a fixed threshold, and the result improved. However, we don't think the library should hard-code a fixed threshold.

We think it might be a good idea to make the binarization threshold configurable.

richhx commented 2 years ago

Thanks for the suggestion! I don't know how this slipped out of our queue and hasn't been responded to yet so apologies for the late reply. To clarify the request, are you asking for us to add a parameter to the processors (e.g. binaryThreshold) and if the parameter is passed, you would like us to convert the mask internally to not be a range but instead be fixed values?

For example, if binaryThreshold = 0.5 and the current range of the mask is [0, 1], anything below 0.5, you would like converted to 0?

maruware commented 2 years ago

@richhx Thanks for your reply.

For example, if binaryThreshold = 0.5 and the current range of the mask is [0, 1], anything below 0.5, you would like converted to 0?

Yes. I would expect values below 0.5 to be converted to 0 and values above 0.5 to be converted to 1.
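
As a rough illustration of the behavior being discussed, binarizing a mask of likelihoods in [0, 1] could look like the sketch below. `binarizeLikelihoods` is a hypothetical helper, not an amazon-chime-sdk-js API, and treating a value exactly equal to the threshold as foreground is an assumption, since the thread does not settle that case.

```typescript
// Hypothetical helper, not part of amazon-chime-sdk-js.
// Converts a soft mask of likelihoods in [0, 1] into a hard 0/1 mask.
function binarizeLikelihoods(mask: Float32Array, threshold: number): Uint8Array {
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError(`threshold must be within [0, 1], got ${threshold}`);
  }
  const out = new Uint8Array(mask.length);
  for (let i = 0; i < mask.length; i++) {
    // Assumption: a value exactly equal to the threshold counts as foreground.
    out[i] = mask[i] >= threshold ? 1 : 0;
  }
  return out;
}

const soft = new Float32Array([0.0, 0.25, 0.5, 0.75, 1.0]);
const hard = binarizeLikelihoods(soft, 0.5);
console.log(Array.from(hard)); // [0, 0, 1, 1, 1]
```

Because every output value is either 0 or 1, no pixel is blended between foreground and background, which is what removes the translucent edges described in the request.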

richhx commented 2 years ago

Sorry for the slow reply again. I'm hesitant about adding this as a parameter because this should ideally be fixed in the model instead, so users don't need to specify an arbitrary threshold. The team is investigating what can be done from that point. In the meantime, I would recommend overriding the class and its function as you mentioned for those who would like this parameter/feature. I'm happy to provide example code to facilitate this if it helps.

maruware commented 2 years ago

this should ideally be fixed in the model instead

I agree.

I would recommend overriding the class and its function as you mentioned for those who would like this parameter/feature. I'm happy to provide example code to facilitate this if it helps.

Overriding it seemed difficult for me (because BackgroundBlurProcessorProvided etc. are not exported), so example code would be helpful.

richhx commented 2 years ago

Ah good point on not being able to override it. Unfortunately, we don't want to make it public, since it can expose unnecessary parts of the blur processor that we will want to change. Since we don't want to cause breaking API problems for builders in the future, we prefer not to expose it at this time.

What you can do instead is fork the repository and apply the following patch, which does what I think you want. I did very minimal testing, but it seems to do what is requested. You can technically also use something such as patch-package to do the patching (not a recommendation to use such a package, just pointing it out).

diff --git a/demos/browser/app/meetingV2/meetingV2.ts b/demos/browser/app/meetingV2/meetingV2.ts
index b3512551..fc2ecfa1 100644
--- a/demos/browser/app/meetingV2/meetingV2.ts
+++ b/demos/browser/app/meetingV2/meetingV2.ts
@@ -3226,7 +3226,7 @@ export class DemoMeetingApp
       };

       const cpuUtilization: number = Number(videoFilter.match(/([0-9]{2})%/)[1]);
-      this.blurProcessor = await BackgroundBlurVideoFrameProcessor.create(this.getBackgroundBlurSpec(), { filterCPUUtilization: cpuUtilization });
+      this.blurProcessor = await BackgroundBlurVideoFrameProcessor.create(this.getBackgroundBlurSpec(), { filterCPUUtilization: cpuUtilization, binaryThreshold: 125 });
       this.blurProcessor.addObserver(this.blurObserver);
       return this.blurProcessor;
     }
diff --git a/src/backgroundfilter/BackgroundFilterOptions.ts b/src/backgroundfilter/BackgroundFilterOptions.ts
index 72c26d89..921c7f62 100644
--- a/src/backgroundfilter/BackgroundFilterOptions.ts
+++ b/src/backgroundfilter/BackgroundFilterOptions.ts
@@ -23,4 +23,13 @@ export default interface BackgroundFilterOptions {
    *
    */
   filterCPUUtilization?: number;
+
+  /**
+   * Binary threshold value to apply to the segmentation mask.
+   *
+   * Any alpha channel in a pixel that is less than this binary threshold value will be set to 0. The valid values for
+   * this field are 0-255. This is experimental and should generally not be used. Some clients find this useful to
+   * optimize the model.
+   */
+  binaryThreshold?: number;
 }
diff --git a/src/backgroundfilter/BackgroundFilterProcessor.ts b/src/backgroundfilter/BackgroundFilterProcessor.ts
index e5bc909b..fd7c9499 100644
--- a/src/backgroundfilter/BackgroundFilterProcessor.ts
+++ b/src/backgroundfilter/BackgroundFilterProcessor.ts
@@ -77,6 +77,7 @@ export default abstract class BackgroundFilterProcessor {
   protected frameCounter: BackgroundFilterFrameCounter;
   protected modelInitialized = false;
   private destroyed = false;
+  private binaryThreshold: number;

   protected static createWorkerPromise<T>(): {
     resolve: (value: T) => void;
@@ -144,6 +145,7 @@ export default abstract class BackgroundFilterProcessor {
     this.logger = options.logger;
     this.delegate = delegate;
     this.initCPUMonitor(options);
+    this.binaryThreshold = options.binaryThreshold;
   }

   initCPUMonitor(options: BackgroundFilterOptions): void {
@@ -369,6 +371,13 @@ export default abstract class BackgroundFilterProcessor {
         const maskPromise = this.mask$.whenNext();
         this.worker.postMessage({ msg: 'predict', payload: imageData }, [imageData.data.buffer]);
         mask = await maskPromise;
+        if (this.binaryThreshold) {
+          for (let i = 3; i < mask.data.length; i += 4) {
+            if (mask.data[i] < this.binaryThreshold) {
+              mask.data[i] = 0;
+            }
+          }
+        }
       }
       // It's possible that while waiting for the predict to complete the processor was destroyed.
       // adding a destroyed check here to ensure the implementation of drawImageWithMask does not throw

I suppose you can also override the object's prototype method directly, but I strongly recommend against doing that.
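
Note that the loop in the patch only zeroes alpha values below the threshold and leaves the remaining values untouched, so the mask can still contain partially transparent pixels. For a strict two-level mask, a variant that also saturates the kept pixels could look like the sketch below. `binarizeAlpha` is a hypothetical helper, not an SDK function, and it assumes the same layout the patch relies on: RGBA bytes with the mask value in the alpha channel, range 0-255.

```typescript
// Hypothetical helper, not part of amazon-chime-sdk-js.
// Mirrors the loop in the patch above, but also sets kept pixels to fully opaque.
function binarizeAlpha(rgba: Uint8ClampedArray, threshold: number): void {
  // RGBA layout: the alpha byte is every 4th byte, starting at index 3.
  for (let i = 3; i < rgba.length; i += 4) {
    rgba[i] = rgba[i] < threshold ? 0 : 255;
  }
}

// Three pixels with alpha values 40, 125 and 200.
const pixels = new Uint8ClampedArray([
  10, 20, 30, 40,
  10, 20, 30, 125,
  10, 20, 30, 200,
]);
binarizeAlpha(pixels, 125);
console.log(pixels[3], pixels[7], pixels[11]); // 0 255 255
```

On this 0-255 scale, the threshold of 0.5 from the earlier discussion corresponds to roughly 128. The demo change in the patch uses 125.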

maruware commented 2 years ago

Thanks for the guidance on the fork and the patch! It may increase our maintenance costs, so we would like to consider it carefully.

richhx commented 2 years ago

In the meantime, we're continuing to investigate improving the segmentation mask so that such a hack isn't needed.