GoogleChrome / chrome-extensions-samples

Chrome Extensions Samples
https://developer.chrome.com/docs/extensions
Apache License 2.0

Sample: Transfer a blob from a background context to a page #766

Open dotproto opened 1 year ago

dotproto commented 1 year ago

Goal of the demo

Demonstrate how an extension can create a blob in the extension's service worker and transfer it to a page's main world.

Suggested implementation

Sketch of my initial implementation plan

  1. Have a content script inject an iframe into a page
  2. Run a script in the iframe to post a message back to the service worker using navigator.serviceWorker.controller.postMessage()
  3. In the SW's message handler, generate a blob OR use structuredClone() to create a copy of a blob.
  4. Use event.source.postMessage(msg, [transferable]) to reply to the client and transfer the blob
  5. In the iframe's message handler, use window.parent.postMessage(msg, "<origin>", [transferable]) to transfer the data to the page

Related links

Notes

Initial findings suggest that it's not possible to transfer variables across origins. I'm tentatively thinking that to work around this, we can use URL.createObjectURL() in the iframe and directly reference the object URL somewhere that's easily visible to the end user (e.g. canvas, Audio/Video element, etc.).
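The object URL workaround described above might look roughly like the following. This is only a sketch: `showBlob` and its `element` parameter are hypothetical names not taken from the issue, and it assumes the iframe already holds the Blob and owns the element that displays it.

```javascript
// Hypothetical helper: point a media-like element at a Blob via an object URL.
// `element` is anything with a writable `src` property (e.g. <img>, <audio>, <video>).
function showBlob(blob, element) {
  const url = URL.createObjectURL(blob);
  element.src = url;
  // Return a cleanup function; an object URL keeps the Blob alive until it is revoked.
  return () => URL.revokeObjectURL(url);
}
```

Calling the returned function once the element no longer needs the data releases the object URL.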

guest271314 commented 1 year ago

Here you go https://github.com/guest271314/persistent-serviceworker/tree/main/chromium_extension_web_accessible_resources_iframe_message_event. You can transfer whatever you want. Bonus: Keeps MV3 ServiceWorker persistent.

guest271314 commented 1 year ago

Note, the last time I checked Blobs are not transferable.

guest271314 commented 1 year ago

Details of the message passing: we create a MessageChannel pair in the MV3 ServiceWorker, then transfer one of the MessagePorts to the arbitrary Web page where the action icon is clicked. This establishes direct communication between the ServiceWorker and the Web page, where port is defined globally.

To send a Blob from the ServiceWorker to the Web page where action icon is clicked include on Line 111 of background.js

port1.postMessage(new Blob(['Message from MV3 ServiceWorker'], {type:'text/plain'})); 

To log messages in the ServiceWorker sent from port2 defined globally in the Web page where click action occurred add the following to Line 114 in background.js

  console.log(e.data);
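For readers who want to try the Blob hop in isolation, here is a minimal sketch that keeps both ports in one context. It is not the code from the linked repository: `postBlobOverChannel` is a hypothetical name, and in the real extension one port would first be transferred to the page.

```javascript
// Stand-in for the extension flow: both MessagePorts stay in the same context here,
// whereas the extension would hand one of them to the Web page first.
function postBlobOverChannel() {
  const { port1, port2 } = new MessageChannel();
  return new Promise((resolve, reject) => {
    port2.onmessage = async (e) => {
      try {
        // The receiver gets a structured clone of the Blob, not a transferred object.
        resolve({ type: e.data.type, text: await e.data.text() });
      } catch (err) {
        reject(err);
      } finally {
        // Close both ports so nothing keeps listening after the demo.
        port1.close();
        port2.close();
      }
    };
    port1.postMessage(
      new Blob(['Message from MV3 ServiceWorker'], { type: 'text/plain' })
    );
  });
}
```

The promise resolves with the Blob's MIME type and text as seen by the receiving port.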

To transfer substantial amounts of data we can transfer ArrayBuffer (and TypedArray buffer), WebAssembly.Memory (which can grow()), and on Chromium and Chrome Transferable Streams, e.g.,

const {readable,  writable} = new TransformStream();
port1.postMessage(readable, [readable]);
const writer = writable.getWriter();
await writer.write(new Uint8Array([1,2,3,...]));
// close the stream
await writer.close();

on Web page

port.onmessage = async(e) => {
  if (e.data instanceof ReadableStream) {
    await e.data.pipeThrough(...).pipeTo(...);
  }
}
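The pipeThrough(...)/pipeTo(...) targets are elided above, so as one possible alternative, here is a sketch of a reader that collects every chunk into a single Uint8Array. `collectBytes` and `demoCollect` are hypothetical names, and the demo stays in one context instead of using postMessage().

```javascript
// Hypothetical helper: read every Uint8Array chunk from a ReadableStream
// and join the chunks into one Uint8Array.
async function collectBytes(readable) {
  const chunks = [];
  let total = 0;
  const reader = readable.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.byteLength;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Local demonstration without postMessage(). Reading starts before writing because,
// with the default queuing strategies, writes can stall until the readable side is read.
async function demoCollect() {
  const { readable, writable } = new TransformStream();
  const collected = collectBytes(readable);
  const writer = writable.getWriter();
  await writer.write(new Uint8Array([1, 2, 3]));
  await writer.write(new Uint8Array([4, 5]));
  await writer.close();
  return collected;
}
```

On the page side, `collectBytes(e.data)` could take the place of the elided pipe chain when the whole payload is wanted in memory.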

e.g., from an iframe appended to an arbitrary Web page https://github.com/guest271314/captureSystemAudio/blob/master/native_messaging/capture_system_audio/transferableStream.js

onload = () => {
  const { readable, writable } = new TransformStream({
    transform(value, controller) {
      controller.enqueue(value);
    },
    flush() {
      console.log('Flush.');
    },
  });
  const writer = writable.getWriter();
  const id = 'capture_system_audio';
  const port = chrome.runtime.connectNative(id);
  port.name = id;
  async function handleMessage(value, port) {
    // value could be valid JSON, e.g, "[0.123, 0.456,...]"
    if (!Array.isArray(value)) {
      value = JSON.parse(value);
    }
    try {
      await writer.ready;
      // pass Uint8Array to write()
      await writer.write(new Uint8Array(value));
    } catch (e) {
      console.error(e.message);
    }
    return true;
  }
  port.onDisconnect.addListener(async (e) => {
    console.log(e.message);
    await chrome.storage.local.clear();
  });
  port.onMessage.addListener(handleMessage);
  onmessage = async (e) => {
    const { type, message } = e.data;
    if (type === 'start') {
      port.postMessage(message);
      parent.postMessage(readable, name, [readable]);
    }
    if (type === 'stop') {
      try {
        port.disconnect(id);
        console.log(writer.desiredSize, message);
        while (writer.desiredSize <= 1) {    
          await scheduler.postTask(() => {});
          if (writer.desiredSize === 1) {
            console.log('writable', writer.desiredSize);   
            break;
          }
        }
        await writer.close();
        await writer.closed;
        console.log(writer.desiredSize);
        parent.postMessage(0, name);
        onmessage = null;
        await chrome.storage.local.clear();
      } catch (err) {
        console.error(err.message);
      }
    }
  };
  parent.postMessage(1, name);
};

As for volume: by streaming JSON (a string) or an Array through the onMessage handler, creating a Uint8Array from it, and writing it with the Streams API, I have recorded well over 1 hour of raw PCM (10.5 MB per minute) using the above approach.
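The 10.5 MB per minute figure can be sanity-checked with a little arithmetic. The PCM format is not stated in the thread, so the sketch below assumes 44.1 kHz, 16-bit, stereo audio; all names are hypothetical.

```javascript
// Assumed PCM format (not stated in the thread): 44.1 kHz, 16-bit, stereo.
const SAMPLE_RATE = 44100;   // samples per second, per channel
const BYTES_PER_SAMPLE = 2;  // 16-bit samples
const CHANNELS = 2;          // stereo

// Raw PCM size in bytes for a given duration.
function pcmBytes(seconds) {
  return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * seconds;
}

const perMinuteMB = pcmBytes(60) / 1e6;  // about 10.58 MB
const perHourMB = pcmBytes(3600) / 1e6;  // about 635 MB
```

Under that assumption an hour of audio comes to roughly 635 MB, which is in line with the per-minute figure quoted above.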

There are other ways to communicate between ServiceWorker and a Web page.

However, while the above is possible, it would be less cumbersome if the MV3 ServiceWorker could set arbitrary Web pages as a WindowClient. Then we could communicate directly and utilize onfetch (with respondWith() and Response()) without an iframe making requests and multiple intermediary transferring agents. Something like Transferable Streams directly to the ServiceWorker from an arbitrary Web page without necessarily needing onmessage and postMessage(), e.g., https://bugs.chromium.org/p/chromium/issues/detail?id=1214621.

tophf commented 1 year ago

Goal of the demo transfer it to a page's main world

It's the wrong goal in the context of the linked issue. I guess its title was initially confusing as it mentioned "web page" instead of "web page's content script", so I've changed it to reflect the contents of the issue, which is about the content script. It's a more difficult task to accomplish securely, but still possible via an iframe inside a closed ShadowDOM with a random token + MessagePort as described in the comment linked in the issue. One change to the scheme would be to send the MessageChannel port from SW to the content script via the iframe, then use this port to transfer ~blobs~ transferable binary data directly and instantly by specifying the second parameter of postMessage.


The following has been proved wrong, see the clarifying comment.

~Note that to transfer a huge blob instantly, which is what the related issue is all about, we need the second parameter:~

port1.postMessage(blob, [await blob.arrayBuffer()]);

~It should be mentioned explicitly that it destroys the blob in the source (in SW).~

guest271314 commented 1 year ago

One change to the scheme would be to send the MessageChannel port from SW to the content script via the iframe

That is what the code I posted does.

Note that to transfer a huge blob instantly, which is what the related issue is all about, we need the second parameter:

port1.postMessage(blob, [await blob.arrayBuffer()]);

As far as I know Blobs are not transferable and that code does not transfer a Blob.

tophf commented 1 year ago

Blob is just a wrapper on the same data exposed via arrayBuffer accessor or other getters on the blob, so this code does transfer the actual data instantly.

guest271314 commented 1 year ago

Kindly cite where you read Blobs are transferable.

tophf commented 1 year ago

Not the Blob, the arrayBuffer view of the same data contained in the Blob.

guest271314 commented 1 year ago

That is not transferring a Blob. Just deal with ArrayBuffer and TypedArray if you want to transfer data.

tophf commented 1 year ago

A Blob will be received by the listener, so it can be said that the Blob is effectively transferred.

guest271314 commented 1 year ago

No it cannot be said, because Blobs are not transferable objects.

tophf commented 1 year ago

Blob is not a Transferable but it's effectively transferable. Anyway, it's semantics.

guest271314 commented 1 year ago

Note that Blob is not listed here https://developer.mozilla.org/en-US/docs/Glossary/Transferable_objects#supported_objects.

tophf commented 1 year ago

I know.

guest271314 commented 1 year ago

Then as I said you cannot transfer a Blob.

What you can do is pass a FileSystemHandle to the ServiceWorker from the main Web page and write data to the FileSystemFileHandle then transfer the object back to the main frame then call .getFile().

guest271314 commented 1 year ago

Setting aside for the moment which types of data and objects can and cannot be transferred, the code I posted achieves the requirement.

tophf commented 1 year ago

Using the code I posted (with the second parameter using blob.arrayBuffer) will transfer a Blob instantly, it's trivial to verify.

tophf commented 1 year ago

FWIW, if you're interested what happens under the hood, here's my guess: the blob's actual data being a Transferable (blob.arrayBuffer) is sent instantly, while the Blob itself is just a wrapper over that data, which is structuredClone'able, so it's reconstructed in the receiver using the instantly transferred data.

guest271314 commented 1 year ago

Where did you learn that calling Blob.arrayBuffer() as the 2nd parameter of postMessage() will

transfer a Blob instantly

?

In what specification is that written? Kindly cite your sources.

What happens when you post the Blob without the 2nd parameter to postMessage()?

tophf commented 1 year ago

  1. It's how Blob and other binary related wrappers (DataView, typed arrays, buffer) work. You can investigate it yourself.
  2. The blob will be copied via internal structuredClone.

guest271314 commented 1 year ago

You haven't answered the question I asked re source of your claim?

tophf commented 1 year ago

It's in the specifications somewhere, which I investigated when I looked for a way to transfer blobs instantly. I found it, confirmed it worked. I don't have the links in browser history anymore, so if you're interested you can investigate and verify it yourself.

guest271314 commented 1 year ago

No, it is not in the specifications.

And it doesn't happen https://plnkr.co/edit/ZyutxttrhgerqZpk?open=lib%2Fscript.js.

guest271314 commented 1 year ago

Anyway, I think the transferable Blob claim can be set aside.

The Web site removing the iframe of a Chrome extension is more interesting and far more difficult to prove by reproduction.

tophf commented 1 year ago

Your demo prints the blob in console, just as it should.

guest271314 commented 1 year ago

The Blob is not transferred whatsoever. Note both the Blob in the Worker and the Blob received on main page have size 3.

That means

[await blob.arrayBuffer()]

is completely superfluous.
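A same-context sketch of that observation, using a MessageChannel rather than the Worker from the linked demo (the function name is hypothetical): the buffer placed in the transfer list is a separate copy produced by arrayBuffer(), so detaching it leaves the Blob untouched on both sides.

```javascript
// Post a Blob while listing a buffer obtained from blob.arrayBuffer() in the transfer list.
async function postBlobWithBufferInTransferList() {
  const blob = new Blob(['abc'], { type: 'text/plain' });
  // arrayBuffer() resolves with a fresh copy of the bytes, not the Blob's own storage.
  const buf = await blob.arrayBuffer();
  const { port1, port2 } = new MessageChannel();
  const received = await new Promise((resolve) => {
    port2.onmessage = (e) => resolve(e.data);
    port1.postMessage(blob, [buf]);
  });
  port1.close();
  port2.close();
  return {
    senderSize: blob.size,           // the sender's Blob is still intact
    receivedSize: received.size,     // the receiver gets a Blob of the same size
    bufferBytesLeft: buf.byteLength, // only the temporary buffer was detached
  };
}
```

Both sizes should stay at 3 while only the temporary buffer ends up detached.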

tophf commented 1 year ago

Maybe there's no need for arrayBuffer, but try making a 1GB Blob and you will see it's transferred instantly, which is the only thing I care about in this context.

guest271314 commented 1 year ago

You are using incorrect terminology. The Blob is not transferred at all. You will have a 1GB Blob in the DedicatedWorker or ServiceWorker context and in the main thread.

tophf commented 1 year ago

Try sending ['a'.repeat(100e6)] and new Blob(['a'.repeat(500e6)]). The string will take 1sec, the blob will be instantly sent.

guest271314 commented 1 year ago

As long as you have ceased and desisted using the term transferred we are making progress.

I won't go into the "instantly" claim as that is essentially the same as transferred.

Blob construction is expensive, and not instant.

If you want instant you should use SharedArrayBuffer.

guest271314 commented 1 year ago

We have established that we can post Blobs from the ServiceWorker to the arbitrary Web page.

tophf commented 1 year ago

Okay, I see now where I was wrong. Apparently the speed difference confused me. The Blob is sent instantly even without being actually transferred, whereas a string takes a lot of time probably due to the need to intern it in the receiver.

guest271314 commented 1 year ago

No. That is not what is occurring.

To learn about Blobs in Chrome I suggest reading this answer https://stackoverflow.com/a/56419176 at Where is Blob binary data stored?.

tophf commented 1 year ago

It means that to achieve the true instant transfer, which is the goal of the linked issue, the demo should use arrayBuffer:

const buf = await blob.arrayBuffer();
port.postMessage([buf, blob.type], [buf]);

and the receiver's listener should reconstruct the Blob from this data. I see this was correctly done in the original post linked in the comment linked in the issue, which I incorrectly reproduced from my faulty memory, sorry.
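A minimal sketch of that scheme, with both ports kept in one context for illustration (in the extension, the receiving port would live in the content script or page). `transferBlobBytes` is a hypothetical name.

```javascript
// Send a Blob's bytes as a transferred ArrayBuffer plus its MIME type,
// then rebuild the Blob on the receiving side.
async function transferBlobBytes(blob) {
  const buf = await blob.arrayBuffer();
  const { port1, port2 } = new MessageChannel();
  const rebuilt = await new Promise((resolve) => {
    // Receiver: reconstruct a Blob from the transferred buffer and the type string.
    port2.onmessage = (e) => {
      const [data, type] = e.data;
      resolve(new Blob([data], { type }));
    };
    // Sender: listing `buf` in the transfer list moves it instead of copying it.
    port1.postMessage([buf, blob.type], [buf]);
  });
  port1.close();
  port2.close();
  // After the transfer, the sender's buffer is detached (byteLength is 0).
  return { rebuilt, senderBufferBytes: buf.byteLength };
}
```

Note that arrayBuffer() and the Blob constructor may each copy the bytes, so only the hop across the port avoids a copy.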

guest271314 commented 1 year ago

Alright. That example is basically the code I posted here https://github.com/guest271314/persistent-serviceworker/tree/main/chromium_extension_web_accessible_resources_iframe_message_event which I don't think you have tested yet. That is possible. You are just posting an Array to the main page.

guest271314 commented 1 year ago

As I commented above, File System Access API FileSystemFileHandle is transferable. Which means you can pass a file handle to the ServiceWorker, write data using FileSystemWritableFileStream and if necessary post (transfer) that same FileSystemFileHandle back to the main browsing context - or write the file directly to users' file system.

I am not sure why your focus is on a Blob?

Yet, you can follow the instructions I posted above https://github.com/GoogleChrome/chrome-extensions-samples/issues/766#issuecomment-1294267869 to post a Blob from MV3 ServiceWorker to arbitrary Web page using MessageChannel.

tophf commented 1 year ago

A Blob is just an example of a popular binary type, nothing special. It might be useful for this demo to show how to send (and optionally transfer) a few other binary types as well, including streams.

guest271314 commented 1 year ago

I did show that. I transfer a ReadableStream to the arbitrary Web page with postMessage() and read the data using Streams API.

guest271314 commented 1 year ago

The code in this comment https://github.com/GoogleChrome/chrome-extensions-samples/issues/766#issuecomment-1294267869 streams raw PCM at ~10MB per minute. Note also that depending on the Native Messaging host used the source data processed in onMessage handler can also be JSON in the form of "[0.123, 0.456, ...]" which is then converted to a JavaScript object then passed to Uint8Array where that data is streamed to the arbitrary Web page.

guest271314 commented 1 year ago

@tophf What are you actually trying to do?

tophf commented 1 year ago

Nothing specific personally. The linked issue simply summarizes the need previously expressed by other extension authors who aren't present in this group (WECG).

guest271314 commented 1 year ago

W3C banned me, so I am unable to post on W3C or WICG repositories.

As demonstrated in the linked repository code it is possible to use postMessage() to send data from an MV3 ServiceWorker to an arbitrary Web page, and vice versa.

guest271314 commented 1 year ago

Re the unverified conjecture that a Web page could run code constantly to determine whether it has loaded an iframe with src set to the chrome-extension: protocol and then remove the iframe: to avoid that concern that so far has not been reproduced, simply open the HTML page listed in "web_accessible_resources" as a top-level window in a Tab, then that hypothetical concern is mitigated.

guest271314 commented 1 year ago

@tophf I will reiterate here that the approach, while possible, is not as ergonomic as it could be.

Ideally we can directly assign an arbitrary Web page as a WindowClient https://developer.mozilla.org/en-US/docs/Web/API/WindowClient, then we will have direct communication without an iframe.

Nonetheless the requirement is possible right now.

I have streamed over an hour of raw PCM through the embedded iframe multiple times using the above approach. That is well over 650MB.

tophf commented 1 year ago

To be more precise, a site can't read or see an iframe inside closed ShadowDOM as it won't be exposed as window[0] or the identical frames[0] (at least in Chrome; FF incorrectly exposes it). FWIW, even without using ShadowDOM the site won't see the exact URL of the iframe if we create a src-less iframe, then set its frameElem.contentWindow.location.href to web_accessible_resources URL, and the web site will never be able to read the actual URL, it will only be able to infer that the iframe is cross-origin. Still, it's physically possible for a site to delete all DOM elements that it didn't create as it's pretty straightforward to implement.

Note that the linked issue is about secure messaging to a content script specifically, not a web page, so any demo that targets the web page's MAIN world as the receiver is not really related to that issue.

to avoid that concern that so far has not been reproduced, simply open the HTML page listed in "web_accessible_resources" as a top-level window in a Tab, then that hypothetical concern is mitigated.

This will be blocked if the user has blocked popups for this site. It will also occupy the tab strip with something the user doesn't need.

guest271314 commented 1 year ago

Still, it's physically possible for a site to delete all DOM elements that it didn't create as it's pretty straightforward to implement.

Kindly cite an example of that occurring in the wild.

Note that the linked issue is about secure messaging to a content script specifically, not a web page, so any demo that deals with the web page's MAIN world is not really related to that issue.

A content script is just code injected into globalThis. chrome.scripting.executeScript() pulls back the veil on that.

There is no such thing as "secure" in the domain of signal communications.

tophf commented 1 year ago

It doesn't matter how many times you ask me to cite a URL, I won't do it. Regardless of our personal experience, it remains physically possible. There are extensions that can't accept that risk.

A content script runs in the isolated world which cannot be accessed and intercepted by the web page's MAIN world (unless there's a bug in the browser), which is what makes such communication secure. Since we're not writing a scientific article this term is apt.

guest271314 commented 1 year ago

I think you have a misconception of what "secure" means.

As of last century certain entities were capturing and analyzing 20TB of Internet data in real-time https://agoodamerican.org/. There is no way for you to verify your data has not been intercepted.

I don't think it is wise to throw around terms like "secure" without understanding the physical topography of the medium.

There are scientific terms as we are discussing technical writing.

That is, if the requirement is not possible, then somebody has to write out what they intend to implement, using technical writing to cite prior art, goals, non-goals, etc.

Defining "secure" is impossible, as it doesn't exist.

tophf commented 1 year ago

Any term is fine by me as long as it means "cannot be accessed and intercepted by the web page's MAIN world unless there's a bug in the browser".

guest271314 commented 1 year ago

It doesn't matter how many times you ask me to cite a URL, I won't do it.

You can't.

The claim is pure conjecture.

Any term is fine by me as long as it means "cannot be accessed and intercepted by the web page's MAIN world unless there's a bug in the browser".

You can't guarantee that.

Notably that counters your claim that Web sites will remove iframes from their site. How would they do that when the iframe is only appended on your machine, using your files on your machine?