Tampermonkey / tampermonkey

Tampermonkey is the most popular userscript manager, with over 10 million users. It's available for Chrome, Microsoft Edge, Safari, Opera Next, and Firefox.
GNU General Public License v3.0
4.18k stars · 417 forks

Make GM_xhr support responseType "stream" #1278

Closed AlttiRi closed 2 years ago

AlttiRi commented 3 years ago

Implement GM.fetch/GM_Fetch

I have already talked about GM.xmlhttpRequest and fetch here: https://github.com/Tampermonkey/tampermonkey/issues/1050. But it makes sense to file this as a separate issue: that discussion was about a fetch: true option of GM.xmlhttpRequest, while this issue is about a new function.

Tampermonkey currently has fetch: true as a Beta feature, but it does not look very useful: it only makes the background script use fetch internally, while everything else about GM.xmlhttpRequest works the usual way.

I suggest adding a new GM.fetch function, written from scratch, that has the same API as fetch and takes advantage of streaming, which is impossible with XMLHttpRequest.


Compatible with Fetch API

Like GM.xmlhttpRequest, it would allow bypassing CORS limitations, but the API of GM.fetch should be compatible with the common Fetch API.

One of the annoying things about GM.xmlhttpRequest, when you first use it, is that although it is named after XMLHttpRequest, the API is not the same. Making GM.fetch compatible with the common Fetch API would make scripts easier to write, since there is no new API to learn, and existing code written for fetch could be reused.

So any existing code written for fetch should work with GM.fetch transparently. For example, wrapping a ReadableStream (response) for progress indication:

The example:

```js
let response = await fetch(url);
response = loadingProxyResponse(response, onLoadingProgressLog);
// ...

// just logs the loading progress
function onLoadingProgressLog(receivedLength, contentLength) {
    const percentage = receivedLength / contentLength;
    const text = (percentage * 100 + "").substring(0, 4) + " %";
    console.log(text);
}

function loadingProxyResponse(response, onProgress) {
    const contentLength = parseInt(response.headers.get("Content-Length"));
    let receivedLength = 0;
    const reader = response.body.getReader();
    const readableStream = new ReadableStream({
        async start(controller) {
            while (true) {
                const {done, value} = await reader.read(); // value is Uint8Array
                if (done) {
                    break;
                }
                receivedLength += value.length;
                try {
                    onProgress(receivedLength, contentLength);
                } catch (e) {
                    console.error("onProgress error:", e);
                }
                controller.enqueue(value);
            }
            controller.close();
            reader.releaseLock();
        },
        cancel() {
            reader.cancel();
        }
    });
    return new Response(readableStream, {headers: response.headers});
}
```

Streaming

Okay, compatibility with the Fetch API is good, but it is not the main thing that I want. You can already write a wrapper for GM.xmlhttpRequest that looks like fetch. But with just a wrapper you cannot get the main feature of fetch: streaming.

Streaming is the core difference between XMLHttpRequest and fetch. When you download data with XMLHttpRequest, all of it is stored in memory until the request finishes and the data is handled. In the case of a userscript manager extension the memory consumption is doubled, since the data is stored both in the content/web script and in the background script. But when you use fetch, you can consume data from the stream as soon as it is received, and once you no longer need a chunk the garbage collector can free it.
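The consumption pattern looks like this with plain web APIs (a self-contained sketch: the Response is built over a local ReadableStream, which stands in for a network response whose chunks arrive progressively):

```javascript
// Stand-in for a network response: a Response built over a local stream.
function makeStream() {
    return new ReadableStream({
        start(controller) {
            controller.enqueue(new TextEncoder().encode("hello "));
            controller.enqueue(new TextEncoder().encode("world"));
            controller.close();
        }
    });
}

// Consume the body chunk by chunk; each chunk is collectable right after use.
async function consume(body) {
    const reader = body.getReader();
    let received = 0;
    while (true) {
        const {done, value} = await reader.read(); // value is a Uint8Array
        if (done) break;
        received += value.length;
    }
    return received;
}

consume(new Response(makeStream()).body).then(n => console.log(n, "bytes")); // logs: 11 bytes
```

With XMLHttpRequest, by contrast, nothing less than the fully buffered body is ever available to the caller.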

In the case of Tampermonkey you would not need to store the downloaded data in the background script's context at all. You would just pass chunks to the content script as they are received, and the GC would collect them. That alone halves the memory consumption, which is already notable. With streaming you can also use an approach like https://github.com/jimmywarting/StreamSaver.js, in which case no memory is wasted on storing the data at all when you download files: memory consumption is O(1) for a file of any size.
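The StreamSaver.js approach boils down to piping the response body into a WritableStream sink instead of buffering it. A minimal sketch (the sink factory is passed in as a parameter here so the snippet is self-contained; with the real library it would be streamSaver.createWriteStream(filename)):

```javascript
// Pipe a download straight into a sink; memory use is O(1) in the body size.
async function streamToSink(url, createWriteStream) {
    const response = await fetch(url);
    // Each network chunk is handed to the sink, then becomes collectable.
    await response.body.pipeTo(createWriteStream());
}
```

With StreamSaver.js the sink writes to disk; for testing, any WritableStream works.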

So this would make userscripts notably more memory efficient.


Also, I think it would be easier to implement (and later maintain) a new function from scratch than to keep bolting fetch's functionality onto the existing GM.xmlhttpRequest as a mode (fetch: true), which is what you currently do. The same applies to other userscript managers.

AlttiRi commented 3 years ago

By the way, this project tried to implement GM_fetch on top of GM_xmlhttpRequest: https://github.com/mitchellmebane/GM_fetch

It uses the window.fetch polyfill as a base.

But as I said earlier, it is impossible to implement streaming for fetch on top of XMLHttpRequest. So it is just a convenience wrapper, like Axios, without the benefits of streaming.


Here is my simplified wrapper:

The code:

```js
// Using:
// const response = await fetch(url);
// const {status, statusText} = response;
// const lastModifiedSeconds = response.headers.get("last-modified");
// const blob = await response.blob();

// The simplified `fetch`: a wrapper for `GM.xmlHttpRequest`
async function fetch(url) {
    return new Promise((resolve, reject) => {
        const blobPromise = new Promise((resolve, reject) => {
            GM.xmlHttpRequest({
                method: "get",
                url,
                responseType: "blob",
                onload: async (response) => resolve(response.response),
                onerror: reject,
                onreadystatechange: onHeadersReceived
            });
        });
        blobPromise.catch(reject);
        function onHeadersReceived(response) {
            const {
                readyState, responseHeaders, status, statusText
            } = response;
            if (readyState === 2) { // HEADERS_RECEIVED
                const headers = parseHeaders(responseHeaders);
                resolve({
                    headers,
                    status,
                    statusText,
                    arrayBuffer: () => blobPromise.then(blob => blob.arrayBuffer()),
                    blob: () => blobPromise,
                    json: () => blobPromise.then(async (blob) => JSON.parse(await blob.text())),
                    text: () => blobPromise.then(blob => blob.text()),
                });
            }
        }
    });
}

function parseHeaders(headersString) {
    class Headers {
        get(key) {
            return this[key.toLowerCase()];
        }
    }
    const headers = new Headers();
    for (const line of headersString.split("\n")) {
        const [key, ...valueParts] = line.split(":"); // last-modified: Fri, 21 May 2021 14:46:56 GMT
        if (key) {
            headers[key.trim().toLowerCase()] = valueParts.join(":").trim();
        }
    }
    return headers;
}
```

BTW, my version of fetch correctly resolves on the HEADERS_RECEIVED event, and the response body then resolves on the load event, while the fetch.js polyfill's fetch resolves only on the load event, which is not OK.

AlttiRi commented 3 years ago

TL;DR

AlttiRi commented 3 years ago

A very simplified demo that shows how to transmit data (in a streaming way) from the background script to the content script and create a Response object from it:

content script:

let resolve;
let promise;
function updatePromise() {
    promise = new Promise(_resolve => {
        resolve = _resolve;
    });
}
updatePromise();

const port = chrome.runtime.connect({name: "demo-fetch"});
port.onMessage.addListener(async function({done, value, i}) {
    // console.log({done, value, i});
    if (done) {
        resolve({done, value: undefined, i});
        return;
    }
    // `value` is a blob URL created in the background script
    const ab = await fetch(value).then(r => r.arrayBuffer());
    const u8a = new Uint8Array(ab);

    // console.log(i, u8a);
    resolve({done, value: u8a, i});
    updatePromise();
});

const rs = new ReadableStream({
    async start(controller) {
        while (true) {
            const {done, value} = await promise;
            if (done) {
                break;
            }
            controller.enqueue(value);
        }
        controller.close();
    }
});

new Response(rs)
    .blob()
    .then(blob => {
        // console.log(blob);
        const a = document.createElement("a");
        a.href = URL.createObjectURL(blob);
        a.download = "ShockedSecondaryFiddlercrab.mp4";
        a.click();
        setTimeout(() => URL.revokeObjectURL(a.href), 1000);
    });

background script:

const url = "https://giant.gfycat.com/ShockedSecondaryFiddlercrab.mp4"; // 32 MB
// const url = "https://giant.gfycat.com/ConfusedRecentGuppy.mp4"; // 104 MB

chrome.runtime.onConnect.addListener(async function(port) {
    console.log(port);
    if (port.name === "demo-fetch") {
        let i = 0;
        const response = await fetch(url, {cache: "force-cache"});
        const reader = response.body.getReader();
        while (true) {
            const {done, value} = await reader.read(); // value is Uint8Array
            // Pass each chunk as a blob URL (an empty blob for the final "done" message)
            const blobUrl = URL.createObjectURL(new Blob(done ? [] : [value]));
            setTimeout(() => URL.revokeObjectURL(blobUrl), 1000);
            port.postMessage({
                done,
                value: blobUrl,
                i: i++
            });
            if (done) {
                break;
            }
        }
    }
});
derjanb commented 3 years ago

Thanks for all the code and suggestions. I've implemented streaming support in 4.14.6142, but a little differently than suggested.

GM.xmlhttpRequest [...] is named after XMLHttpRequest, but the API is not the same.

But this is also a benefit. There already is a way to make requests and I don't like to introduce another one. That's why GM_xhr now gets responseType "stream" support. The response object is then a ReadableStream. You can access it in onload, but it makes more sense to work with it inside an onloadstart event listener to actually use the streaming capabilities.

You can check streaming support this way:

console.log('streaming ' + (GM_xmlhttpRequest.RESPONSE_TYPE_STREAM === 'stream' ? 'supported' : 'not supported'));

Example:

GM_xmlhttpRequest({
    method:   'GET',
    url:      'http://ipv4.download.thinkbroadband.com/100MB.zip?t=' + Date.now(),
    responseType: 'stream',
    onloadstart: async function(r) {
        if (r.readyState == 4 && r.status == 200) {
            const reader = r.response.getReader();
            while (true) {
                const { done, value} = await reader.read(); // value is Uint8Array
                if (value) {
                    console.log(value.length, 'received')
                }
                if (done) break;
            }
            console.log('done');
        }
    }
});
AlttiRi commented 3 years ago

I have just checked it.

Bugs:

  1. GM.xmlHttpRequest.RESPONSE_TYPE_STREAM is undefined.
  2. The onloadstart event does not fire on 404 and similar status codes.
  3. onreadystatechange and onload for a 404 code (and similar ones) have an empty response (zero bytes in the ReadableStream).
  4. onloadstart's response has readyState === 4, but that is fake; it should be 2 (HEADERS_RECEIVED). It also looks very strange if that was intentional, since it misleads people about when the event fires.
  5. onreadystatechange triggers only on readyState === 4, when the data has been fully received in the background script. It should also trigger at least on the HEADERS_RECEIVED event (with the response (ReadableStream) property).

The ReadableStream should contain the response data for any status code, so the if (r.readyState == 4 && r.status == 200) in your example is unnecessary.

AlttiRi commented 3 years ago

How to pass fetch's init object?

{
    headers: new Headers(),
    cache: "only-if-cached"
}

Does it support signal, body and headers as Headers?

AlttiRi commented 3 years ago

I still think that implementing GM.fetch (compatible with the native fetch) is a better idea than overengineering GM.xmlHttpRequest, which makes it harder for you (and other userscript manager developers) to maintain, and harder for script authors to use.

It would also be better for popularising this streaming approach, and significantly easier to use. It is enough to hear about GM.fetch once and people will start using it (once implemented), because they already know how to use it and it is much more convenient.


GM.xmlHttpRequest is probably the most inconvenient API for HTTP requests.

There is XMLHttpRequest, but hardly anyone uses it directly; people use either a wrapper for it (jQuery.ajax or axios) or the native fetch. GM.xmlHttpRequest has an "XMLHttpRequest-like API", but that API is not actually compatible with XMLHttpRequest, which makes it even worse. Adding new parameters only complicates it further.


@tophf, @gera2ld, what do you think about my GM.fetch suggestion and about Tampermonkey's fetch: true / responseType: "stream"?

tophf commented 3 years ago

I share derjanb's preference for GM_xmlhttpRequest in https://github.com/Tampermonkey/tampermonkey/issues/1278#issuecomment-884363078.

As for Violentmonkey, it already uses chunked transfers via Blob internally to speed up such huge downloads significantly so I agree it can wrap them as a stream but further discussion should be in Violentmonkey's repository, not here.

Regarding the examples above that use Blob+createObjectURL, note that in our tests there were cases where it didn't work: 1) in Firefox the content scripts can't fetch extension's own blob on sites with a strict CSP and 2) in incognito mode both in Chrome and Firefox. In these cases Violentmonkey uses a much slower serialization to a JSON-compatible value in the background script, then deserializes it in the content script.
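Such a JSON fallback presumably has this general shape (an illustrative guess, not Violentmonkey's actual code):

```javascript
// Hypothetical fallback: make a binary chunk JSON-serializable for messaging
// between the background script and the content script.
function encodeChunk(u8a) {
    return Array.from(u8a); // plain number array: JSON-safe, but bigger and slower
}
function decodeChunk(arr) {
    return new Uint8Array(arr);
}
```

The size and speed penalty of this representation is why it is used only as a fallback when blob URLs are unavailable.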

AlttiRi commented 3 years ago

Wait, it's not possible to pass the init object? Most of the init object's properties are just strings.

I only found a mention of nocache: true, which is (as I understand it) {cache: "no-store"}. https://github.com/Tampermonkey/tampermonkey/issues/1003#issuecomment-660072659

Is that all the fetch: true mode was added for? I expected something like a fetchInit property in addition to fetch: true.

derjanb commented 3 years ago

That's all the mode fetch: true was added for?

Initially, fetch was added to support anonymous GM_xhr in Chrome, while FF already had mozAnon support. The existing fetch: true only enforces the request type used in the background. Other options might change this as well (e.g. details.anonymous and now details.responseType: "stream").

I expected something like a fetchInit property in addition to fetch: true. How to pass fetch's init object?

Users should ideally not have to care about how the data is retrieved in the background, but simply use GM_xhr's features.

Does it support signal, body and headers as Headers?

No. You can abort the request via the returned abort function, so there is no need for signal. Also, details.data is used instead of body, and details.headers is a plain key-value object.
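In other words, a fetch-style call maps onto GM_xhr details roughly like this (an illustrative sketch summarising the answer above; buildGmDetails is a hypothetical helper, not a real API):

```javascript
// Hypothetical translation of fetch-style inputs into GM_xhr `details`.
function buildGmDetails(url, {method = "GET", headers = {}, body} = {}) {
    return {
        method,
        url,
        headers,    // a plain key-value object, not a Headers instance
        data: body, // GM_xhr uses `data` instead of fetch's `body`
    };
}

// const req = GM_xmlhttpRequest(buildGmDetails(url, {method: "POST", body: "{}"}));
// req.abort(); // instead of an AbortSignal
```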

All other issues (1. - 5.) should be fixed at 4.14.6143 (in review|crx)

AlttiRi commented 3 years ago

I have not tested it yet.


About fetch init object:

Okay, but there is fetch: true, which indicates how the data is retrieved in the background.

So I see nothing wrong with changing this field to also accept a simplified version of the fetch init object.

It would be compatible with existing code, because any object is truthy. And it is very simple to implement: just use this object as the base of Object.assign in the background script when you build the fetch init object.

fetch(url, props);

->

if (typeof fetchInit === "object" && fetchInit !== null) {
  props = Object.assign(fetchInit, props); // fetchInit is passed by a user 
}
fetch(url, props);

Just allow the fetch property of GM.xmlhttpRequest to be an object like

{
  referrer: "", // optionally, since the extension allow to set headers in more advanced way
  referrerPolicy: "no-referrer-when-downgrade",
  mode: "cors",
  credentials: "include",
  cache: "force-cache",
  redirect: "follow",
  integrity: "sha256-BpfBw7ivV8q2jLiT13fxDYAe2tJllusRSZ273h2nFSE=",
  keepalive: true,
}

All of these properties are strings (and the last one is a boolean).

AlttiRi commented 3 years ago

This is the last thing I need in order to write a proper wrapper over GM.xmlhttpRequest that emulates the GM.fetch API.

derjanb commented 3 years ago

Just allow the fetch property of GM.xmlhttpRequest to be [...]

This is not going to happen, sorry. First, this approach would add a lot of inconsistencies and complexity: what should GM_xhr do if the fetch init wants to use the cache, but nocache is set as well? Second, the Safari app extension, for example, has no background fetch support; all of its GM_xhr functionality is implemented in native Swift code.

So if you really miss some fetch functionality that can't be worked around or implemented client-side, then please report the use case as a new issue and I may add it to GM_xhr, but probably in a way that works regardless of the background request API used, if possible.

AlttiRi commented 2 years ago

Bugs:

~1.~ onloadstart now fires on readyState === 1, ~but it should fire on readyState === 2 (HEADERS_RECEIVED), in order to have access to headers, status and statusText.~

  1. response in onreadystatechange is missing status (it's 0) (and possibly statusText) on readyState === 2 ~as well as in the onloadstart event~. Because of this I can't construct a correct Response(rs, {headers, status, statusText}); status will always be 200, the default value for Response.
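For reference, the default is easy to see with the plain Response constructor:

```javascript
// Response defaults to status 200 when the init omits it.
const r1 = new Response("x");
const r2 = new Response("x", {status: 404, statusText: "Not Found"});
console.log(r1.status, r2.status); // logs: 200 404
```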

Here is the code:

// ==UserScript==
// @name         Mode stream fetch
// @namespace    http://tampermonkey.net/
// @description  test
// @version      0.2
// @match        https://example.com/*
// @grant        GM_xmlhttpRequest
// @connect      ipv4.download.thinkbroadband.com
// @connect      giant.gfycat.com
// @connect      example.com
// ==/UserScript==

let url = "http://ipv4.download.thinkbroadband.com/10MB.zip?t=" + Date.now(); // 408
    url = "https://giant.gfycat.com/ShockedSecondaryFiddlercrab.mp4";         // 200   // 32 MB
 // url = "https://example.com/xxx";                                          // 404

// -----------------
// Using:
(async function() {
    const response = await fetch(url);

    console.log(response);
    const {status, statusText} = response;
    const lastModified = response.headers.get("last-modified");

    // BUG: status is ALWAYS 200 (default)
    console.log({status, statusText, lastModified});

    const blob = await response.blob();
    console.log(blob);
    downloadBlob(blob, "x.mp4", url);
})();
// -----------------

async function fetch(url, fetchInit = {}) {
    const defaultFetchInit = {method: "get"};
    const {headers, method} = {...defaultFetchInit, ...fetchInit};
    const isStreamSupported = GM_xmlhttpRequest?.RESPONSE_TYPE_STREAM;
    const HEADERS_RECEIVED = 2;
    if (!isStreamSupported) {
        return new Promise((resolve, _reject) => {
            const blobPromise = new Promise((resolve, reject) => {
                GM_xmlhttpRequest({
                    url,
                    method,
                    headers,
                    responseType: "blob",
                    onload: (response) => resolve(response.response),
                    onerror: reject,
                    onreadystatechange: onHeadersReceived
                });
            });
            blobPromise.catch(_reject);
            function onHeadersReceived(gmResponse) {
                const {
                    readyState, responseHeaders, status, statusText
                } = gmResponse;
                if (readyState === HEADERS_RECEIVED) {
                    const headers = parseHeaders(responseHeaders);
                    resolve({
                        headers,
                        status,
                        statusText,
                        arrayBuffer: () => blobPromise.then(blob => blob.arrayBuffer()),
                        blob: () => blobPromise,
                        json: () => blobPromise.then(blob => blob.text()).then(text => JSON.parse(text)),
                        text: () => blobPromise.then(blob => blob.text()),
                    });
                }
            }
        });
    } else {
        return new Promise((resolve, _reject) => {
            const responsePromise = new Promise((resolve, reject) => {
                void GM_xmlhttpRequest({
                    url,
                    method,
                    headers,
                    responseType: "stream",
                    onerror: reject,
                    onreadystatechange: onHeadersReceived,
                    onloadstart: (gmResponse) => console.log("[onloadstart]", gmResponse) // debug
                });
            });
            responsePromise.catch(_reject);
            function onHeadersReceived(gmResponse) {
                console.log("[onreadystatechange]", gmResponse); // debug
                const {
                    readyState, responseHeaders, status, statusText, response: readableStream
                } = gmResponse;
                if (readyState === HEADERS_RECEIVED) {
                    const headers = parseHeaders(responseHeaders);
                    let newResp;
                    if (status === 0) {
                        console.warn("status is 0!", {status, statusText});
                        newResp = new Response(readableStream, {headers, /*status, statusText*/});
                    } else {
                        newResp = new Response(readableStream, {headers, status, statusText});
                    }
                    resolve(newResp);
                }
            }
        });
    }
}

function downloadBlob(blob, name, url = "") {
    const anchor = document.createElement("a");
    anchor.setAttribute("download", name || "");
    const blobUrl = URL.createObjectURL(blob);
    anchor.href = blobUrl + "#" + url;
    anchor.click();
    setTimeout(() => URL.revokeObjectURL(blobUrl), 0);
}

function parseHeaders(headersString) {
    class Headers {
        get(key) {
            return this[key.toLowerCase()];
        }
    }
    const headers = new Headers();
    for (const line of headersString.trim().split("\n")) {
        const [key, ...valueParts] = line.split(":"); // last-modified: Fri, 21 May 2021 14:46:56 GMT
        if (key) {
            headers[key.trim().toLowerCase()] = valueParts.join(":").trim();
        }
    }
    return headers;
}
derjanb commented 2 years ago

onloadstart now fires on readyState === 1

This is when the loadstart event is fired.

response in onreadystatechange has missed status (it's 0) (and possibly statusText) on readyState === 2 as well as in onloadstart event

Good catch. Will be fixed at the next BETA version.

AlttiRi commented 2 years ago

Memory consumption is several times (2×+) lower with streaming!


I have tested the code above on a Windows virtual machine with 2 GB of memory (the OS uses 1.2 GB of committed memory, 1.5 GB with the browser) and with the paging file (virtual memory) disabled.

Yes, as I expected, the streaming code is much more memory efficient. Notably so!

With the default approach I can download only a 100 MB file, while with ReadableStream (responseType: "stream") I can download a 300 MB file.

On a VM with 2.5 GB of memory I can download a 200 MB file with XHR, but not a 300 MB one. To download a 300 MB file without streaming I have to give the VM 2.8 GB of memory! (Otherwise the browser closes, or nothing happens.)

As I said above, the theoretical memory optimisation is 2×, and in practice it is even bigger.


Note: I restarted the browser after each download, so the tests were clean.

AlttiRi commented 2 years ago

Halved write overhead

...for the browser's temporary storage of blobs.


Also, in the main OS (where there is enough memory), the amount of blob data written to the SSD (in the blob_storage folder) is halved!

With streaming, no blobs are flushed to the blob_storage folder during the data exchange between the background script and the content script.


To download a 200 MB file, ~400 MB has to be written to the blob_storage folder with XHR, but only ~200 MB with fetch.

In both cases an additional blob (200 MB) is created when I save the file in the downloadBlob function.


Data written to SSD (exchange + downloading):

TwoLeaves commented 5 months ago

As for Violentmonkey, it already uses chunked transfers via Blob internally to speed up such huge downloads significantly so I agree it can wrap them as a stream but further discussion should be in Violentmonkey's repository, not here.

Regarding the examples above that use Blob+createObjectURL, note that in our tests there were cases where it didn't work: 1) in Firefox the content scripts can't fetch extension's own blob on sites with a strict CSP and 2) in incognito mode both in Chrome and Firefox. In these cases Violentmonkey uses a much slower serialization to a JSON-compatible value in the background script, then deserializes it in the content script.

Do both Violentmonkey and Tampermonkey still use this serialisation method in incognito mode?