Closed. AlttiRi closed this issue 2 years ago.
By the way, this guy tried to implement `GM_fetch` over `GM_xmlhttpRequest`: https://github.com/mitchellmebane/GM_fetch
He uses the `window.fetch` polyfill as a base. But, as I said earlier, it's impossible to implement streaming for `fetch` on top of `XMLHttpRequest`. So it's just a wrapper for more convenient use, like Axios, without the benefits of streaming.
Here is my simplified wrapper:
BTW, my special version of `fetch` resolves correctly on the `HEADERS_RECEIVED` event, and then the `response` body resolves on the `load` event, while the fetch.js polyfill's `fetch` resolves only on the `load` event, which is not OK.
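To illustrate the two-stage resolution described above, here is a minimal sketch (all names here are mine, not an actual API): the outer promise resolves as soon as headers are known, while the body accessor resolves later, on the `load` event.

```javascript
// Hypothetical sketch of the two-stage resolution described above.
// The outer promise resolves on HEADERS_RECEIVED (readyState === 2),
// exposing a body accessor that resolves later, on the `load` event.
function makeTwoStageResponse() {
    let resolveHeaders, resolveBody;
    const bodyPromise = new Promise(resolve => { resolveBody = resolve; });
    const responsePromise = new Promise(resolve => { resolveHeaders = resolve; });
    return {
        responsePromise,
        // call when headers arrive (readyState === 2)
        headersReceived(meta) {
            resolveHeaders({...meta, blob: () => bodyPromise});
        },
        // call on the `load` event, when the whole body is available
        loaded(body) {
            resolveBody(body);
        },
    };
}
```

A wrapper wired this way lets callers read `status` and headers before the body finishes downloading, which a polyfill that resolves only on `load` cannot do.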
TL;DR

- `GM.fetch` will reduce memory consumption by half (theoretically), and practically (based on my tests) even more.
- `GM.fetch` will have a well-known and convenient API, compatible with existing code.

Here is a very simplified demo that shows how to transmit data (in a streaming way) from the background script to the content script and create a `Response` object:
content script:
let resolve;
let promise;
function updatePromise() {
promise = new Promise(_resolve => {
resolve = _resolve;
});
}
updatePromise();
const port = chrome.runtime.connect({name: "demo-fetch"});
port.onMessage.addListener(async function({done, value, i}) {
// console.log({done, value, i});
const ab = await fetch(value).then(r => r.arrayBuffer());
const u8a = new Uint8Array(ab);
// console.log(i, u8a);
resolve({done, value: u8a, i});
updatePromise();
});
const rs = new ReadableStream({
async start(controller) {
while (true) {
const {done, value} = await promise;
if (done) {
break;
}
controller.enqueue(value);
}
controller.close();
}
});
new Response(rs)
.blob()
.then(blob => {
// console.log(blob);
const a = document.createElement("a");
a.href = URL.createObjectURL(blob);
a.download = "ShockedSecondaryFiddlercrab.mp4";
a.click();
setTimeout(() => URL.revokeObjectURL(a.href), 1000);
});
background script:
const url = "https://giant.gfycat.com/ShockedSecondaryFiddlercrab.mp4"; // 32 MB
// const url = "https://giant.gfycat.com/ConfusedRecentGuppy.mp4"; // 104 MB
chrome.runtime.onConnect.addListener(async function(port) {
console.log(port);
if (port.name === "demo-fetch") {
let i = 0;
const response = await fetch(url, {cache: "force-cache"});
const reader = response.body.getReader();
while (true) {
const {done, value} = await reader.read(); // value is Uint8Array
const blob = new Blob([value]);
const url = URL.createObjectURL(blob);
setTimeout(() => URL.revokeObjectURL(url), 1000);
port.postMessage({
done,
value: url,
i: i++
});
if (done) {
break;
}
}
}
});
Thanks for all the code and suggestions. I've implemented streaming support in 4.14.6142, but a little differently than suggested.
> GM.xmlhttpRequest [...] names like XMLHttpRequest, but API is not the same.

But this is also a benefit. There already is a way to make requests and I don't want to introduce another one. That's why GM_xhr now gets `responseType: "stream"` support. The `response` object then is a `ReadableStream`. You can access it in `onload`, but it makes more sense to work with it inside an `onloadstart` event listener to actually make use of the streaming capabilities.
You can check streaming support this way:
console.log('streaming ' + (GM_xmlhttpRequest.RESPONSE_TYPE_STREAM === 'stream' ? 'supported' : 'not supported'));
Example:
GM_xmlhttpRequest({
method: 'GET',
url: 'http://ipv4.download.thinkbroadband.com/100MB.zip?t=' + Date.now(),
responseType: 'stream',
onloadstart: async function(r) {
if (r.readyState == 4 && r.status == 200) {
const reader = r.response.getReader();
while (true) {
const { done, value} = await reader.read(); // value is Uint8Array
if (value) {
console.log(value.length, 'received')
}
if (done) break;
}
console.log('done');
}
}
});
I have just checked it now.

1. `GM.xmlHttpRequest.RESPONSE_TYPE_STREAM` is `undefined`.
2. The `onloadstart` event does not fire on 404 and similar status codes.
3. `onreadystatechange` and `onload` for a 404 code (and similar ones) have a zero response (zero bytes in the `ReadableStream`).
4. `onloadstart`'s `response` has `readyState === 4`, but it's fake; it should have `2` (`HEADERS_RECEIVED`). Also it looks very strange if you did it on purpose: it misleads people about when the event fires.
5. `onreadystatechange` triggers only on `readyState === 4`, when the data was received fully in the background script. `onreadystatechange` should at least also trigger on the `HEADERS_RECEIVED` event (with the `response` (`ReadableStream`) property).
6. The `ReadableStream` should contain the response data for any `status` code, so the `if (r.readyState == 4 && r.status == 200)` check in your example is unnecessary.
How to pass `fetch`'s init object?
{
headers: new Headers(),
cache: "only-if-cached"
}
Does it support `signal`, `body` and `headers` as a `Headers` object?
I still think that implementing `GM.fetch` (compatible with native `fetch`) is a better idea than overengineering `GM.xmlHttpRequest`, which makes it difficult for you (and other userscript manager extension developers) to maintain, and for script makers to use.
Also it would be better for the popularisation of this (stream) approach. And for ease of use. Significantly.
It's enough to hear about `GM.fetch` once and people will start to use it (if it is implemented), because they already know how to use it and it's a much more convenient thing.
`GM.xmlHttpRequest` is probably the most inconvenient API for HTTP requests.
There is `XMLHttpRequest`, but no one uses it directly; people use either a wrapper for it (`jQuery.ajax` or `axios`) or native `fetch`.
`GM.xmlHttpRequest` has an "XMLHttpRequest-like API", but its API is not compatible with `XMLHttpRequest`, which makes it even worse. Adding new parameters only complicates it.
@tophf, @gera2ld, what do you think about my suggestion of `GM.fetch`, and about Tampermonkey's `fetch: true`/`responseType: "stream"`?
I share derjanb's preference for GM_xmlhttpRequest in https://github.com/Tampermonkey/tampermonkey/issues/1278#issuecomment-884363078.
As for Violentmonkey, it already uses chunked transfers via `Blob` internally to speed up such huge downloads significantly, so I agree it can wrap them as a stream, but further discussion should be in Violentmonkey's repository, not here.
Regarding the examples above that use Blob+createObjectURL, note that in our tests there were cases where it didn't work: 1) in Firefox the content scripts can't fetch extension's own blob on sites with a strict CSP and 2) in incognito mode both in Chrome and Firefox. In these cases Violentmonkey uses a much slower serialization to a JSON-compatible value in the background script, then deserializes it in the content script.
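For reference, the slower JSON-compatible fallback mentioned above can be sketched roughly like this (the function names are mine; Violentmonkey's actual implementation differs): each binary chunk is converted into a plain array of numbers, which survives `postMessage`/JSON serialization, and is rebuilt in the content script.

```javascript
// Rough sketch of a JSON-compatible chunk transfer (not Violentmonkey's actual code).
// Background script side: turn a Uint8Array chunk into a plain array of numbers.
function serializeChunk(u8a) {
    return Array.from(u8a); // JSON-safe, but far larger and slower than a blob: URL
}

// Content script side: rebuild the binary chunk from the plain array.
function deserializeChunk(numbers) {
    return new Uint8Array(numbers);
}
```

This avoids `createObjectURL` entirely, at the cost of inflating every byte into a JSON number, which is why it is used only when the blob: URL path fails.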
Wait, it's not possible to pass the init object? Most of the properties of the init object are just strings.
I only found a mention of `nocache: true`, which is (as I understand it) `{cache: "no-store"}`.
https://github.com/Tampermonkey/tampermonkey/issues/1003#issuecomment-660072659
Is that all the `fetch: true` mode was added for?
I expected something like a `fetchInit` property in addition to `fetch: true`.
> That's all the mode fetch: true was added for?

Initially, fetch support was added to allow an anonymous GM_xhr in Chrome, while FF already had `mozAnon` support. The existing `fetch: true` only enforces the request type used in the background. Other options might enforce this as well (e.g. `details.anonymous` and now `details.responseType: "stream"`).
> I expected something like fetchInit property in addition to fetch: true. How to pass fetch's init object?

Users should ideally not care about the way the data is retrieved in the background, but simply use GM_xhr's features.
> Does it support signal, body and headers as Headers?

No. You can abort the request via the returned `abort` function, so there is no need for `signal`. Also, `details.data` is used instead of `body`, and `details.headers` is a plain key/value object.
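In other words, a fetch-style call has to be translated into GM_xhr's vocabulary by the caller. A hedged sketch of such a translation (the helper name is mine, not part of any API): `body` becomes `data`, and a `Headers` instance is flattened into a plain key/value object.

```javascript
// Hypothetical adapter: converts a fetch-style (url, init) pair into a
// GM_xmlhttpRequest "details" object, per the constraints described above.
function toGmDetails(url, init = {}) {
    const headers = {};
    if (init.headers) {
        // Accept either a Headers-like object (with entries()) or a plain object.
        const entries = typeof init.headers.entries === "function"
            ? init.headers.entries()
            : Object.entries(init.headers);
        for (const [key, value] of entries) {
            headers[key.toLowerCase()] = value;
        }
    }
    return {
        url,
        method: (init.method || "GET").toUpperCase(),
        headers,          // plain key/value object
        data: init.body,  // GM_xhr uses `data` instead of `body`
    };
}
```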
All other issues (1. - 5.) should be fixed at 4.14.6143 (in review|crx)
I did not test it yet.
About the fetch init object:
Okay, but there is `fetch: true`, which indicates the way the data is retrieved in the background.
So I see nothing criminal in modifying this field to accept a simplified version of the fetch init object.
It will be compatible with the existing code, because any object is truthy. And it's very simple to implement.
Just use this object as the base of an `Object.assign` in the background script when a user passes a fetch init object:
fetch(url, props);
->
if (typeof fetchInit === "object" && fetchInit !== null) {
props = Object.assign(fetchInit, props); // fetchInit is passed by a user
}
fetch(url, props);
Just allow the `fetch` property of `GM.xmlhttpRequest` to be a
{
referrer: "", // optional, since the extension allows setting headers in a more advanced way
referrerPolicy: "no-referrer-when-downgrade",
mode: "cors",
credentials: "include",
cache: "force-cache",
redirect: "follow",
integrity: "sha256-BpfBw7ivV8q2jLiT13fxDYAe2tJllusRSZ273h2nFSE=",
keepalive: true,
}
object. All these properties are `string` values (and the last one a `boolean`).
It is the last thing that I need for writing a normal wrapper over `GM.xmlhttpRequest` to emulate the `GM.fetch` API.
> Just allow the fetch property of GM.xmlhttpRequest to be [...]
This is not going to happen. Sorry.
First, this approach would add a lot of inconsistencies and complexity. What should GM_xhr do if the fetch init wants to use the cache, but `nocache` is set as well? And second, the Safari app extension, for example, has no background fetch support; all GM_xhr functionality there is implemented using native Swift code.
So if you really miss some fetch functionality that can't be worked around or implemented client-side, then please report the use case as a new issue and I may add this functionality to GM_xhr, but probably in a way that works regardless of the background request API used, if possible.
1. `onloadstart` now fires on `readyState === 1`. ~~But it should fire on `readyState === 2` (`HEADERS_RECEIVED`), in order to have access to `headers`, `status` and `statusText`.~~
2. `response` in `onreadystatechange` has a missing `status` (it's `0`) (and possibly `statusText`) on `readyState === 2`, ~~as well as in the `onloadstart` event~~. Because of this I can't construct a correct `Response(rs, {headers, status, statusText})`: `status` will always be `200`, the default value for `Response`.

Here is the code:
// ==UserScript==
// @name Mode stream fetch
// @namespace http://tampermonkey.net/
// @description test
// @version 0.2
// @match https://example.com/*
// @grant GM_xmlhttpRequest
// @connect ipv4.download.thinkbroadband.com
// @connect giant.gfycat.com
// @connect example.com
// ==/UserScript==
let url = "http://ipv4.download.thinkbroadband.com/10MB.zip?t=" + Date.now(); // 408
url = "https://giant.gfycat.com/ShockedSecondaryFiddlercrab.mp4"; // 200 // 32 MB
// url = "https://example.com/xxx"; // 404
// -----------------
// Using:
(async function() {
const response = await fetch(url);
console.log(response);
const {status, statusText} = response;
const lastModified = response.headers.get("last-modified");
// BUG: status is ALWAYS 200 (default)
console.log({status, statusText, lastModified});
const blob = await response.blob();
console.log(blob);
downloadBlob(blob, "x.mp4", url);
})();
// -----------------
async function fetch(url, fetchInit = {}) {
const defaultFetchInit = {method: "get"};
const {headers, method} = {...defaultFetchInit, ...fetchInit};
const isStreamSupported = GM_xmlhttpRequest?.RESPONSE_TYPE_STREAM;
const HEADERS_RECEIVED = 2;
if (!isStreamSupported) {
return new Promise((resolve, _reject) => {
const blobPromise = new Promise((resolve, reject) => {
GM_xmlhttpRequest({
url,
method,
headers,
responseType: "blob",
onload: (response) => resolve(response.response),
onerror: reject,
onreadystatechange: onHeadersReceived
});
});
blobPromise.catch(_reject);
function onHeadersReceived(gmResponse) {
const {
readyState, responseHeaders, status, statusText
} = gmResponse;
if (readyState === HEADERS_RECEIVED) {
const headers = parseHeaders(responseHeaders);
resolve({
headers,
status,
statusText,
arrayBuffer: () => blobPromise.then(blob => blob.arrayBuffer()),
blob: () => blobPromise,
json: () => blobPromise.then(blob => blob.text()).then(text => JSON.parse(text)),
text: () => blobPromise.then(blob => blob.text()),
});
}
}
});
} else {
return new Promise((resolve, _reject) => {
const responsePromise = new Promise((resolve, reject) => {
void GM_xmlhttpRequest({
url,
method,
headers,
responseType: "stream",
onerror: reject,
onreadystatechange: onHeadersReceived,
onloadstart: (gmResponse) => console.log("[onloadstart]", gmResponse) // debug
});
});
responsePromise.catch(_reject);
function onHeadersReceived(gmResponse) {
console.log("[onreadystatechange]", gmResponse); // debug
const {
readyState, responseHeaders, status, statusText, response: readableStream
} = gmResponse;
if (readyState === HEADERS_RECEIVED) {
const headers = parseHeaders(responseHeaders);
let newResp;
if (status === 0) {
console.warn("status is 0!", {status, statusText});
newResp = new Response(readableStream, {headers, /*status, statusText*/});
} else {
newResp = new Response(readableStream, {headers, status, statusText});
}
resolve(newResp);
}
}
});
}
}
function downloadBlob(blob, name, url = "") {
const anchor = document.createElement("a");
anchor.setAttribute("download", name || "");
const blobUrl = URL.createObjectURL(blob);
anchor.href = blobUrl + "#" + url;
anchor.click();
setTimeout(() => URL.revokeObjectURL(blobUrl), 0);
}
function parseHeaders(headersString) {
class Headers {
get(key) {
return this[key.toLowerCase()];
}
}
const headers = new Headers();
for (const line of headersString.trim().split("\n")) {
const [key, ...valueParts] = line.split(":"); // last-modified: Fri, 21 May 2021 14:46:56 GMT
if (key) {
headers[key.trim().toLowerCase()] = valueParts.join(":").trim();
}
}
return headers;
}
> onloadstart now fires on readyState === 1
This is when the loadstart event is fired.
> response in onreadystatechange has missed status (it's 0) (and possibly statusText) on readyState === 2 as well as in onloadstart event
Good catch. Will be fixed at the next BETA version.
I have checked the code above on a virtual machine (Windows) with 2 GB of memory (the OS uses 1.2 GB of committed memory, 1.5 GB with the browser) with the virtual memory file (paging file) disabled:
Yes, as I expected, the streaming code is much more memory-efficient. Notably so!
With the default approach I can download only a 100 MB file, while with `ReadableStream` (`responseType: "stream"`) I can download a 300 MB file.
On a VM with 2.5 GB of memory, with `XHR` I can download a 200 MB file, but not a 300 MB one.
To download a 300 MB file (without streaming) I need to set 2.8 GB of memory! (Otherwise the browser closes/nothing happens.)
As I said above, the theoretical memory optimisation is 2x, and the practical one is even more.
Note: I restarted the browser after each download, so it was tested properly.
...for the temporary storing of blobs by the browser.
Also, in the main OS (where there is enough memory), the amount of blob data written to the SSD (in the `blob_storage` folder) is halved!
There is no blob flushing to the `blob_storage` folder during the data exchange between the background script and the content script with streaming.
To download a 200 MB file, ~400 MB has to be written to `blob_storage` with `XHR`, and only ~200 MB with `fetch`.
With `fetch` the browser just keeps the chunks in memory while the extension transmits them from the background script to the content script.
With `XHR` the browser must exchange the entire 200 MB blob (even while you exchange a blob: URL), so the browser flushes (pages) it to the hard drive, since it is too large (more than ~10 MB).
In both cases an additional blob is created when I download it in the `downloadBlob` function (200 MB).
Data written to SSD (exchange + downloading):

- `fetch`: 0 + 200 MB
- `XHR`: 200 MB + 200 MB

> As for Violentmonkey, it already uses chunked transfers via Blob internally to speed up such huge downloads significantly so I agree it can wrap them as a stream but further discussion should be in Violentmonkey's repository, not here. Regarding the examples above that use Blob+createObjectURL, note that in our tests there were cases where it didn't work: 1) in Firefox the content scripts can't fetch extension's own blob on sites with a strict CSP and 2) in incognito mode both in Chrome and Firefox. In these cases Violentmonkey uses a much slower serialization to a JSON-compatible value in the background script, then deserializes it in the content script.
Do both Violentmonkey and Tampermonkey still use this serialisation method in incognito mode?
Implement `GM.fetch`/`GM_Fetch`
I have already talked about `GM.xmlhttpRequest` and `fetch` here: https://github.com/Tampermonkey/tampermonkey/issues/1050 But it makes sense to write this separate issue, since there the talk was about the `fetch: true` option of `GM.xmlhttpRequest`, while this issue is about a new function.

Tampermonkey currently has `fetch: true` as a beta feature, but that does not look like a very useful feature, since it uses `fetch` only in the background script; the other parts of `GM.xmlhttpRequest` work in the usual way.

I suggest adding a new `GM.fetch` function that will be written from zero, has the same API as `fetch`, and uses the benefits of streaming, which are impossible to get with `XMLHttpRequest`.

Compatible with Fetch API
Like `GM.xmlhttpRequest`, it will allow bypassing CORS limitations, but the API of `GM.fetch` should be compatible with the common Fetch API.

One of the annoying things about `GM.xmlhttpRequest`, when you use it for the first time, is that while it is named like `XMLHttpRequest`, its API is not the same. The API of `GM.fetch` should be compatible with the common Fetch API. It will make writing scripts easier, since you do not have to use some other API, and you can reuse existing code that was written for `fetch`.

So any existing code written for `fetch` should work with `GM.fetch` transparently. For example, wrapping a `ReadableStream` (`response`) for progress indication:

The example
```js
let response = await fetch(url);
response = loadingProxyResponse(response, onLoadingProgressLog);
// ...

// just logs the loading progress
function onLoadingProgressLog(receivedLength, contentLength) {
    const percentage = receivedLength / contentLength;
    const text = (percentage * 100 + "").substring(0, 4) + " %";
    console.log(text);
}

function loadingProxyResponse(response, onProgress) {
    const contentLength = parseInt(response.headers.get("Content-Length"));
    let receivedLength = 0;
    const reader = response.body.getReader();
    const readableStream = new ReadableStream({
        async start(controller) {
            while (true) {
                const {done, value} = await reader.read(); // value is Uint8Array
                if (done) {
                    break;
                }
                receivedLength += value.length;
                try {
                    onProgress(receivedLength, contentLength);
                } catch (e) {
                    console.error("onProgress error:", e);
                }
                controller.enqueue(value);
            }
            controller.close();
            reader.releaseLock();
        },
        cancel() {
            reader.cancel();
        }
    });
    return new Response(readableStream, {headers: response.headers});
}
```

Streaming
Okay, compatibility with the Fetch API is good, but it's not the main thing that I want. You can write some wrapper for `GM.xmlhttpRequest` that looks like `fetch` even now. But with just a wrapper you can't get the main feature of `fetch`: streaming.

Streaming is the core difference between `XMLHttpRequest` and `fetch`. When you download some data with `XMLHttpRequest`, the entire data is stored in memory until the request is finished and the data is handled. In the case of a userscript manager extension the memory consumption is doubled, since the data is stored both in the content/web script and in the background script. But when you use `fetch`, you can consume data from the stream as soon as it is received, and if you do not need the data anymore, the garbage collector can destroy it.

In the case of Tampermonkey you do not need to store the downloaded data in the background script's context. You just pass it to the content script as it is received, and the GC collects it. This alone notably reduces memory consumption, by a factor of two. Also with streaming you can use an approach such as https://github.com/jimmywarting/StreamSaver.js: in this case there is no wasting of memory for storing data at all when you download files. Memory consumption is O(1) for any file of any size.
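The O(1)-memory idea can be sketched like this (a simplified illustration, not StreamSaver's actual API): each chunk is handed straight to a sink and never accumulated, so only one chunk is alive at a time.

```javascript
// Simplified sketch of the O(1)-memory pattern: chunks are read from the
// stream and passed to a sink (e.g. a file writer) one at a time, so the
// full response is never held in memory.
async function pipeToSink(readable, writeChunk) {
    const reader = readable.getReader();
    let total = 0;
    while (true) {
        const {done, value} = await reader.read();
        if (done) break;
        total += value.length;
        writeChunk(value); // the chunk can be garbage-collected after this call
    }
    return total; // total bytes transferred
}
```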
So this will make userscripts more memory optimised. Notably optimised.
Also I think it would be easier to implement (and to support later) a new function from zero than to try to bolt `fetch`'s functionality onto the existing `GM.xmlhttpRequest` as a mode (`fetch: true`), which is what you currently do. The same is true for other userscript managers.