Open AdamSobieski opened 2 years ago
Can you elaborate more on the use cases and how this would fit into the existing web ecosystem?
@reillyeon, hello. The proposed technology has multiple uses. It is built atop Web ecosystem components, utilizing the DOM, CSSOM, CSS Animations, WebAssembly, and WebGPU APIs.
Uses of a Web Instruction Set include more efficiently transmitting Web content, e.g., in multimedia stream tracks.
It can be described as an efficient "serialization" of Web content.
An encoder could be described as "compiling" Web content for more efficient transmission, processing, and decoding. Such "compiling" could be performed either a priori or in real-time. An encoder could reference external resources in resultant hypertext content and could also include internal resources and multimedia, e.g., images, audio, video, in other tracks of envisioned stream envelopes. In theory, multimedia resources could be streamed to arrive concurrently with such a Web track.
A decoder could be described as a "virtual machine" which processes Web Streaming Protocol instructions to (re)construct Web content, e.g., hypertext documents.
Use cases that I am thus far considering – and please let me know if others come to mind – include:
I don't understand why this is needed? Can't we just use custom data structures over WSS?
@mikestaub, hello. If I understand your questions, developers could implement their own protocols, instruction sets, encoders, decoders, and utilize WSS for some of the indicated use-case scenarios. Developers could transmit a small HTML file with some compressed JavaScript to open a WSS socket and then utilize a custom protocol and perhaps instruction set to provide a dynamic user interface.
Towards implementing Web synchronization scenarios, developers could utilize a mutation observer, serialize mutation events to JSON, stream those data structures using WSS, and then write a decoder which, on another machine, would mutate another Web document or UI based on the received events.
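The custom approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the event shape (`type`, `path`, `index`, `name`, `value`, `node`) is hypothetical, and the replica is a plain object tree so that the sketch runs outside a browser. In a real page, a `MutationObserver` callback would produce the events and a WebSocket would carry the JSON string.

```javascript
// Hypothetical event shape; a real encoder would derive these from MutationRecord objects.
// Nodes are plain objects: { tag, attrs, children } for elements or { text } for text.

// Walk a path of child indices from the root to find the target node.
function resolve(root, path) {
  return path.reduce((node, index) => node.children[index], root);
}

// Apply one decoded mutation event to the replica tree.
function applyMutation(root, event) {
  const target = resolve(root, event.path);
  switch (event.type) {
    case 'attributes':
      target.attrs[event.name] = event.value;
      break;
    case 'characterData':
      target.text = event.value;
      break;
    case 'childInsert':
      target.children.splice(event.index, 0, event.node);
      break;
    case 'childRemove':
      target.children.splice(event.index, 1);
      break;
    default:
      throw new Error('Unknown mutation type: ' + event.type);
  }
}

// Sender side: serialize events to a JSON string (what would travel over WSS).
const wire = JSON.stringify([
  { type: 'attributes', path: [], name: 'class', value: 'active' },
  { type: 'childInsert', path: [], index: 0,
    node: { tag: 'p', attrs: {}, children: [{ text: 'Hi' }] } },
  { type: 'characterData', path: [0, 0], value: 'Hello' },
]);

// Receiver side: parse the events and replay them against a local replica.
const replica = { tag: 'div', attrs: {}, children: [] };
for (const event of JSON.parse(wire)) applyMutation(replica, event);
```

Addressing nodes by index path is just one possible design; it keeps the events self-contained but requires the sender and receiver trees to stay in step.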
A Web Instruction Set, resembling the sketch or the proof of concept, above, would, in addition to optimizing transmission, processing, and page-load times for Web content, allow Web tracks to be put into live-streaming, streaming, and persisted (files) video envelopes, effectively delivering interactive hypervideo.
So, while developers can continue to create custom solutions, protocols, and instruction sets, by incubating this project together, discovering best practices, and standardizing them, we could deliver natively implemented encoders and decoders in Web browsers, perhaps assembly-code-optimized for some functionalities. This would amplify the performance gains from these approaches in terms of transmission, processing, and page-load times, benefitting Web developers and end-users.
I updated the issue title and renamed the proposed technology to Web Instruction Set for clarity, as this proposal is not bound to any specific communication protocol. Please feel free to let me know if any other interesting names for the proposed technology come to mind.
Looking forward to discussing Web instruction sets, compiling Web content (a priori and real-time), encoding, virtual machines, decoding, Web streaming, Web tracks, referencing contents from other tracks in multimedia envelopes and containers, interactive hypervideo (live-streaming, streaming, and persisted), and related topics.
Towards implementing Web synchronization scenarios, developers could utilize a mutation observer, serialize mutation events to JSON, stream those data structures using WSS, and then write a decoder which, on another machine, would mutate another Web document or UI based on the received events.
This reminds me of the talk on Distributed multi-party media-rich content review that @osidorkin delivered during the W3C/SMPTE Workshop on Professional Media Production on the Web. Oleg's talk is not focused on the actual protocol used to stream events but describes the use of mutation observers and the like, highlighting some of the challenges that arise, such as handling of the <canvas> element.
Thanks @tidoust, that was an interesting talk. The <canvas> element is indeed relevant for Web synchronization, or cobrowsing, scenarios.
Regarding the particulars of protocols and instruction sets, in particular for the cases of transmitting compiled Web documents or user interfaces, I have also recently been thinking about utilizing multiple threads, or tasks, on client decoders for additional speedup.
As a concrete example, we can consider transmitting a compiled Web document with one <article> element which contains multiple <section> elements. We can consider how a client might utilize multiple threads, or tasks, ideally on multiple cores, to process instructions to (re)construct the <section> elements in parallel and then merge them into the <article> element.
So, I am thinking about both single-threaded and multi-threaded approaches and possibilities.
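The task structure of the <article> example can be sketched as below. This is a minimal illustration, not a parallel implementation: the per-section instruction format (`['heading', …]`, `['para', …]`) is hypothetical, the nodes are plain objects rather than DOM nodes, and the tasks run sequentially through `map`. The point is only that each section's instructions touch no shared state, so each call could in principle be dispatched to its own worker before the merge step.

```javascript
// Hypothetical per-section instruction lists; each list only touches its own subtree.
const sectionPrograms = [
  [['heading', 'Intro'], ['para', 'First section text']],
  [['heading', 'Details'], ['para', 'Second section text']],
];

// Build one <section> subtree from its own instruction list, with no shared state.
function buildSection(program) {
  const section = { tag: 'section', children: [] };
  for (const [op, text] of program) {
    const tag = op === 'heading' ? 'h2' : 'p';
    section.children.push({ tag, children: [{ text }] });
  }
  return section;
}

// Independent tasks: run one after another here, but no call depends on another.
const sections = sectionPrograms.map(buildSection);

// Merge step: attach the finished subtrees to the <article> in document order.
const article = { tag: 'article', children: sections };
```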
Others have previously considered these and related topics. For instance:
Meyerovich, Leo A., and Rastislav Bodik. "Fast and parallel webpage layout." In Proceedings of the 19th international conference on World wide web, pp. 711-720. 2010.
I updated the initial post per our unfolding discussion. Please let me know if you have any more questions or comments towards improving it further.
Interesting greenfield idea!
So... this replaces JavaScript? Is there any client-side programming logic possible, or is everything driven from the WISE stream?
I guess you'll want to build a browser based on WISE instructions and then compare the performance and capabilities to vanilla XML/HTML transport and rendering pipelines of today's WebKit/Gecko/Blink powerhouses?
While interesting, this strikes me as quite a revolutionary idea (rather than an evolutionary idea), and thus will face a strong headwind for prototyping/adoption.
@travisleithead, thank you for taking a look at the ideas!
To your question, a Web Instruction Set (WISE) has a number of foreseeable use cases. For some uses, such as hypervideo, WISE can be described as an instruction set which utilizes functionality which was previously only available from JavaScript. For the use case of transmitting Web content, WISE also does not replace JavaScript.
With respect to the hypervideo use case, as envisioned, a WISE stream could be utilized to perform some DOM / CSSOM / CSS Animation functionalities such as creating, modifying (animating), and deleting Web content (e.g., layout boxes and hyperlinks) atop video background content. In this use case, a WISE stream could be a track in a hypervideo resource and this approach appears to work for live-streaming, streaming, and file-based hypervideo scenarios.
With respect to transmitting Web content, I am optimistic about the ideas of “compiling” Web content into a new format to obtain efficiencies and speedups in terms of both space and time. This “compiling” process could utilize a Web browser to analyze page-load dynamics. Also, it is noteworthy that the WISE approach is parallelizable.
Interestingly, the outputs from such a “compiling” process could also produce server-side scripting resources (“.http3”) with which to better utilize HTTP/3 server features such as server pushes with prioritizations. That is, a Web server could retrieve a Web resource ("index.html") and retrieve a corresponding server-side script ("index.html.http3") with which to better utilize HTTP/3 features for the Web resource.
So, WISE doesn't intend to replace JavaScript and doesn't require a new approach to the design of Web browsers. It can be described as a new functionality which can be added to existing Web browsers. Both approaches for transmitting Web content would leverage existing WebKit / Gecko / Blink engines and the current XML-based transmission approaches are very efficiently implemented.
Some work is required to formulate the precise instruction set for prototyping and measurement. Perhaps an initial subset of the envisioned instruction set, one specific to the Web content transmission use case and to the DOM API, could be formulated first, used to transmit a document, and then used to measure performance?
This reminded me of https://www.w3.org/TR/rex/
Thanks @cconcolato, I hadn't seen that.
Thanks @yoavweiss for updating this issue.
Looking back on these ideas from two years ago, I could have provided better examples to inspire some brainstorming. So, I updated the proposal with more examples showing how XML trees can be represented, stored, and transmitted as sequences of steps, sequences of operations for virtual machines.
Reading this proposal again, I'm thinking about the sizes of hypothetical binary instruction streams versus string-based representations of XML trees. Potential optimizations, in these regards, include that well-known element names and attribute names, e.g., those of HTML5, could have reserved indices in the text array and would not need to be stored or transmitted.
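The reserved-index idea might look something like the following sketch. The `RESERVED` list here is a tiny hypothetical stand-in for whatever table of well-known names would actually be standardized, and the function names are illustrative only.

```javascript
// Hypothetical reserved table: a small stand-in for a standardized list of HTML names.
const RESERVED = ['html', 'head', 'body', 'div', 'span', 'p', 'a', 'href', 'class', 'id'];

// Encoder: map each string to an index, collecting only the non-reserved strings.
function buildTextTable(strings) {
  const index = new Map(RESERVED.map((name, i) => [name, i]));
  const transmitted = [];
  for (const s of strings) {
    if (!index.has(s)) {
      index.set(s, RESERVED.length + transmitted.length);
      transmitted.push(s);
    }
  }
  return { index, transmitted };
}

// Decoder: reserved names are known in advance, so only `transmitted` is received.
function resolveText(i, transmitted) {
  return i < RESERVED.length ? RESERVED[i] : transmitted[i - RESERVED.length];
}

const { index, transmitted } =
  buildTextTable(['div', 'class', 'note', 'p', 'Hello', 'note']);
```

In this example only 'note' and 'Hello' would need to be transmitted; 'div', 'class', and 'p' resolve from the reserved range on the decoder side.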
Introduction
What if we could compile Web content, XML and HTML5 documents, into Web instructions for faster transmission, decoding, and page loads?
What if these Web instructions could support functionalities from the DOM and also other APIs such as the CSSOM, Custom Elements, CSS Animations, WebAssembly, Canvas, and WebGPU?
Example
Given the following XML document:
<DocumentElement param="value">
  <FirstElement>¶ Some Text</FirstElement>
  <?some_pi some_attr="some_value"?>
  <SecondElement param2="something">Pre-Text <Inline>Inlined text</Inline> Post-text.</SecondElement>
</DocumentElement>
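when it is passed through a SAX parser, it will generate a sequence of events resembling the following:
- Element start: DocumentElement, with an attribute param equal to value
- Element start: FirstElement
- Text: ¶ Some Text
- Element end: FirstElement
- Processing instruction: target some_pi and data some_attr equal to some_value
- Element start: SecondElement, with an attribute param2 equal to something
- Text: Pre-Text
- Element start: Inline
- Text: Inlined text
- Element end: Inline
- Text: Post-text
- Element end: SecondElement
- Element end: DocumentElement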
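To make the event sequence concrete, here is a small sketch that replays such events to rebuild a tree. The event objects (`type`, `name`, `attrs`, `data`, `target`) are a hypothetical encoding of the SAX events listed above, and the tree is built from plain objects so the code runs without a browser.

```javascript
// Hypothetical encoding of the SAX events listed above.
const events = [
  { type: 'start', name: 'DocumentElement', attrs: { param: 'value' } },
  { type: 'start', name: 'FirstElement', attrs: {} },
  { type: 'text', data: '¶ Some Text' },
  { type: 'end', name: 'FirstElement' },
  { type: 'pi', target: 'some_pi', data: 'some_attr="some_value"' },
  { type: 'start', name: 'SecondElement', attrs: { param2: 'something' } },
  { type: 'text', data: 'Pre-Text ' },
  { type: 'start', name: 'Inline', attrs: {} },
  { type: 'text', data: 'Inlined text' },
  { type: 'end', name: 'Inline' },
  { type: 'text', data: ' Post-text.' },
  { type: 'end', name: 'SecondElement' },
  { type: 'end', name: 'DocumentElement' },
];

// Replay events, keeping a stack of currently open elements.
function buildTree(events) {
  let root = null;
  const open = [];
  for (const ev of events) {
    if (ev.type === 'end') {
      const closed = open.pop();
      if (!closed || closed.name !== ev.name) {
        throw new Error('Mismatched end event: ' + ev.name);
      }
      continue;
    }
    const node = ev.type === 'start'
      ? { type: 'element', name: ev.name, attrs: { ...ev.attrs }, children: [] }
      : { ...ev }; // text and processing-instruction nodes carry their own data
    const parent = open[open.length - 1];
    if (parent) parent.children.push(node);
    else root = node;
    if (ev.type === 'start') open.push(node);
  }
  return root;
}

const tree = buildTree(events);
```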
Another sequence could be based on the DOM JS API, resembling the following:
var el1 = document.createElement('DocumentElement');
el1.setAttribute('param', 'value');
var el2 = document.createElement('FirstElement');
el2.textContent = "¶ Some Text";
var el3 = document.createProcessingInstruction('some_pi', 'some_attr="some_value"');
var el4 = document.createElement('SecondElement');
el4.setAttribute('param2', 'something');
var txt1 = document.createTextNode('Pre-Text ');
var el5 = document.createElement('Inline');
el5.textContent = 'Inlined text';
var txt2 = document.createTextNode(' Post-text.');
el4.appendChild(txt1);
el4.appendChild(el5);
el4.appendChild(txt2);
el1.appendChild(el2);
el1.appendChild(el3);
el1.appendChild(el4);
document.append(el1);
A Stack-based Virtual Machine
Considering a stack-based virtual machine with one or more local variables and a string table, and assuming an appendChild2() extension method on Element which returns the appended-to node instead of the appended node, a sequence of instructions might resemble:
var text = ['DocumentElement', 'param', 'value', 'FirstElement', '¶ Some Text', 'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something', 'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
var local = [null];
stack.push(document.createElement(text[0]));
stack.top.setAttribute(text[1], text[2]);
stack.push(document.createElement(text[3]));
stack.top.textContent = text[4];
stack.push(document.createProcessingInstruction(text[5], text[6]));
stack.push(document.createElement(text[7]));
stack.top.setAttribute(text[8], text[9]);
stack.push(document.createElement(text[10]));
stack.top.textContent = text[11];
local[0] = stack.pop();
stack.top.appendChild(document.createTextNode(text[12]));
stack.top.appendChild(local[0]);
stack.top.appendChild(document.createTextNode(text[13]));
stack.reverse();
stack.push(stack.pop().appendChild2(stack.pop()));
stack.push(stack.pop().appendChild2(stack.pop()));
stack.push(stack.pop().appendChild2(stack.pop()));
document.append(stack.pop());
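The listing above relies on two things that are not standard JavaScript or DOM: a `top` accessor on the stack and the `appendChild2()` method. A minimal sketch of both, assuming a mock node type (`makeNode`, hypothetical) so that it runs outside a browser, could look like this. It also shows why the `reverse()` step is there: it brings the root element to the top so that the remaining nodes can be popped and appended in document order.

```javascript
// Minimal stand-in for DOM nodes so the sketch runs without a browser (assumption).
function makeNode(name) {
  return {
    name,
    children: [],
    // Like the DOM's appendChild: returns the appended child.
    appendChild(child) { this.children.push(child); return child; },
    // appendChild2: same effect, but returns the parent (the appended-to node).
    appendChild2(child) { this.children.push(child); return this; },
  };
}

// An array with a `top` accessor, as the pseudocode assumes.
class Stack extends Array {
  get top() { return this[this.length - 1]; }
}

const stack = new Stack();
stack.push(makeNode('DocumentElement'));
stack.push(makeNode('FirstElement'));
stack.push(makeNode('SecondElement'));

stack.reverse(); // the root, DocumentElement, is now on top

// Pop the parent, pop the next child, append, and push the parent back.
stack.push(stack.pop().appendChild2(stack.pop()));
stack.push(stack.pop().appendChild2(stack.pop()));

const root = stack.pop();
```

Because JavaScript evaluates the callee expression before the arguments, the first `stack.pop()` yields the parent and the second yields the child.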
Towards a binary serialization of virtual machine instructions, the sequence for the example document might resemble:
SPDCE(0)
STSA(1, 2)
SPDCE(3)
STTC(4)
SPDCPI(5, 6)
SPDCE(7)
STSA(8, 9)
SPDCE(10)
STTC(11)
SETLP(0)
STACDCTN(12)
STACL(0)
STACDCTN(13)
SREV()
SPSPAC2SP()
SPSPAC2SP()
SPSPAC2SP()
DASP()
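As a rough illustration of how compact such a listing could be, the sketch below encodes it as one opcode byte followed by one byte per operand, and decodes it again. The mnemonics are taken from the listing above; the numeric opcodes, the one-byte operand width, and the tuple representation are assumptions made for this example only.

```javascript
// Operand counts per mnemonic. The numeric opcode of each mnemonic is simply
// its position in this table, which is an assumption of this sketch.
const ARITY = {
  SPDCE: 1, STSA: 2, STTC: 1, SPDCPI: 2, SETLP: 1,
  STACDCTN: 1, STACL: 1, SREV: 0, SPSPAC2SP: 0, DASP: 0,
};
const MNEMONICS = Object.keys(ARITY);

// Encode [mnemonic, ...operands] tuples: one opcode byte, then one byte per operand.
function encode(program) {
  const bytes = [];
  for (const [mnemonic, ...operands] of program) {
    const opcode = MNEMONICS.indexOf(mnemonic);
    if (opcode < 0 || operands.length !== ARITY[mnemonic]) {
      throw new Error('Bad instruction: ' + mnemonic);
    }
    for (const operand of operands) {
      if (!Number.isInteger(operand) || operand < 0 || operand > 255) {
        throw new Error('Operand does not fit in one byte: ' + operand);
      }
    }
    bytes.push(opcode, ...operands);
  }
  return Uint8Array.from(bytes);
}

// Decode bytes back into tuples, using ARITY to know how many operands follow.
function decode(bytes) {
  const program = [];
  let i = 0;
  while (i < bytes.length) {
    const mnemonic = MNEMONICS[bytes[i++]];
    if (mnemonic === undefined) throw new Error('Unknown opcode');
    const count = ARITY[mnemonic];
    program.push([mnemonic, ...bytes.subarray(i, i + count)]);
    i += count;
  }
  return program;
}

const program = [
  ['SPDCE', 0], ['STSA', 1, 2], ['SPDCE', 3], ['STTC', 4], ['SPDCPI', 5, 6],
  ['SPDCE', 7], ['STSA', 8, 9], ['SPDCE', 10], ['STTC', 11], ['SETLP', 0],
  ['STACDCTN', 12], ['STACL', 0], ['STACDCTN', 13], ['SREV'],
  ['SPSPAC2SP'], ['SPSPAC2SP'], ['SPSPAC2SP'], ['DASP'],
];
const encoded = encode(program);
```

Under these assumptions the 18 instructions occupy 34 bytes, not counting the string table, which would be transmitted separately.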
Or, perhaps, it might resemble:
LDTEXT.0
CALL DOCUMENT_CREATEELEMENT
DUP
LDTEXT.1
LDTEXT.2
CALL ELEMENT_SETATTRIBUTE
LDTEXT.3
CALL DOCUMENT_CREATEELEMENT
DUP
LDTEXT.4
CALL ELEMENT_SETTEXTCONTENT
LDTEXT.5
LDTEXT.6
CALL DOCUMENT_CREATEPROCESSINGINSTRUCTION
LDTEXT.7
CALL DOCUMENT_CREATEELEMENT
DUP
LDTEXT.8
LDTEXT.9
CALL ELEMENT_SETATTRIBUTE
...
A Registers-based Virtual Machine
Next, considering a registers-based virtual machine with a list of registers, r, a sequence of instructions might resemble:
var text = ['DocumentElement', 'param', 'value', 'FirstElement', '¶ Some Text', 'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something', 'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
r[0] = document.createElement(text[0]);
r[0].setAttribute(text[1], text[2]);
r[1] = document.createElement(text[3]);
r[1].textContent = text[4];
r[2] = document.createProcessingInstruction(text[5], text[6]);
r[3] = document.createElement(text[7]);
r[3].setAttribute(text[8], text[9]);
r[4] = document.createElement(text[10]);
r[4].textContent = text[11];
r[5] = document.createTextNode(text[12]);
r[6] = document.createTextNode(text[13]);
r[3].appendChild(r[5]);
r[3].appendChild(r[4]);
r[3].appendChild(r[6]);
r[0].appendChild(r[1]);
r[0].appendChild(r[2]);
r[0].appendChild(r[3]);
document.append(r[0]);
Towards a binary serialization of virtual machine instructions, the sequence for the example document might resemble:
DCE(0, 0)
SA(0, 1, 2)
DCE(1, 3)
TC(1, 4)
DCPI(2, 5, 6)
DCE(3, 7)
SA(3, 8, 9)
DCE(4, 10)
TC(4, 11)
DCTN(5, 12)
DCTN(6, 13)
AC(3, 5)
AC(3, 4)
AC(3, 6)
AC(0, 1)
AC(0, 2)
AC(0, 3)
DA(0)
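A decoder for this register listing could be sketched as a small interpreter. The mnemonics and operand order follow the listing above; the tuple representation, the plain-object nodes standing in for the DOM, and the `serialize` helper (which does no escaping) are assumptions made so the example runs outside a browser and can check the round trip.

```javascript
const text = ['DocumentElement', 'param', 'value', 'FirstElement', '¶ Some Text',
  'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something',
  'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];

const program = [
  ['DCE', 0, 0], ['SA', 0, 1, 2], ['DCE', 1, 3], ['TC', 1, 4], ['DCPI', 2, 5, 6],
  ['DCE', 3, 7], ['SA', 3, 8, 9], ['DCE', 4, 10], ['TC', 4, 11], ['DCTN', 5, 12],
  ['DCTN', 6, 13], ['AC', 3, 5], ['AC', 3, 4], ['AC', 3, 6], ['AC', 0, 1],
  ['AC', 0, 2], ['AC', 0, 3], ['DA', 0],
];

// Execute the program against plain-object nodes and return the appended root.
function run(program, text) {
  const r = [];    // registers
  let root = null; // the node handed to DA (document.append in the listing)
  for (const [op, a, b, c] of program) {
    switch (op) {
      case 'DCE': r[a] = { type: 'element', name: text[b], attrs: {}, children: [] }; break;
      case 'SA': r[a].attrs[text[b]] = text[c]; break;
      case 'TC': r[a].children = [{ type: 'text', data: text[b] }]; break;
      case 'DCPI': r[a] = { type: 'pi', target: text[b], data: text[c] }; break;
      case 'DCTN': r[a] = { type: 'text', data: text[b] }; break;
      case 'AC': r[a].children.push(r[b]); break;
      case 'DA': root = r[a]; break;
      default: throw new Error('Unknown instruction: ' + op);
    }
  }
  return root;
}

// Serialize the result back to markup to check the round trip (sketch: no escaping).
function serialize(node) {
  if (node.type === 'text') return node.data;
  if (node.type === 'pi') return `<?${node.target} ${node.data}?>`;
  const attrs = Object.entries(node.attrs).map(([k, v]) => ` ${k}="${v}"`).join('');
  return `<${node.name}${attrs}>${node.children.map(serialize).join('')}</${node.name}>`;
}

const markup = serialize(run(program, text));
```

Since every instruction names its registers explicitly, the three `AC(3, …)` steps and the three `AC(0, …)` steps only depend on the nodes they mention, which is what makes reordering or batching conceivable in this form.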
Considered Uses
Web instructions would have multiple uses, including, but not limited to:
Optimizations
Potential optimizations include that well-known element names and attribute names, e.g., those of HTML5, could have reserved indices in the text array and would not need to be stored or transmitted.
Discussion
Any thoughts on these ideas? Is there any interest in incubating a Web Instruction Set?