WICG / proposals

A home for well-formed proposed incubations for the web platform. All proposals welcome.
https://wicg.io/

Web Instruction Set (WISE) #48

Open AdamSobieski opened 2 years ago

AdamSobieski commented 2 years ago

Introduction

What if we could compile Web content, XML and HTML5 documents, into Web instructions for faster transmission, decoding, and page loads?

What if these Web instructions could support functionalities from the DOM and also other APIs such as the CSSOM, Custom Elements, CSS Animations, WebAssembly, Canvas, and WebGPU?

Example

Given the following XML document:

 <?xml version="1.0" encoding="UTF-8"?>
 <DocumentElement param="value">
     <FirstElement>
         &#xb6; Some Text
     </FirstElement>
     <?some_pi some_attr="some_value"?>
     <SecondElement param2="something">
         Pre-Text <Inline>Inlined text</Inline> Post-text.
     </SecondElement>
 </DocumentElement>

when it is passed through a SAX parser, it will generate a sequence of events resembling the following:

  1. XML Element start, named DocumentElement, with an attribute param equal to value
  2. XML Element start, named FirstElement
  3. XML Text node, with data equal to ¶ Some Text (the parser resolves the character reference &#xb6;)
  4. XML Element end, named FirstElement
  5. Processing Instruction event, with the target some_pi and the data some_attr="some_value"
  6. XML Element start, named SecondElement, with an attribute param2 equal to something
  7. XML Text node, with data equal to Pre-Text
  8. XML Element start, named Inline
  9. XML Text node, with data equal to Inlined text
  10. XML Element end, named Inline
  11. XML Text node, with data equal to Post-text
  12. XML Element end, named SecondElement
  13. XML Element end, named DocumentElement

Another sequence could be based on the DOM JS API, resembling the following:

  1. var el1 = document.createElement('DocumentElement');
  2. el1.setAttribute('param', 'value');
  3. var el2 = document.createElement('FirstElement');
  4. el2.textContent = "¶ Some Text";
  5. var el3 = document.createProcessingInstruction('some_pi', 'some_attr="some_value"');
  6. var el4 = document.createElement('SecondElement');
  7. el4.setAttribute('param2', 'something');
  8. var txt1 = document.createTextNode('Pre-Text ');
  9. var el5 = document.createElement('Inline');
  10. el5.textContent = 'Inlined text';
  11. var txt2 = document.createTextNode(' Post-text.');
  12. el4.appendChild(txt1);
  13. el4.appendChild(el5);
  14. el4.appendChild(txt2);
  15. el1.appendChild(el2);
  16. el1.appendChild(el3);
  17. el1.appendChild(el4);
  18. document.append(el1);
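
As a rough sketch of how a sequence of DOM calls like the one above might be produced mechanically from a SAX-like event list: the event shape (type, name, attrs, target, data) and the function name eventsToDomCalls are assumptions made for illustration, not part of the proposal. Unlike the list above, this version appends each node as soon as it is created, which suits streaming.

```javascript
// Hypothetical event shape: { type: 'start' | 'end' | 'text' | 'pi', ... }.
// Returns the DOM calls as an array of JavaScript source lines.
function eventsToDomCalls(events) {
  const lines = [];
  const open = [];   // variable names of the currently open elements
  let counter = 0;

  // Attach to the innermost open element, or to the document at top level.
  const attach = (name) => {
    const parent = open.length ? open[open.length - 1] : null;
    lines.push(parent ? `${parent}.appendChild(${name});`
                      : `document.append(${name});`);
  };

  for (const ev of events) {
    if (ev.type === 'end') { open.pop(); continue; }
    const name = `n${++counter}`;
    if (ev.type === 'start') {
      lines.push(`var ${name} = document.createElement(${JSON.stringify(ev.name)});`);
      for (const [k, v] of Object.entries(ev.attrs || {})) {
        lines.push(`${name}.setAttribute(${JSON.stringify(k)}, ${JSON.stringify(v)});`);
      }
      attach(name);
      open.push(name);
    } else if (ev.type === 'text') {
      lines.push(`var ${name} = document.createTextNode(${JSON.stringify(ev.data)});`);
      attach(name);
    } else if (ev.type === 'pi') {
      lines.push(`var ${name} = document.createProcessingInstruction(` +
                 `${JSON.stringify(ev.target)}, ${JSON.stringify(ev.data)});`);
      attach(name);
    } else {
      throw new Error(`unknown event type: ${ev.type}`);
    }
  }
  return lines;
}

// Usage: a one-element document with an attribute and a text child.
const calls = eventsToDomCalls([
  { type: 'start', name: 'Inline', attrs: { lang: 'en' } },
  { type: 'text', data: 'Inlined text' },
  { type: 'end', name: 'Inline' },
]);
console.log(calls.join('\n'));
```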

A Stack-based Virtual Machine

Consider a stack-based virtual machine with one or more local variables and a string table. Assuming an appendChild2() extension method on Element, which returns the appended-to node instead of the appended node, a sequence of instructions might resemble:

  1. var text = ['DocumentElement', 'param', 'value', 'FirstElement', '¶ Some Text', 'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something', 'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
  2. var local = [null];
  3. stack.push(document.createElement(text[0]));
  4. stack.top.setAttribute(text[1], text[2]);
  5. stack.push(document.createElement(text[3]));
  6. stack.top.textContent = text[4];
  7. stack.push(document.createProcessingInstruction(text[5], text[6]));
  8. stack.push(document.createElement(text[7]));
  9. stack.top.setAttribute(text[8], text[9]);
  10. stack.push(document.createElement(text[10]));
  11. stack.top.textContent = text[11];
  12. local[0] = stack.pop();
  13. stack.top.appendChild(document.createTextNode(text[12]));
  14. stack.top.appendChild(local[0]);
  15. stack.top.appendChild(document.createTextNode(text[13]));
  16. stack.reverse();
  17. stack.push(stack.pop().appendChild2(stack.pop()));
  18. stack.push(stack.pop().appendChild2(stack.pop()));
  19. stack.push(stack.pop().appendChild2(stack.pop()));
  20. document.append(stack.pop());

Which, towards a binary serialization of virtual machine instructions, might resemble:

  1. SPDCE(0)
  2. STSA(1, 2)
  3. SPDCE(3)
  4. STTC(4)
  5. SPDCPI(5, 6)
  6. SPDCE(7)
  7. STSA(8, 9)
  8. SPDCE(10)
  9. STTC(11)
  10. SETLP(0)
  11. STACDCTN(12)
  12. STACL(0)
  13. STACDCTN(13)
  14. SREV()
  15. SPSPAC2SP()
  16. SPSPAC2SP()
  17. SPSPAC2SP()
  18. DASP()
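
A minimal interpreter sketch for the mnemonics above. The proposal does not define the opcodes beyond the listing, so the meaning given to each one here is inferred from the JavaScript listing that precedes it. makeDocument is a hypothetical stand-in for the DOM so that the sketch runs outside a browser, and runStack is likewise an illustrative name.

```javascript
// Hypothetical stand-in for the DOM (not a real DOM implementation).
function makeDocument() {
  const node = (props) => ({
    children: [], attrs: {}, ...props,
    setAttribute(name, value) { this.attrs[name] = value; },
    appendChild(child) { this.children.push(child); return child; },
  });
  return {
    children: [],
    createElement: (name) => node({ name }),
    createTextNode: (data) => node({ data }),
    createProcessingInstruction: (target, data) => node({ target, data }),
    append(child) { this.children.push(child); },
  };
}

// Executes [mnemonic, ...operands] tuples; operands index the string table
// or the local-variable array.
function runStack(program, text, document) {
  const stack = [];
  const local = [];
  const top = () => stack[stack.length - 1];
  const ops = {
    SPDCE: (i) => stack.push(document.createElement(text[i])),
    STSA: (n, v) => top().setAttribute(text[n], text[v]),
    STTC: (i) => { top().textContent = text[i]; },
    SPDCPI: (t, d) => stack.push(document.createProcessingInstruction(text[t], text[d])),
    SETLP: (i) => { local[i] = stack.pop(); },
    STACDCTN: (i) => top().appendChild(document.createTextNode(text[i])),
    STACL: (i) => top().appendChild(local[i]),
    SREV: () => stack.reverse(),
    // Pop the parent, pop the child, append, push the parent back (appendChild2).
    SPSPAC2SP: () => { const p = stack.pop(); p.appendChild(stack.pop()); stack.push(p); },
    DASP: () => document.append(stack.pop()),
  };
  for (const [op, ...args] of program) {
    if (!(op in ops)) throw new Error(`unknown opcode: ${op}`);
    ops[op](...args);
  }
  return document;
}

const TEXT = ['DocumentElement', 'param', 'value', 'FirstElement', '\u00B6 Some Text',
  'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something',
  'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
const PROGRAM = [
  ['SPDCE', 0], ['STSA', 1, 2], ['SPDCE', 3], ['STTC', 4], ['SPDCPI', 5, 6],
  ['SPDCE', 7], ['STSA', 8, 9], ['SPDCE', 10], ['STTC', 11], ['SETLP', 0],
  ['STACDCTN', 12], ['STACL', 0], ['STACDCTN', 13], ['SREV'],
  ['SPSPAC2SP'], ['SPSPAC2SP'], ['SPSPAC2SP'], ['DASP'],
];
const doc = runStack(PROGRAM, TEXT, makeDocument());
console.log(doc.children[0].name); // DocumentElement
```

A native decoder would presumably dispatch on numeric opcodes rather than strings; the string mnemonics are kept here only for readability.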

Or, perhaps, might resemble:

  1. LDTEXT.0
  2. CALL DOCUMENT_CREATEELEMENT
  3. DUP
  4. LDTEXT.1
  5. LDTEXT.2
  6. CALL ELEMENT_SETATTRIBUTE
  7. LDTEXT.3
  8. CALL DOCUMENT_CREATEELEMENT
  9. DUP
  10. LDTEXT.4
  11. CALL ELEMENT_SETTEXTCONTENT
  12. LDTEXT.5
  13. LDTEXT.6
  14. CALL DOCUMENT_CREATEPROCESSINGINSTRUCTION
  15. LDTEXT.7
  16. CALL DOCUMENT_CREATEELEMENT
  17. DUP
  18. LDTEXT.8
  19. LDTEXT.9
  20. CALL ELEMENT_SETATTRIBUTE
  21. ...
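
A sketch of how the LDTEXT / DUP / CALL form might be executed. The call table, its arities (the receiver counts as an argument), and the rule that a call returning nothing pushes nothing are assumptions inferred from where DUP appears in the listing. Only the first eleven instructions are run, since the rest of the listing is elided; makeCallTable and run are hypothetical names.

```javascript
// Hypothetical call table: name -> [arity, implementation].
function makeCallTable(document) {
  return {
    DOCUMENT_CREATEELEMENT: [1, (name) => document.createElement(name)],
    ELEMENT_SETATTRIBUTE: [3, (el, name, value) => { el.setAttribute(name, value); }],
    ELEMENT_SETTEXTCONTENT: [2, (el, value) => { el.textContent = value; }],
  };
}

function run(program, text, calls) {
  const stack = [];
  for (const instr of program) {
    if (instr === 'DUP') {
      stack.push(stack[stack.length - 1]);
    } else if (instr.startsWith('LDTEXT.')) {
      stack.push(text[Number(instr.slice('LDTEXT.'.length))]);
    } else if (instr.startsWith('CALL ')) {
      const [arity, fn] = calls[instr.slice('CALL '.length)];
      // Pop the arguments in the order they were pushed.
      const args = stack.splice(stack.length - arity, arity);
      const result = fn(...args);
      if (result !== undefined) stack.push(result);
    } else {
      throw new Error(`unknown instruction: ${instr}`);
    }
  }
  return stack;
}

// Hypothetical stand-in for document.createElement, for use outside a browser.
const fakeDocument = {
  createElement: (name) => ({
    name, attrs: {},
    setAttribute(k, v) { this.attrs[k] = v; },
  }),
};

const text = ['DocumentElement', 'param', 'value', 'FirstElement', '\u00B6 Some Text'];
const finalStack = run([
  'LDTEXT.0', 'CALL DOCUMENT_CREATEELEMENT', 'DUP',
  'LDTEXT.1', 'LDTEXT.2', 'CALL ELEMENT_SETATTRIBUTE',
  'LDTEXT.3', 'CALL DOCUMENT_CREATEELEMENT', 'DUP',
  'LDTEXT.4', 'CALL ELEMENT_SETTEXTCONTENT',
], text, makeCallTable(fakeDocument));
console.log(finalStack.map((el) => el.name)); // the two elements created so far
```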

A Registers-based Virtual Machine

Next, considering a registers-based virtual machine with a list of registers, r, a sequence of instructions might resemble:

  1. var text = ['DocumentElement', 'param', 'value', 'FirstElement', '¶ Some Text', 'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something', 'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
  2. r[0] = document.createElement(text[0]);
  3. r[0].setAttribute(text[1], text[2]);
  4. r[1] = document.createElement(text[3]);
  5. r[1].textContent = text[4];
  6. r[2] = document.createProcessingInstruction(text[5], text[6]);
  7. r[3] = document.createElement(text[7]);
  8. r[3].setAttribute(text[8], text[9]);
  9. r[4] = document.createElement(text[10]);
  10. r[4].textContent = text[11];
  11. r[5] = document.createTextNode(text[12]);
  12. r[6] = document.createTextNode(text[13]);
  13. r[3].appendChild(r[5]);
  14. r[3].appendChild(r[4]);
  15. r[3].appendChild(r[6]);
  16. r[0].appendChild(r[1]);
  17. r[0].appendChild(r[2]);
  18. r[0].appendChild(r[3]);
  19. document.append(r[0]);

Which, towards a binary serialization of virtual machine instructions, might resemble:

  1. DCE(0, 0)
  2. SA(0, 1, 2)
  3. DCE(1, 3)
  4. TC(1, 4)
  5. DCPI(2, 5, 6)
  6. DCE(3, 7)
  7. SA(3, 8, 9)
  8. DCE(4, 10)
  9. TC(4, 11)
  10. DCTN(5, 12)
  11. DCTN(6, 13)
  12. AC(3, 5)
  13. AC(3, 4)
  14. AC(3, 6)
  15. AC(0, 1)
  16. AC(0, 2)
  17. AC(0, 3)
  18. DA(0)
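
The register form can be interpreted in much the same way. In this sketch the first operand of DCE, DCPI, and DCTN is taken to be the destination register and the remaining operands to be string-table indices, which is how the listing above reads against the JavaScript before it. makeDocument is again a hypothetical DOM stand-in and runRegisters an illustrative name.

```javascript
// Hypothetical stand-in for the DOM (not a real DOM implementation).
function makeDocument() {
  const node = (props) => ({
    children: [], attrs: {}, ...props,
    setAttribute(name, value) { this.attrs[name] = value; },
    appendChild(child) { this.children.push(child); return child; },
  });
  return {
    children: [],
    createElement: (name) => node({ name }),
    createTextNode: (data) => node({ data }),
    createProcessingInstruction: (target, data) => node({ target, data }),
    append(child) { this.children.push(child); },
  };
}

function runRegisters(program, text, document) {
  const r = []; // register file
  const ops = {
    DCE: (d, i) => { r[d] = document.createElement(text[i]); },
    SA: (e, n, v) => r[e].setAttribute(text[n], text[v]),
    TC: (e, i) => { r[e].textContent = text[i]; },
    DCPI: (d, t, v) => { r[d] = document.createProcessingInstruction(text[t], text[v]); },
    DCTN: (d, i) => { r[d] = document.createTextNode(text[i]); },
    AC: (parent, child) => r[parent].appendChild(r[child]),
    DA: (e) => document.append(r[e]),
  };
  for (const [op, ...args] of program) {
    if (!(op in ops)) throw new Error(`unknown opcode: ${op}`);
    ops[op](...args);
  }
  return document;
}

const TEXT = ['DocumentElement', 'param', 'value', 'FirstElement', '\u00B6 Some Text',
  'some_pi', 'some_attr="some_value"', 'SecondElement', 'param2', 'something',
  'Inline', 'Inlined text', 'Pre-Text ', ' Post-text.'];
const PROGRAM = [
  ['DCE', 0, 0], ['SA', 0, 1, 2], ['DCE', 1, 3], ['TC', 1, 4], ['DCPI', 2, 5, 6],
  ['DCE', 3, 7], ['SA', 3, 8, 9], ['DCE', 4, 10], ['TC', 4, 11],
  ['DCTN', 5, 12], ['DCTN', 6, 13],
  ['AC', 3, 5], ['AC', 3, 4], ['AC', 3, 6],
  ['AC', 0, 1], ['AC', 0, 2], ['AC', 0, 3], ['DA', 0],
];
const doc = runRegisters(PROGRAM, TEXT, makeDocument());
console.log(doc.children[0].children.length); // 3
```

Compared with the stack form, no reversal or local-variable step is needed, at the cost of an explicit register operand on every instruction.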

Considered Uses

Web instructions would have multiple uses, including, but not limited to:

  1. Static Web content could be compiled into instructions for more efficient storage, transmission, and reconstruction.
  2. Dynamic Web content could be streamed using these instructions. Streams of Web instructions needn't conclude upon the loading and presentation of initial Web content. Streams could continue, providing dynamic and unfolding instructions including in response to user-input events and navigation.
  3. Web synchronization and cobrowsing scenarios.
  4. Tracks of Web instructions could enable other interactive hypervideo scenarios.

Optimizations

Potential optimizations include that well-known element names and attribute names, e.g., those of HTML5, could have reserved indices in the text array and would not need to be stored or transmitted.
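
One way the reserved-index idea could work, as a sketch: encoder and decoder share a fixed list of well-known names, and only strings outside that list are transmitted. The RESERVED list below is a tiny made-up sample (a real list would have to come from a specification), and buildTable and resolve are hypothetical helper names.

```javascript
// Hypothetical shared list; indices 0..RESERVED.length-1 are never transmitted.
const RESERVED = ['html', 'head', 'body', 'div', 'span', 'a', 'p', 'class', 'id', 'href'];

// Encoder side: assign an index to every string, and collect only the
// strings that are not reserved for transmission.
function buildTable(strings) {
  const index = new Map(RESERVED.map((s, i) => [s, i]));
  const transmitted = [];
  for (const s of strings) {
    if (!index.has(s)) {
      index.set(s, RESERVED.length + transmitted.length);
      transmitted.push(s);
    }
  }
  return { index, transmitted };
}

// Decoder side: map an index back to its string.
function resolve(i, transmitted) {
  return i < RESERVED.length ? RESERVED[i] : transmitted[i - RESERVED.length];
}

// Usage: 'div' and 'class' cost nothing on the wire; 'note' and 'Hello' do.
const { index, transmitted } = buildTable(['div', 'class', 'note', 'div', 'Hello']);
console.log(transmitted);                         // [ 'note', 'Hello' ]
console.log(index.get('div'), index.get('note')); // 3 10
```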

Discussion

Any thoughts on these ideas? Is there any interest in incubating a Web Instruction Set?

reillyeon commented 2 years ago

Can you elaborate more on the use cases and how this would fit into the existing web ecosystem?

AdamSobieski commented 2 years ago

@reillyeon, hello. This proposed technology has multiple uses. It is built atop existing Web ecosystem components, utilizing the DOM, CSSOM, CSS Animations, WebAssembly, and WebGPU APIs.

Uses of a Web Instruction Set include more efficiently transmitting Web content, e.g., in multimedia stream tracks.

It can be described as an efficient "serialization" of Web content.

An encoder could be described as "compiling" Web content for more efficient transmission, processing, and decoding. Such "compiling" could be performed either a priori or in real-time. An encoder could reference external resources in resultant hypertext content and could also include internal resources and multimedia, e.g., images, audio, video, in other tracks of envisioned stream envelopes. In theory, multimedia resources could be streamed to arrive concurrently with such a Web track.

A decoder could be described as a "virtual machine" which processes Web Streaming Protocol instructions to (re)construct Web content, e.g., hypertext documents.

Use cases that I am thus far considering – and please let me know if others come to mind – include:

  1. More efficient transmission, processing, and layout of Web content, including attached multimedia resources.
  2. Persisted streams, which could be stored, saved to disk, transferred between systems, loaded, and consumed.
  3. Live-streaming, streaming, and persisted hypervideo. Dynamic Web content could be placed atop streaming video content. A video stream could include a Web track for decoding so as to dynamically place and move layout boxes with hypertext, e.g., hyperlinks, atop video content.
  4. Synchronizing layout engines. This could be useful for screen capture in WebRTC scenarios, transmitting Web Streaming Protocol instructions for layout engine synchronization (in a manner perhaps resembling remote desktop protocols) instead of transmitting video of rendered content.

mikestaub commented 2 years ago

I don't understand why this is needed? Can't we just use custom data structures over WSS?

AdamSobieski commented 2 years ago

@mikestaub, hello. If I understand your questions, developers could implement their own protocols, instruction sets, encoders, decoders, and utilize WSS for some of the indicated use-case scenarios. Developers could transmit a small HTML file with some compressed JavaScript to open a WSS socket and then utilize a custom protocol and perhaps instruction set to provide a dynamic user interface.

Towards implementing Web synchronization scenarios, developers could utilize a mutation observer, serialize mutation events to JSON, stream those data structures using WSS, and then write a decoder which, on another machine, would mutate another Web document or UI based on the received events.
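
A minimal sketch of that custom approach, limited to attribute and character-data mutations. The message format ({ op, path, ... }) and the helper names pathOf, serializeRecord, and applyMessage are assumptions for illustration; childList mutations would also need a way to serialize added nodes and are left out. Nodes are addressed by their path of child indices from a shared root, which assumes both documents start out structurally identical.

```javascript
// Path of child indices from root down to node, e.g. [1, 0].
function pathOf(root, node) {
  const path = [];
  for (let n = node; n !== root; n = n.parentNode) {
    path.unshift(Array.prototype.indexOf.call(n.parentNode.childNodes, n));
  }
  return path;
}

// Sender: turn one MutationRecord into a JSON message.
function serializeRecord(root, record) {
  const path = pathOf(root, record.target);
  if (record.type === 'attributes') {
    return JSON.stringify({
      op: 'attr', path, name: record.attributeName,
      value: record.target.getAttribute(record.attributeName), // null if removed
    });
  }
  if (record.type === 'characterData') {
    return JSON.stringify({ op: 'text', path, data: record.target.data });
  }
  throw new Error(`unsupported mutation type: ${record.type}`);
}

// Receiver: apply one JSON message to the mirrored document.
function applyMessage(root, json) {
  const msg = JSON.parse(json);
  let node = root;
  for (const i of msg.path) node = node.childNodes[i];
  if (msg.op === 'attr') {
    if (msg.value === null) node.removeAttribute(msg.name);
    else node.setAttribute(msg.name, msg.value);
  } else if (msg.op === 'text') {
    node.data = msg.data;
  } else {
    throw new Error(`unknown op: ${msg.op}`);
  }
}

// In a browser, the sender side might be wired up like this
// (socket is a hypothetical, already-open WebSocket):
//
//   new MutationObserver((records) => {
//     for (const r of records) socket.send(serializeRecord(root, r));
//   }).observe(root, { attributes: true, characterData: true, subtree: true });
```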

A Web Instruction Set, resembling the sketch or the proof of concept, above, would, in addition to optimizing transmission, processing, and page-load times for Web content, allow Web tracks to be put into live-streaming, streaming, and persisted (files) video envelopes, effectively delivering interactive hypervideo.

So, while developers can continue to create custom solutions, protocols, and instruction sets, by coming together, incubating this project, collaborating, discovering best practices, and standardizing them, we can deliver natively-implemented encoders and decoders in Web browsers, perhaps assembly-code-optimized for some functionalities, which would amplify the performance boosts from these approaches in terms of transmission, processing, and page-load times, benefitting Web developers and end-users.

AdamSobieski commented 2 years ago

I updated the issue title and renamed the proposed technology to Web Instruction Set for clarity, as this proposal is not bound to any specific communication protocol. Please feel free to let me know if any other interesting names for the proposed technology come to mind.

Looking forward to discussing Web instruction sets, compiling Web content (a priori and real-time), encoding, virtual machines, decoding, Web streaming, Web tracks, referencing contents from other tracks in multimedia envelopes and containers, interactive hypervideo (live-streaming, streaming, and persisted), and related topics.

tidoust commented 2 years ago

"Towards implementing Web synchronization scenarios, developers could utilize a mutation observer, serialize mutation events to JSON, stream those data structures using WSS, and then write a decoder which, on another machine, would mutate another Web document or UI based on the received events."

This reminds me of the talk on Distributed multi-party media-rich content review that @osidorkin delivered during the W3C/SMPTE Workshop on Professional Media Production on the Web. Oleg's talk is not focused on the actual protocol used to stream events but describes the use of mutation observers and the like, highlighting some of the challenges that arise such as handling of the <canvas> element.

AdamSobieski commented 2 years ago

Thanks @tidoust, that was an interesting talk. The <canvas> element is indeed relevant for Web synchronization, or cobrowsing, scenarios.

Regarding the particulars of protocols and instruction sets, in particular for the cases of transmitting compiled Web documents or user interfaces, I have also recently been thinking about utilizing multiple threads, or tasks, on client decoders for additional speedup.

As a concrete example, we can consider transmitting a compiled Web document with one <article> element which contains multiple <section> elements. A client might utilize multiple threads, or tasks, ideally on multiple cores, to process instructions that (re)construct the <section> elements in parallel and then merge them into the <article> element.

So, I am thinking about both single-threaded and multi-threaded approaches and possibilities.

Others have previously considered these and related topics. For instance:

Meyerovich, Leo A., and Rastislav Bodik. "Fast and parallel webpage layout." In Proceedings of the 19th international conference on World wide web, pp. 711-720. 2010.

AdamSobieski commented 2 years ago

I updated the initial post per our unfolding discussion. Please let me know if you have any more questions or comments towards improving it further.

travisleithead commented 2 years ago

Interesting greenfield idea!

So... this replaces JavaScript? Is there any client-side programming logic possible, or is everything driven from the WISE stream?

I guess you'll want to build a browser based on WISE instructions and then compare the performance and capabilities to vanilla XML/HTML transport and rendering pipelines of today's WebKit/Gecko/Blink powerhouses?

While interesting, this strikes me as quite a revolutionary idea (rather than an evolutionary idea), and thus will face a strong headwind for prototyping/adoption.

AdamSobieski commented 2 years ago

@travisleithead, thank you for taking a look at the ideas!

To your question, a Web Instruction Set (WISE) has a number of foreseeable use cases. For some uses, such as hypervideo, WISE can be described as an instruction set which utilizes functionality that was previously available only from JavaScript. For the use case of transmitting Web content, WISE likewise does not replace JavaScript.

With respect to the hypervideo use case, as envisioned, a WISE stream could be utilized to perform some DOM / CSSOM / CSS Animation functionalities such as creating, modifying (animating), and deleting Web content (e.g., layout boxes and hyperlinks) atop video background content. In this use case, a WISE stream could be a track in a hypervideo resource and this approach appears to work for live-streaming, streaming, and file-based hypervideo scenarios.

With respect to transmitting Web content, I am optimistic about the ideas of “compiling” Web content into a new format to obtain efficiencies and speedups in terms of both space and time. This “compiling” process could utilize a Web browser to analyze page-load dynamics. Also, it is noteworthy that the WISE approach is parallelizable.

Interestingly, the outputs from such a “compiling” process could also produce server-side scripting resources (“.http3”) with which to better utilize HTTP/3 server features such as server pushes with prioritizations. That is, a Web server could retrieve a Web resource ("index.html") and retrieve a corresponding server-side script ("index.html.http3") with which to better utilize HTTP/3 features for the Web resource.

So, WISE doesn't intend to replace JavaScript and doesn't require a new approach to the design of Web browsers. It can be described as a new functionality which can be added to existing Web browsers. Both approaches for transmitting Web content would leverage the existing WebKit / Gecko / Blink engines. I also recognize that the current XML-based transmission approaches are very efficiently implemented.

Some work is required to formulate the precise instruction set for prototyping and measurement. Perhaps an initial subset of the envisioned instruction set, specific to the Web content transmission use case and to the DOM API, could be formulated first and then used to transmit a document and measure its performance?

cconcolato commented 2 years ago

This reminded me of https://www.w3.org/TR/rex/

AdamSobieski commented 2 years ago

Thanks @cconcolato, I hadn't seen that.

AdamSobieski commented 4 months ago

Thanks @yoavweiss for updating this issue.

Looking back on these ideas from two years ago, I could have provided better examples to inspire some brainstorming. So, I updated the proposal with more examples showing how XML trees can be represented, stored, and transmitted as sequences of steps, sequences of operations for virtual machines.

Reading this proposal again, I'm thinking about the sizes of hypothetical binary instruction streams versus string-based representations of XML trees. Potential optimizations, in these regards, include that well-known element names and attribute names, e.g., those of HTML5, could have reserved indices in the text array and would not need to be stored or transmitted.
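
To make the size question concrete, here is a sketch of one possible byte encoding for the register-based listing: one opcode byte followed by one byte per operand. The opcode numbering, the one-byte operand width (so register numbers and string indices must stay below 256), and the one-length-byte-per-string table layout are all assumptions for illustration. Note that the string table has to be counted alongside the instruction bytes when comparing against the XML text.

```javascript
// Hypothetical opcode numbering and operand counts.
const OPCODES = { DCE: 0, SA: 1, TC: 2, DCPI: 3, DCTN: 4, AC: 5, DA: 6 };
const ARITY = [2, 3, 2, 3, 2, 2, 1];

// One byte per opcode and per operand (operands assumed < 256).
function encode(program) {
  const bytes = [];
  for (const [op, ...args] of program) bytes.push(OPCODES[op], ...args);
  return Uint8Array.from(bytes);
}

function decode(bytes) {
  const names = Object.keys(OPCODES); // insertion order matches the numbering
  const program = [];
  for (let i = 0; i < bytes.length;) {
    const code = bytes[i++];
    const args = Array.from(bytes.subarray(i, i + ARITY[code]));
    i += ARITY[code];
    program.push([names[code], ...args]);
  }
  return program;
}

// Size of a string table stored as: 1 length byte + UTF-8 bytes, per string.
function tableBytes(text) {
  const enc = new TextEncoder();
  return text.reduce((n, s) => n + 1 + enc.encode(s).length, 0);
}

const PROGRAM = [
  ['DCE', 0, 0], ['SA', 0, 1, 2], ['DCE', 1, 3], ['TC', 1, 4], ['DCPI', 2, 5, 6],
  ['DCE', 3, 7], ['SA', 3, 8, 9], ['DCE', 4, 10], ['TC', 4, 11],
  ['DCTN', 5, 12], ['DCTN', 6, 13],
  ['AC', 3, 5], ['AC', 3, 4], ['AC', 3, 6],
  ['AC', 0, 1], ['AC', 0, 2], ['AC', 0, 3], ['DA', 0],
];
console.log(encode(PROGRAM).length); // 56 bytes for the 18 instructions
```

With reserved indices for well-known names, tableBytes would be computed over only the strings that are actually transmitted.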