denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License

Provide DOMParser, XMLSerializer, and XSLTProcessor DOM APIs for working with XML #3648

Closed MarkTiedemann closed 4 years ago

MarkTiedemann commented 4 years ago

It would be pretty awesome if I could use Deno for parsing, transforming, and serializing XML without a third-party module, just like I can in the browser:
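The snippet that followed this sentence is not preserved here. As a rough illustration of the kind of browser usage meant, here is a minimal sketch of a parse, modify, and serialize round trip. The helper name `roundTripXml` and the `checked` attribute are made up for the example, and the lookup on `globalThis` is only there so the code still loads in a runtime that lacks these APIs.

```typescript
// Hypothetical helper: parse an XML string, change it, and serialize it back.
// DOMParser and XMLSerializer are looked up on globalThis so this file still
// loads in runtimes that do not provide them (it then returns null).
function roundTripXml(xml: string): string | null {
  const g = globalThis as any;
  if (typeof g.DOMParser !== "function" || typeof g.XMLSerializer !== "function") {
    return null; // APIs not available in this runtime
  }
  const doc = new g.DOMParser().parseFromString(xml, "application/xml");
  // In browsers, XML parse failures are reported as a <parsererror> element
  // in the returned document rather than as a thrown exception.
  if (doc.querySelector("parsererror") !== null) {
    throw new Error("invalid XML");
  }
  // Illustrative modification only; the attribute name is arbitrary.
  doc.documentElement.setAttribute("checked", "true");
  return new g.XMLSerializer().serializeToString(doc);
}

console.log(roundTripXml("<note><to>Deno</to></note>"));
```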

kitsonk commented 4 years ago

I think it would always be best to leave these heavyweight things out of core. The way forward is to get jsdom supportable under Deno.

MarkTiedemann commented 4 years ago

@kitsonk You are right, DOMParser and XMLSerializer are working in JSDOM (they are implemented using https://github.com/jsdom/w3c-xmlserializer and https://github.com/lddubeau/saxes).

XSLTProcessor isn't working but, even though it's supported in all browsers (except IE), it's also non-standard so I guess it's optional, anyways.

MarkTiedemann commented 4 years ago

Just ran into another use case where solid XML parsing would be awesome: Dynamically listing all currently registered media types (by parsing https://www.iana.org/assignments/media-types/media-types.xml).

async function listMediaTypes() {
  let response = await fetch("https://www.iana.org/assignments/media-types/media-types.xml");
  let xml = await response.text();
  let document = new DOMParser().parseFromString(xml, "application/xml");
  let types = [];
  for (let registry of document.querySelectorAll("registry registry")) {
    for (let record of registry.querySelectorAll("record")) {
      let file = record.querySelector("file");
      if (file !== null) {
        types.push(file.textContent);
      } else {
        types.push(`${registry.querySelector("title").textContent}/${record.querySelector("name").textContent}`);
      }
    }
  }
  return types.sort((a, b) => a.toLocaleLowerCase("en").localeCompare(b.toLocaleLowerCase("en")));
}

await listMediaTypes();

This works fine in the browser console, but not in Deno.
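The DOM-independent part of the snippet above (the fallback from `file` to `title/name`, and the case-insensitive sort) can be sketched separately so that it runs without DOMParser. The `MediaRecord` shape, the function names, and the sample records are assumptions made for this illustration; they are not taken from the IANA registry.

```typescript
// Hypothetical plain-data stand-in for one <record> element.
interface MediaRecord {
  registry: string; // text of the enclosing registry's <title>, e.g. "text"
  name: string;     // text of the record's <name>, e.g. "csv"
  file?: string;    // text of the record's <file>, when present
}

// Prefer <file>; otherwise fall back to "<registry title>/<record name>".
function toMediaType(record: MediaRecord): string {
  return record.file ?? `${record.registry}/${record.name}`;
}

// Case-insensitive sort, mirroring the comparison in the original snippet.
function listMediaTypesFromRecords(records: MediaRecord[]): string[] {
  return records
    .map(toMediaType)
    .sort((a, b) => a.toLocaleLowerCase("en").localeCompare(b.toLocaleLowerCase("en")));
}

// Made-up sample data for illustration.
const sampleRecords: MediaRecord[] = [
  { registry: "text", name: "csv", file: "text/csv" },
  { registry: "application", name: "example" },
  { registry: "image", name: "png", file: "image/png" },
];

console.log(listMediaTypesFromRecords(sampleRecords));
// [ "application/example", "image/png", "text/csv" ]
```

Only the extraction of `registry`, `name`, and `file` from the XML needs a parser; the rest is ordinary data handling.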

danilaplee commented 4 years ago

There is a great library for parsing HTML (https://docs.rs/select/0.4.3/select/) in Rust, so I'm up for implementing this issue as a native DOMParser as per the W3C Web API standard: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser

MarkTiedemann commented 4 years ago

utkarshkukreti/select.rs seems to be unstable.

Note: All the API is currently unstable

I think what you are looking for is the parser that they are using internally: servo/html5ever. AFAIK, that's the parser used in Firefox so it should be fairly safe to use. :)

danilaplee commented 4 years ago

@MarkTiedemann I've used select before, had no major issues, but html5ever seems more stable 👍 I'll get down to implementation

timmak commented 4 years ago

I don't know if anyone has done anything on this, but would this be a plugin, or would Deno ship with servo/html5ever? I would be interested in possibly working on this.

SRNV commented 4 years ago

Is someone working on it? That would be great.

max-pub commented 4 years ago

Given the stated goal of being compatible with WebAPIs, support of DOMParser, XMLSerializer and XSLTProcessor would be very desirable!

Maybe this is possible via WASM compilation of the respective Firefox/Chrome module?

kitsonk commented 4 years ago

To be clear, the goal is to use web-compatible APIs to provide features where possible, not to support every web feature.

Also, there aren't "respective Firefox/Chrome modules" for these types of features, and any WASM code has to interface in an asynchronous way with JavaScript, which would be totally unsuitable for implementation of these APIs.

As stated above, the best path forward on these would be to look at getting https://github.com/jsdom/w3c-xmlserializer and https://github.com/lddubeau/saxes to work under Deno without needing the whole of JSDOM.

MarkTiedemann commented 4 years ago

@kitsonk I have used jsdom/w3c-xmlserializer for working with XML in Deno. That's not a problem (though it does require custom build steps).

I think the first question is: Should DOMParser and XMLSerializer be part of Deno core or should those be userland modules?

The second question is: Should we implement them in JS or Rust?

Currently, I'm using JS userland modules. Ideally, I'd like to see this implemented in Rust in Deno core.

max-pub commented 4 years ago

I'm certainly no WASM expert, but wasm-functions can be called synchronously from JS, as far as I understood. I (maybe naively) assumed that compiling something like https://searchfox.org/mozilla-central/source/dom/base/DOMParser.cpp might fast-track inclusion of standard-compliant XML-handling into Deno. At least until a Rust-implementation is ready.

Secondly, I think including as many browser APIs as possible in Deno core would massively ease development (for example, a browser module working with XML data cannot be used in Deno right now). Being able to use the same modules in frontend and backend development would IMHO be a strong incentive to switch from Node to Deno.

kitsonk commented 4 years ago

Calls can be, but fetching and instantiating WebAssembly is inherently async.
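To make the distinction concrete, here is a minimal sketch. It embeds a tiny hand-assembled module (an `add` function, chosen only for illustration) as bytes, so nothing has to be fetched. A call into the instantiated module is an ordinary synchronous call. The sketch also uses the synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors, which work on bytes already in memory; getting the bytes over the network is still asynchronous, and some browsers restrict synchronous compilation of larger modules on the main thread.

```typescript
// Hand-assembled WebAssembly module exporting add(a: i32, b: i32): i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0, local.get 1, i32.add
]);

// Synchronous compile and instantiate from bytes that are already in memory.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// The exported function is called like any other synchronous function.
const add = wasmInstance.exports.add as (a: number, b: number) => number;
console.log(add(2, 3)); // 5
```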

WebAssembly isn't a magic solution to compiling code. WebAssembly lives and runs in a sandbox, a sandbox that has very limited interaction with the outside world, just like JavaScript. Most code that isn't written with WebAssembly in mind simply won't work without a lot of work rewriting that code. The module you reference is far from a standalone module. It effectively exposes an API that is built on top of the whole of the DOM implementation for Firefox, and that DOM implementation expects a lot more coupling to the host than is available in WebAssembly.

What works well in WebAssembly is code that is designed for WebAssembly, that has discrete functionality, and that expects to run in a sandbox.

Including browser APIs comes at the cost of maintaining those APIs. There are a lot of APIs out there too that are part of the browser spec that are pretty ugly, and so it makes a lot of sense to take a far more metered approach, not just throw everything in.

MarkTiedemann commented 4 years ago

"I (maybe naively) assumed that compiling something like https://searchfox.org/mozilla-central/source/dom/base/DOMParser.cpp might fast-track inclusion of standard-compliant XML-handling into Deno. At least until a Rust-implementation is ready."

As mentioned earlier in this thread, there is a standard compliant solution in Rust by Mozilla: https://github.com/servo/html5ever

"There are a lot of APIs out there too that are part of the browser spec that are pretty ugly, and so it makes a lot of sense to take a far more metered approach."

I'm not sure "ugly" is a good argument. I'd rather have Deno support ugly, old, known standards such as DOMParser (which, by the way, has literally been available in Chrome since version 1) than the beautiful, latest, experimental, unstable APIs...

kitsonk commented 4 years ago

@MarkTiedemann I am not saying no. I was responding to "it is a web standard so Deno should have it" and that the intent is not to have every web standard as part of Deno.

At the moment, it isn't a priority for anyone. The servo implementation would go quite a long way, but it is a decent amount of work to make it work in JavaScript, compliant with the APIs, and to ensure it is well tested. If someone were to work on it, I am sure the contribution would be welcome.

MarkTiedemann commented 4 years ago

You are right, @kitsonk, this is indeed not a priority and a lot of work. After further consideration, I also think that this idea should probably be tested outside of Deno core in a userland Rust plugin first.

Closing for now. :)

PS: If anyone is interested in working on this, please tag me. I have little Rust experience so far, but I'd like to join.

timreichen commented 3 years ago

Maybe of some interest? https://deno.land/x/deno_dom

yacinehmito commented 3 years ago

@MarkTiedemann By closing, did you mean that you will work on it or that it isn't worth working on it? If neither, I think it is clearer to reopen it.

iugo commented 10 months ago

Deno docs about the DOM: https://docs.deno.com/runtime/manual/advanced/jsx_dom/overview