strugee closed 7 years ago
I think we should aim to build from the full spec. @hixie, where can we find that?
When you say full spec, what do you mean? The full source spec? Or the full built spec? (Didn't @hixie say that the built spec doesn't include the developer annotations?)
I mean the “fullest” version of the spec that exists before any transformations. This project basically reorganises it, adds navigation and basic search, and hides java implementation… I figure that's easy enough to do if we have the “full” spec.
The source is https://html.spec.whatwg.org/source . I generate https://html.spec.whatwg.org/index in (more or less) one step from that. Right now that step also generates https://html.spec.whatwg.org/.wattsi-output/multipage-dev which has the non-dev stuff stripped out and is multipage. I'm happy to change that output in whatever way would be useful.
@Hixie what exactly happens to the source to turn it into HTML? Alternately, do you have a copy of the script you use to build? (I poked around on WHATWG SVN, but didn't find anything.)
It's a bunch of ObjectPascal and some Perl scripts.
@Hixie right, I'm wondering exactly what needs to happen to the source to make it valid HTML, e.g. "strip out [w-nodev]".
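To make the question concrete, here is a rough sketch of what one such step might look like. It assumes `w-nodev` is a marker attribute on elements that should be dropped from the developer's edition, and that every non-void element has an explicit closing tag; neither assumption is confirmed in this thread. It uses Python's standard `html.parser` purely for illustration (the real tooling is ObjectPascal and Perl), and `NoDevStripper` and `strip_nodev` are made-up names.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NoDevStripper(HTMLParser):
    """Re-emits the input, dropping any element that carries w-nodev."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a w-nodev subtree

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1
            return
        if any(name == "w-nodev" for name, _ in attrs):
            if tag not in VOID:
                self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit it unless it is being skipped.
        if not self.skip_depth and not any(n == "w-nodev" for n, _ in attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_nodev(source: str) -> str:
    """Return source with every w-nodev element (and its contents) removed."""
    parser = NoDevStripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

This sketch does not preserve comments or the doctype, and it would miscount nesting wherever the source omits optional end tags, so it only illustrates the shape of the step.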
Oh it's a bunch of stuff (that's why I was suggesting I'd just extend it to do what you wanted). I can walk through it with you tomorrow if you like, ping me on IRC?
Okay, that makes sense. I was ignoring your suggestion to extend the existing tooling since it seems like the Right Thing™ to do a clean rewrite in Node, which I don't mind spending the effort to do. (Unless you disagree?)
IRC would be fantastic, but I'll be on a plane tomorrow, so it may have to wait until Friday, depending on landing times, etc. What timezone are you in, so I don't start badgering you at midnight?
I'm usually online from about 17:00 UTC to about 01:00 UTC.
Doing a clean rewrite in Node is fine if that's what you want to do. Is there a modern HTML parser in Node that you can adapt a bit and that has decent performance on large inputs? The source file isn't in HTML, it's in a variant of HTML with some new elements and attributes, and is multiple megabytes. (I wrote the HTML parser for my version from scratch in a compiled language so it could be fast, and gave it the relevant hooks to make it possible to support the language extensions the source file uses.)
Static publication is probably still best... We could host on GitHub Pages too.
@Hixie how serious are the transformations? Do you actually have to parse the entire file into a DOM tree-like structure, or can you parse and transform individual parts?
The reason I ask is because I want to know if you could work with it as a stream or if you have to buffer and parse the entire thing all at once. If you can do it streaming, that would neatly solve the speed problem.
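For illustration, this is roughly what the streaming option could look like: the parser is fed fixed-size chunks and results are emitted as soon as they are complete, so the whole file never has to sit in memory as a tree. It is only a sketch using Python's standard `html.parser`; `HeadingCollector` and `stream_headings` are hypothetical names, and collecting `<h2>` text stands in for whatever per-section transformation is actually needed. Whether the real build can work this way depends on whether any step needs knowledge of the whole document, which this thread does not settle.

```python
import io
from html.parser import HTMLParser
from typing import Iterator, TextIO


class HeadingCollector(HTMLParser):
    """Collects the text of each <h2> as soon as its end tag is seen."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.buf = []   # text pieces of the heading currently being read
        self.done = []  # completed headings not yet handed to the caller

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True
            self.buf = []

    def handle_data(self, data):
        # Text may arrive in several pieces when a chunk boundary splits it.
        if self.in_heading:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_heading:
            self.done.append("".join(self.buf))
            self.in_heading = False


def stream_headings(src: TextIO, chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield heading text incrementally while reading src in chunks."""
    parser = HeadingCollector()
    while chunk := src.read(chunk_size):
        # feed() keeps any incomplete tag buffered until the next chunk arrives.
        parser.feed(chunk)
        while parser.done:
            yield parser.done.pop(0)
    parser.close()
    yield from parser.done


if __name__ == "__main__":
    sample = io.StringIO("<h2>Intro</h2><p>text</p><h2>Parsing HTML</h2>")
    for title in stream_headings(sample, chunk_size=5):
        print(title)
```

The deliberately tiny `chunk_size` in the usage example forces tags to be split across chunks, which is the case a streaming design has to survive.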
Also, +1 for GitHub Pages. (gulp-ghpages, anyone?)
Good news! We've revamped the developer's edition and now it syncs automatically with the source spec. It's at a new URL, https://html.spec.whatwg.org/dev/. (We are working to set up a redirect.)
As such, I've removed all the scripts from this repo, as it's now entirely integrated into our build process. Unfortunately our build is not in Node, but at least now it's maintained by the same people who maintain the main HTML spec, so it should always be up to date.
So, since this issue is no longer a concern, let me close it.
I finally have some bandwidth to work on #90, so I've been poking around the repo a bit. Seems like the URL in the html5-full.html Make target redirects to https://html.spec.whatwg.org/dev-index, which 404s.