benschwarz / developers.whatwg.org

Used to create the contents of developers.whatwg.org
http://developers.whatwg.org/

URL in the html5-full.html Make target 404s #93

Closed strugee closed 7 years ago

strugee commented 9 years ago

I finally have some bandwidth to work on #90, so I've been poking around the repo a bit. Seems like the URL in the html5-full.html Make target redirects to https://html.spec.whatwg.org/dev-index, which 404s.

benschwarz commented 9 years ago

I think we should aim to build from the full spec. @hixie, where can we find that?

strugee commented 9 years ago

When you say full spec, what do you mean? The full source spec? Or the full built spec? (Didn't @hixie say that the built spec doesn't include the developer annotations?)

benschwarz commented 9 years ago

I mean the “fullest” version of the spec that exists before any transformations. This project basically reorganises it, adds navigation and basic search, and hides java implementation… I figure that's easy enough to do if we have the “full” spec.

Hixie commented 9 years ago

The source is https://html.spec.whatwg.org/source . I generate https://html.spec.whatwg.org/index in (more or less) one step from that. Right now that step also generates https://html.spec.whatwg.org/.wattsi-output/multipage-dev which has the non-dev stuff stripped out and is multipage. I'm happy to change that output in whatever way would be useful.

strugee commented 9 years ago

@Hixie what exactly happens to the source to turn it into HTML? Alternatively, do you have a copy of the script you use to build? (I poked around on WHATWG SVN, but didn't find anything.)

Hixie commented 9 years ago

It's a bunch of ObjectPascal and some Perl scripts.

strugee commented 9 years ago

@Hixie right, I'm wondering exactly what needs to happen to the source to make it valid HTML, e.g. "strip out [w-nodev]".
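
To make the "strip out [w-nodev]" example concrete, here is a minimal Python sketch. It is not the actual build tooling (which, per this thread, is ObjectPascal and Perl). It assumes that `w-nodev` is an attribute marking elements to drop, and that every non-void element is explicitly closed; the names `NoDevStripper` and `strip_nodev` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NoDevStripper(HTMLParser):
    """Re-emits markup, dropping any element that carries a w-nodev attribute.

    Assumption: non-void elements are explicitly closed. Comments and
    doctype declarations are not re-emitted by this sketch.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1
            return
        if "w-nodev" in names:
            if tag not in VOID:
                self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: never changes nesting depth.
        if self.skip_depth or any(name == "w-nodev" for name, _ in attrs):
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_nodev(markup):
    parser = NoDevStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

A source that relies on omitted end tags would break the depth counting here, which is one reason a single rule like this understates the work involved.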

Hixie commented 9 years ago

Oh, it's a bunch of stuff (that's why I was suggesting I'd just extend it to do what you wanted). I can walk through it with you tomorrow if you like; ping me on IRC?

strugee commented 9 years ago

Okay, that makes sense. I was ignoring your suggestion to extend the existing tooling since it seems like the Right Thing™ to do a clean rewrite in Node, which I don't mind spending the effort to do. (Unless you disagree?)

IRC would be fantastic, but I'll be on a plane tomorrow, so it may have to wait until Friday, depending on landing times, etc. What timezone are you in, so I don't start badgering you at midnight?

Hixie commented 9 years ago

I'm usually online from about 17:00 UTC to about 01:00 UTC.

Doing a clean rewrite in Node is fine if that's what you want to do. Is there a modern HTML parser in Node that you can adapt a bit and that has decent performance on large inputs? The source file isn't in HTML, it's in a variant of HTML with some new elements and attributes, and is multiple megabytes. (I wrote the HTML parser for my version from scratch in a compiled language so it could be fast, and gave it the relevant hooks to make it possible to support the language extensions the source file uses.)

benschwarz commented 9 years ago

Static publication is probably still best... We could host on GitHub Pages too.

strugee commented 9 years ago

@Hixie how serious are the transformations? Do you actually have to parse the entire file into a DOM tree-like structure, or can you parse and transform individual parts?

I ask because I want to know whether you could work with it as a stream or whether you have to buffer and parse the entire thing at once. If streaming works, that would neatly solve the speed problem.
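
As a rough illustration of the streaming option, the sketch below feeds a parser in fixed-size chunks rather than loading the whole input first. It is written in Python with the standard-library `html.parser`, not in Node, and it only counts start tags; whether the real transformations are local enough to be done this way is exactly the open question. The names `TagCounter` and `count_tags_streaming` and the 64 KiB chunk size are assumptions made for the example.

```python
import io
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags; keeps no tree, so memory use stays small."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags_streaming(stream, chunk_size=64 * 1024):
    """Read a text stream chunk by chunk and feed each chunk to the parser.

    HTMLParser.feed() buffers incomplete constructs, so a tag that is
    split across two chunks is still parsed once it is complete.
    """
    parser = TagCounter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    parser.close()
    return parser.counts


# Usage: a tiny chunk size forces tags to be split across chunks.
sample = io.StringIO("<p>a</p><p>b <code>c</code></p>")
counts = count_tags_streaming(sample, chunk_size=5)
```

A transformation that needs to look ahead or back across the document (for example, building cross-references) would not fit this pattern without extra buffering.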

Also, +1 for GitHub Pages. (gulp-ghpages, anyone?)

domenic commented 7 years ago

Good news! We've revamped the developer's edition and now it syncs automatically with the source spec. It's at a new URL, https://html.spec.whatwg.org/dev/. (We are working to set up a redirect.)

As such, I've removed all the scripts from this repo, as it's now entirely integrated into our build process. Unfortunately our build is not in Node, but at least now it's maintained by the same people who maintain the main HTML spec, so it should always be up to date.

So, since this issue is no longer a concern, let me close it.