bazelbuild / rules_rust

Rust rules for Bazel
https://bazelbuild.github.io/rules_rust/
Apache License 2.0
667 stars 431 forks source link

rustdoc: offer web server as a Bazel binary #472

Open wchargin opened 4 years ago

wchargin commented 4 years ago

A rust_doc rule spits out a zip file, but that’s not convenient for actually consulting the docs. It would be great for rust_doc to provide a Bazel binary that can be run to spin up a web server.

Here is one approach. From the Bazel rule, provide this Python file:

import mimetypes
import os
import sys
from wsgiref import simple_server
import zipfile

def main():
    (webfiles_zip_name, default_crate, *args) = sys.argv[1:]
    port = 8000
    if args:
        port = int(args[0])

    webfiles = os.path.join(os.path.dirname(__file__), webfiles_zip_name)
    data = {}
    with open(webfiles, "rb") as fp:
        with zipfile.ZipFile(fp) as zp:
            for path in zp.namelist():
                data[path] = zp.read(path)
    sys.stderr.write("Read %d files from %s\n" % (len(data), webfiles_zip_name))

    default_path = "/%s/index.html" % default_crate

    def app(environ, start_response):
        p = environ.get("PATH_INFO", "/").lstrip("/")
        if not p:
            start_response("302 Found", [("Location", default_path)])
            yield b"302 Found\n"
            return
        if p.endswith("/"):
            p += "index.html"
        blob = data.get(p)
        if not blob:
            start_response("404 Not Found", [])
            yield b"404 Not Found\n"
            return
        (mime_type, encoding) = mimetypes.guess_type(p)
        headers = []
        headers.append(("Content-Type", mime_type))
        if encoding is not None:
            headers.append(("Content-Encoding", encoding))
        start_response("200 OK", headers)
        yield blob

    server = simple_server.make_server("", port, app)
    # Find which port was actually bound, in case user requested port 0.
    real_port = server.socket.getsockname()[1]
    msg = "Serving %s docs on port %d\n" % (default_crate, real_port)
    sys.stderr.write(msg)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print()

if __name__ == "__main__":
    main()

Then generate a py_binary build target:

# Assuming `rust_doc(name = "mylib_doc", dep = ":mylib")`, generate:

py_binary(
    name = "mylib_doc_server",
    srcs = ["mylib_doc_server.py"],
    args = [
        "mylib_doc.zip",  # output of `rust_doc` rule
        "mylib",  # default crate
    ],
    data = [":mylib_doc.zip"],  # output of `rust_doc` rule, again
    python_version = "PY3",
    srcs_version = "PY3",
)

Then, users can bazel run :mylib_doc_server to start a server, and optionally pass a port argument. Or, use ibazel instead of bazel, and then whenever you change one of the Rust sources, the Rustdoc will automatically recompile and the server will automatically restart.

This depends on #471; without that fix, this will be harder.

Thoughts? Are you open to this change? Is it okay to use Python here? Note that the above simple server only uses the Python standard library. Happy to contribute a PR, and I license all this code as Apache 2.0.

UebelAndre commented 4 years ago

Looks great to me! But I gotta wonder, could the same thing be implemented in Rust? 😉

wchargin commented 4 years ago

Yep, that’s a reasonable question. I think the answer is, “yes, but not with only the standard library”. We need to read a web file and run a web server, both of which want external crates. And once you need external crates, it’s less clear to me whether that’s the right choice for rules_rust itself.

For instance: how would rules_rust depend on its zipfile and web server crates? The natural approach for end users is cargo-raze, but that feels uncomfortably circular to me. The generated BUILD file loads from @io_bazel_rules_rust, but from within rules_rust we’d want it to load from just “//”.

The runtime performance of the docs web server is not critical, so Python’s okay there. But many people might be frustrated by having to compile the whole Rust web stack and link a Rust web server (not super fast, in my experience) just to see their docs. We’d also have to go to some lengths to ensure that updating the docs doesn’t re-link the binary, since that would be quite annoying.

Since Bazel has native support for Python (native.py_binary), it seems reasonable to me to use Python here. What do you think?

UebelAndre commented 4 years ago

I think the compile time cost is a good call out. Do you know how cargo doc handles this currently? I thought it outputs static files for you to simply view in a browser. No server required. Perhaps we'd just need a running process to watch for changes on disk and output a file:// path to some html file?

Even though Bazel currently has native.py_binary, https://github.com/bazelbuild/bazel/issues/9006 indicates that the rules library should be used instead. So if this were to be implemented using python, it should be done in a future facing manner by using the python rules.

wchargin commented 4 years ago

Do you know how cargo doc handles this currently? I thought it outputs static files for you to simply view in a browser. No server required.

Right, cargo doc just sticks files in target/doc/. With the standard Cargo toolchain, I can just

(mkdir -p target/doc/ && cd target/doc/ && python3 -m http.server) \
    & cargo watch -x doc

or replace python3 -m http.server with your favorite static server. But with rust_doc this isn’t an option because the contents are all bundled up in a zip archive.

I guess it’s technically true that you can visit a file:/// URL, but I never do so. One reason is that I’m often SSHed into a workstation, which I can connect to if it runs a web server but not via a file URL.

Perhaps we'd just need a running process to watch for changes on disk and output a file:// path to some html file?

Not sure what you mean; please clarify?

There’s no file:/// URL that we can emit, because the index.html is inside a zip archive. I don’t think that we should have a running process that re-extracts the archive to disk every time that it changes.

Even though Bazel currently has native.py_binary, bazelbuild/bazel#9006 indicates that the rules library should be used instead.

Eh. That’s one interpretation. Another is that that flag was supposed to be flipped in Bazel 2.0, and still hasn’t flipped in Bazel 3.7.0, nearly a year later. :-) But yes, we could use rules_python and it’d be fine.

wchargin commented 4 years ago

could the same thing be implemented in Rust?

I’ve given this a shot:

Rust `zipserver` ### Cargo.toml ```toml [package] name = "zipserver" version = "0.1.0" authors = ["William Chargin "] edition = "2018" [dependencies] anyhow = "1.0.34" hyper = "0.13" mime_guess = "2.0.3" tokio = { version = "0.2", features = ["full"] } zip = "0.5.8" ``` ### src/main.rs ```rust use anyhow::Context; use hyper::{http, Body, Request, Response, Server}; use std::collections::HashMap; use std::convert::Infallible; use std::fs::File; use std::io::{BufReader, Read}; use std::net::SocketAddr; use std::path::{Path, PathBuf}; use zip::ZipArchive; struct State { data: HashMap>, root_redirect: String, } async fn hello(state: &'static State, req: Request) -> Result, http::Error> { let path = req.uri().path().strip_prefix("/").unwrap_or_default(); if path.is_empty() || path == "/" { return Ok(Response::builder() .status(302) .header("Location", &state.root_redirect) .body(Body::from(b"302 Found\n".as_ref())) .unwrap()); } let data = match state.data.get(path) { Some(v) => v, None => { return Ok(Response::builder() .status(404) .body(Body::from(b"404 Not Found\n".as_ref())) .unwrap()); } }; let ext = path.rsplit('.').next().unwrap_or(""); let mime = mime_guess::from_ext(ext) .first() .map(|x| x.to_string()) .unwrap_or(String::new()); let res = Response::builder() .header("Content-Type", mime) .body(Body::from(data.as_slice())) .unwrap(); Ok(res) } fn read_zip(path: &Path) -> anyhow::Result>> { let mut result = HashMap::new(); let mut zip = ZipArchive::new(BufReader::new(File::open(path)?))?; for i in 0..zip.len() { let mut file = zip .by_index(i) .with_context(|| format!("zip file entry {}", i))?; let name = file.name().to_string(); let mut buf = Vec::new(); file.read_to_end(&mut buf) .with_context(|| format!("reading zip entry {:?}", name))?; result.insert(name, buf); } Ok(result) } #[tokio::main] async fn main() -> anyhow::Result<()> { let mut args = std::env::args_os().skip(1).collect::>(); if args.len() != 2 && args.len() != 3 { eprintln!("usage: zipserver []"); std::process::exit(1); } let zip_path = PathBuf::from(std::mem::take(&mut args[0])); let root_redirect = args[1] .to_str() .expect("root redirect must be valid Unicode") .to_string(); let port: u16 = args .get(2) .map(|x| x.to_str().expect("port must be valid Unicode")) .unwrap_or("8000") .parse() .expect("port must be a valid u16"); let state = State { data: read_zip(&zip_path)?, root_redirect, }; eprintln!( "Read {} files from {}", state.data.len(), zip_path.display() ); let state = &*Box::leak(Box::new(state)); let addr = SocketAddr::from(([0, 0, 0, 0], port)); let make_service = hyper::service::make_service_fn(|_conn| async move { Ok::<_, Infallible>(hyper::service::service_fn(move |req| hello(state, req))) }); let server = Server::bind(&addr).serve(make_service); println!("Serving on port {}", server.local_addr().port()); server.await?; Ok(()) } ```

With the dependencies fetched but not compiled, this clean-builds in 23.2 seconds on my workstation, and incrementally builds in 3.4 seconds. And my workstation is fairly powerful (Xeon W-2135, 6 physical cores). This is notably slower than the Python compile time of 0 seconds. ;-)

wchargin commented 4 years ago

(Oh, and that pulls in 68 transitive dependencies.)

UebelAndre commented 4 years ago

Not sure what you mean; please clarify?

I was suggesting having something like ibazel watch for changes and automatically update the content you're looking at.

Eh. That’s one interpretation. Another is that that flag was supposed to be flipped in Bazel 2.0, and still hasn’t flipped in Bazel 3.7.0, nearly a year later. :-) But yes, we could use rules_python and it’d be fine.

I feel strongly that a rules repository should adhere to every incompatibility flag. If this is something that's not actually going to be flipped, then someone should ping that issue and ask for the flag to be removed.

With the dependencies fetched but not compiled, this clean-builds in 23.2 seconds on my workstation, and incrementally builds in 3.4 seconds. And my workstation is fairly powerful (Xeon W-2135, 6 physical cores). This is notably slower than the Python compile time of 0 seconds. ;-)

There's definitely no discussion to be had around the time difference of a compiled and non-compiled language. Compiling a web server was always going to come at some noticeable cost. I think the clear distinction here is you're looking for something that improves remove developer environments (ssh'ed into another machine) and not necessarily tying to address a gap in functionality between cargo doc and these rules.

I do feel you've shined light on the issue that viewing the output of rust_doc is cumbersome and am in favor of any efforts to improve it. As much as I dislike python, I think this approach looks good to me (so long as the script says small and the heavy lifting be done by rustdoc and the relevant Bazel rules). 😄

wchargin commented 4 years ago

I feel strongly that a rules repository should adhere to every incompatibility flag. If this is something that's not actually going to be flipped, then someone should ping that issue and ask for the flag to be removed.

Cool; that sounds pretty reasonable. Happy to comply.

Thanks for your quick responses and the helpful discussion! I’ll start with the dependent issue #471 and then move on to this one.