cognitive-engineering-lab / rust-book

The Rust Programming Language: Experimental Edition
https://rust-book.cs.brown.edu

Rust is not a macro language for assembly #186

Open ia0 opened 2 months ago

ia0 commented 2 months ago

Thanks for working on these educational materials! This looks pretty good, but in my opinion it has a rather problematic issue.

The "What is Ownership?" chapter seems to assume that Rust is a macro language[^1] for assembly (a common misconception in systems programming, the origin of most bugs, and probably the downfall of C and C++):

This is a problem because it teaches that Rust is defined by its compilation artifacts (i.e., its implementation) rather than by its semantics (i.e., its specification). In the long term, this will prevent Rust from changing its implementation without exposing bugs that are currently silent in user code, because users will depend on a particular implementation rather than on the specification. Hyrum's law states that, with enough users, every observable behavior of an implementation will eventually be depended on by somebody. However, I believe that with proper tooling (like Miri) that forces users to code against the Rust specification, we may avoid or reduce the impact of Hyrum's law. So teaching materials should also try to avoid it.
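To make the specification-versus-implementation distinction concrete, here is a small sketch (not from the original thread) of code that depends only on layout facts Rust actually promises. The function and type names are hypothetical. The `Option<&T>` size guarantee is documented in the `std::option` module, and `#[repr(C)]` opts a struct into a specified layout; a default (`repr(Rust)`) struct carries no such promise, even if today's compiler output looks stable.

```rust
use std::mem::size_of;

/// Relies only on a documented guarantee: the `std::option` docs state that
/// `Option<&T>` has the same size as `&T` (the null value encodes `None`).
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// Hypothetical type. `#[repr(C)]` opts into a specified layout: fields in
/// declaration order, each padded to its alignment, as a C compiler would.
#[repr(C)]
pub struct Specified {
    pub tag: u8,
    pub len: u32,
}

/// With a default (`repr(Rust)`) struct, field order and padding are left to
/// the compiler, so code must not assume them even if current rustc output
/// happens to match. Here the layout is specified: on targets where `u32` is
/// 4-byte aligned, `tag` sits at offset 0, `len` at offset 4, and the size is 8.
pub fn specified_size() -> usize {
    size_of::<Specified>()
}
```

The design point is that both assertions a caller might make here are backed by documentation, not by observing what one compiler version happened to emit.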

The suggested fix is rather simple (because the high-level teaching ideas are good):

[^1]: A macro language is a language defined by its compilation to target languages. This is in contrast to languages defined directly and independently of target languages, for example with an operational semantics at source level.
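As an illustration of the footnote's second kind of definition, here is a hypothetical sketch of a toy expression language whose meaning is given by rewriting source terms (a small-step operational semantics) rather than by translation to any target. `Expr`, `step`, and `eval` are invented names; real Rust semantics are far larger.

```rust
/// A toy expression language (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// One small-step reduction. The meaning of a program is defined by how
/// source terms rewrite, with no reference to any compilation target.
/// Returns `None` when `e` is already a value.
pub fn step(e: &Expr) -> Option<Expr> {
    match e {
        Expr::Num(_) => None,
        Expr::Add(l, r) => match (&**l, &**r) {
            // Both operands are values: perform the addition.
            (Expr::Num(a), Expr::Num(b)) => Some(Expr::Num(a + b)),
            // Left operand is a value: reduce the right operand.
            (Expr::Num(_), _) => step(r).map(|r2| Expr::Add(l.clone(), Box::new(r2))),
            // Otherwise reduce the left operand first.
            _ => step(l).map(|l2| Expr::Add(Box::new(l2), r.clone())),
        },
    }
}

/// Apply `step` repeatedly until a value is reached.
pub fn eval(mut e: Expr) -> Expr {
    while let Some(next) = step(&e) {
        e = next;
    }
    e
}
```

Any compiler for this language is then correct exactly when its output agrees with `eval`; the compiler does not define the language.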

austin362667 commented 1 month ago

I second that.

I think it certainly needs more elaboration on semantic-level concepts that won't change too much over time.

willcrichton commented 6 days ago

I'm of a few minds about this take. On the one hand, I agree, and Chapter 4 is written as it is because I agree. I specifically teach the MIR-level semantics of Rust because it's the level at which the borrow checker (and Miri) think about Rust.
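A minimal sketch of the kind of MIR-level reasoning mentioned here (our example, not one taken from Chapter 4): because the borrow checker runs on MIR with non-lexical lifetimes, a shared borrow ends at its last use rather than at the end of its lexical scope. The function name is hypothetical.

```rust
/// Hypothetical example of non-lexical lifetimes in action.
pub fn read_then_push() -> (i32, usize) {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow of `v` starts here
    let copied = *first; // last use of `first`: the borrow ends here
    v.push(4); // OK: no live shared borrow conflicts with `&mut v`
    // Reading `*first` after `v.push(4)` instead would be rejected (E0502),
    // because the shared borrow would still be live during the mutation.
    (copied, v.len())
}
```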

On the other hand, the borrow checker is only one part of Rust. Rust is, actually, a macro language for assembly. The vast majority of Rust programs will be compiled to x86 or ARM or Wasm or whatever, and executed as such. The effects of undefined behavior will ultimately appear within these settings.

Additionally, as a matter of pedagogy, most readers will not be familiar with enough programming language theory to understand the concepts or vocabulary of operational semantics. They will, however, likely understand the idea of assembly, and the idea that there's a "high-level semantics" of a Rust program and an "actual semantics" depending on the compilation target. I worry that the points you're asking to include are too nuanced / too advanced for relatively little benefit to the average reader.

ia0 commented 6 days ago

Thanks for the explanation!

Thinking about it, I was mostly worried about the target audience writing unsafe Rust based on such teaching material. But it appears this chapter is not meant for that audience. I got confused because the chapter uses words like "safety", "unsafe", and "undefined behavior", which usually appear in the context of unsafe Rust. However, this chapter contains no unsafe Rust and doesn't promote its use either.

So my recommendation instead would be to use alternative wordings to avoid such confusion. Here are suggestions:

What do you think?

willcrichton commented 6 days ago

I do still want to use some of the relevant terminology. This chapter is based on the conscious decision that Rust learners can benefit from knowing something about undefined behavior even if they don't write a single line of unsafe code. (This idea is based on our human factors research: https://dl.acm.org/doi/10.1145/3622841)

ia0 commented 6 days ago

The usage of "unsafe" in that paper seems wrong too. The paper says:

Participants frequently struggled to construct a correct counterexample to an unsafe function. For example, consider the make_separator program, shown on the right, which returns a dangling pointer to the variable default.

fn make_separator(user_str: &str) -> &str {
    if user_str == "" {
        let default = "=".repeat(10);
        &default
    } else {
        user_str
    }
}

This function is not "unsafe". It is ill-typed. The paper seems to assume that something else was written:

fn make_separator(user_str: &str) -> &str {
    if user_str == "" {
        let default = "=".repeat(10);
        unsafe { std::mem::transmute(default.as_str()) }
    } else {
        user_str
    }
}

Or maybe it assumes the program is written in C with Rust syntax, which is not well defined.
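For contrast, here is a sketch of how the ill-typed first version might be repaired in safe Rust, assuming callers accept either borrowed or freshly allocated data. Using `Cow` is our choice, not the paper's; returning an owned `String` in both branches would also work, at the cost of an allocation on every call.

```rust
use std::borrow::Cow;

/// Safe variant of `make_separator` (a sketch): instead of returning a
/// reference to the local `default`, return owned data when we create it
/// and borrowed data when we only pass the input through.
pub fn make_separator(user_str: &str) -> Cow<'_, str> {
    if user_str.is_empty() {
        // `"=".repeat(10)` allocates a new String; the caller takes ownership.
        Cow::Owned("=".repeat(10))
    } else {
        // No allocation: hand back the caller's string with the same lifetime.
        Cow::Borrowed(user_str)
    }
}
```

Here the borrow checker's rejection of the original pushes the programmer toward a design where the ownership of the returned data is explicit in the type.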

I agree that Rust learners would benefit from understanding memory, so that they can understand the error messages of the type (and borrow) checker. But I don't think "unsafe" or "undefined behavior" are the concepts they lack. The concepts they lack are:

In C/C++, misunderstanding those concepts leads to undefined behavior. In Rust, misunderstanding those concepts leads to a type error.
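A minimal sketch of that contrast, using a use-after-move (the helper names are hypothetical). In C, reading a buffer after passing it to a function that frees it is undefined behavior; in Rust, `consume` drops its `String` argument, and any later use of the moved variable is a compile-time error.

```rust
/// Hypothetical helper: takes ownership of `s`; the String is dropped
/// (its heap buffer freed) when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

/// Hypothetical helper that only needs to read the string.
fn inspect(s: &str) -> usize {
    s.len()
}

pub fn ownership_demo() -> (usize, usize) {
    let s = String::from("hello");
    let borrowed_len = inspect(&s); // `s` is only borrowed; still usable
    let moved_len = consume(s); // ownership of `s` moves into `consume`
    // Uncommenting the next line is rejected at compile time
    // (E0382: borrow of moved value: `s`) instead of reading freed memory.
    // println!("{s}");
    (borrowed_len, moved_len)
}
```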