DoumanAsh / xxhash-rust

Rust implementation of xxhash
Boost Software License 1.0
205 stars 20 forks source link

Unsafe functions are not declared `unsafe` #29

Open domenukk opened 1 year ago

domenukk commented 1 year ago

Throughout the codebase a number of functions are not declared unsafe even though they dereference raw pointers passed in as parameters. All of the functions appear to be non-pub, but it still makes it hard to argue about the libraries safety.

DoumanAsh commented 1 year ago

These functions are inherently unsafe regardless if you declare them unsafe or not. Because I write using raw pointers which are much more easier to handle than going through tons of overhead in form of runtime checks

As a side note codebase is verified on all possible code paths using valgrind to validate that no invalid read or write is performed so library is 100% safe. Do you have concrete concerns?

P.s. note that all these functions do is read/write pointers within correct bounds

domenukk commented 1 year ago

I'm not at all saying that your code isn't correct. It's merely about the semantics of rust: a function that isn't marked unsafe should be safe for any given parameter. Hence, these functions should be marked unsafe, even if they are just internal (since future contributors can not know about the safety guarantees of these functions)

See https://stackoverflow.com/a/31628759 for example, maybe this explanation is better than mine.

Marking unsafe functions unsafe means that all callers need to use an unsafe {} block, and we can check at this point that all calls pass in a correct & valid pointer.

Hope that makes sense :)

DoumanAsh commented 1 year ago

Yes I know about semantics What I mean it is just not very ergonomic for me to slap unsafe {} for every correct invocation of memcpy hence I'm being lazy about it

P.s. to clarify I do not distinguish unsafe from safe code so for me it is all the same. If someone wants to contribute, valgrind will catch their error otherwise my review will catch

xpe commented 1 year ago

First, anyone who puts in the time and effort to open source code has done a very helpful thing. The same applies to contributing to others' open source projects. So, thank you both.

The conversation above seems a bit tense. I'd like to try to bring it back to what might be the fundamental point of contention. So, I put forward this simple claim:

Just because code compiles without the unsafe keyword does not make it semantically correct to omit unsafe.

I offer this claim as a key starting point, in the hopes that this discussion can continue productively.

xpe commented 1 year ago

People arrive at Rust with many different backgrounds and experience levels. The current tagline is a nice way of unifying all this variation:

A language empowering everyone to build reliable and efficient software.

Next, I want to dig into the reliable part..

Rust’s rich type system and ownership model guarantee memory-safety and thread-safety — enabling you to eliminate many classes of bugs at compile-time.

How do we get (and keep) these nice things? One part is the compiler, since it enforces a lot. But we shouldn't forget another key part: shared values and norms. Some developers may not like talk of values and norms, but they are real and important. They set expectations. In the context of this particular conversation, I think the following is the key expectation:

No matter what, Safe Rust can’t cause Undefined Behavior. This is referred to as soundness: a well-typed program actually has the desired properties. The Nomicon has a more detailed explanation on the subject. ~ https://doc.rust-lang.org/std/keyword.unsafe.html

I want to really dig into the "no matter what" language. What does that mean? Like I said above, it isn't only about the compiler. It is about what people do. Expectations help. Writing "no matter what" emphasizes the value the community places on soundness and memory safety. The aim is to cultivate a shared understanding that well-typed programs should function as expected, thereby reducing bugs and vulnerabilities. Why do we care? Because it helps the core purpose of Rust (same as above): "A language empowering everyone to build reliable and efficient software."

The soundness link leads to this text:

Soundness (of code / of a library): Soundness is a type system concept (actually originating from the study of logics) and means that the type system is "correct" in the sense that well-typed programs actually have the desired properties. For Rust, this means well-typed programs cannot cause Undefined Behavior. This promise only extends to safe code however; for unsafe code, it is up to the programmer to uphold this contract.

Accordingly, we say that a library (or an individual function) is sound if it is impossible for safe code to cause Undefined Behavior using its public API. Conversely, the library/function is unsound if safe code can cause Undefined Behavior.

The "undefined behavior" link leads to:

Undefined Behavior: Undefined Behavior is a concept of the contract between the Rust programmer and the compiler: The programmer promises that the code exhibits no undefined behavior. In return, the compiler promises to compile the code in a way that the final program does on the real hardware what the source program does according to the Rust Abstract Machine. If it turns out the program does have undefined behavior, the contract is void, and the program produced by the compiler is essentially garbage (in particular, it is not bound by any specification; the program does not even have to be well-formed executable code).

In Rust, the Nomicon and the Reference both have a list of behavior that the language considers undefined. Rust promises that safe code cannot cause Undefined Behavior---the compiler and authors of unsafe code takes the burden of this contract on themselves. For unsafe code, however, the burden is still on the programmer.

I'm sorry if this sounds a bit like a lecture! I do have a problem being long-winded. I can't tell how much of what I wrote above is being disputed here. My goal is to figure that out.

My hope is that I've grounded the conversation in non-controversial fundamentals. The way they get interpreted might be in question. Let's talk about it.

DoumanAsh commented 1 year ago

@xpe I don't know what you're trying to convey here as this library is verified using valgrind to confirm there is no invalid read or write within code. The original author's issue is with semantics of how I use unsafe since I only care about UX than formality.

So there is no question of soundness in this code, but rather only question of using cumbersome semantics at my expense for no benefit (aside from making potential error with hidden unsafety of function not marked unsafe) That is why I use valgrind to verify all my read/writes and if MIRI wouldn't be so broken as tripping over calling FFI function I would add it.

Maybe one day I'll make separate test suite for MIRI since it is so broken and unusable, but it wouldn't change much and soundness is proved already making issue itself no issue to me personally.

xpe commented 1 year ago

I don't know what you're trying to convey here

@domenukk To speak plainly, I said what I wanted to say. Here's a summary: The early conversation seemed a bit heated; it had many of the signs of miscommunication. That's why I went back to the foundational concepts and values.

If someone wants to know my take on all of the technical questions above, I don't have a comprehensive set of answers right now. I'm still assessing.

In a complex conversation like this, laying out the fundamentals tends to help. It allows people to ground their claims against common points of agreement. As such, I'm glad that your response touched on the notion of soundness.

So, let's get into soundness of this library:

this library is verified using valgrind to confirm there is no invalid read or write within code

there is no question of soundness in this code

Valgrind verification is valuable in many ways, but generally speaking, it does not prove soundness as understood in the Rust community.

I'd like to add this discussion to our shared consciousness: https://www.reddit.com/r/rust/comments/s929mz/being_fair_about_memory_safety_and_performance/

xpe commented 1 year ago

@domenukk Returning to your top-level comment:

Throughout the codebase a number of functions are not declared unsafe even though they dereference raw pointers passed in as parameters. All of the functions appear to be non-pub, but it still makes it hard to argue about the libraries safety.

For the scope of this comment, I'd like to put aside the question of "who has to put in the effort to make such changes?". I'd like to both (a) drive this to conceptual bedrock and (b) understand how various Rust projects handle this concern.

@domenukk To summarize, your comment says the lack of unsafe in some places makes it hard to reason about the library's safety. Some questions:

DoumanAsh commented 1 year ago

Valgrind verification is valuable in many ways, but generally speaking, it does not prove soundness as understood in the Rust community.

It proves that all unsafe code perform correct operations (not to mention it allows to run real code rather than sub-set of code that MIRI barely supports)

I run MIRI on my code and the only thing it found is dubious violation of stackful borrows and its rules are not even well defined to begin with. But there is not a single invalid read/write within code even under MIRI

xpe commented 1 year ago

@DoumanAsh It feels like we're not on the same page with the terms and concepts here.

[Valgrind] proves that all unsafe code perform correct operations

Valgrind is a dynamic analysis tool; as such it can not prove. It can detect particular undesirable behaviors if they occur during a particular execution (code + data + architecture). Valgrind analysis (alone) does not ensure soundness. Valgrind can't capture all types of Rust-specific Undefined Behavior (UB). See https://doc.rust-lang.org/reference/behavior-considered-undefined.html

Correction: Earlier in this conversation, I should not have used the phrase "Valgrind verification". It doesn't do formal verification nor static analysis. A better phrase would be "Valgrind dynamic analysis".

xpe commented 1 year ago

I'm not sure that we're all on the same page w.r.t. soundness versus correctness.

To decide when/where to use unsafe, we use soundness as the criteria. Not correctness.

DoumanAsh commented 1 year ago

I have 100% test coverage and even if valgrind is dynamic tool, in my case it proves correctness 100% And I also run MIRI separately to check soundness and only found non-issue with stacked borrows so code is 100% sound too

xpe commented 1 year ago

I have 100% test coverage and even if valgrind is dynamic tool, in my case it proves correctness 100%

No, 100% test coverage (as measured over lines of code) does not prove correctness.

I'm not questioning this library at this point, but your reasoning is incorrect.

eirnym commented 9 months ago

@xpe I understand all sides. As I see the library contents it's hard to say where it's incorrect.

The way it was implemented is easy way to read proper amount of chunks of four ints with type u32 or u64 (depending on bit amount) without unsafe casting as had been implemented.

If you know how to implement these functions in better way to increase soundness, please propose a PR.

eirnym commented 9 months ago

I don't know if it's known in Rust community, Zen of Python is applicable to almost anything, and one of lines is Practicality beats purity. This line is a strong enough argument for me to be at the side of @DoumanAsh with the existing implementation which is not "pure" in terms of unsafe. In other words I don't see any practical implications of the increasing purity in this particular case even I'll ever maintain this library