Safety redesign - Githubissues

godot-rust / gdnative

Rust bindings for Godot 3

https://godot-rust.github.io

MIT License

3.61k stars 210 forks source link

Safety redesign #808

Open Bromeon opened 2 years ago

Bromeon commented 2 years ago

godot-rust historically (v0.8) used unsafe for every interaction with the engine. While this was "on the safe side", it was very inconvenient and forced the user to clutter the code with unsafe. v0.9 took a different approach with type-states: the user opts in to unsafety when converting Ref to TRef, and most (but not all) subsequent calls are safe. This improved a lot, but unfortunately there's still quite a bit of friction (https://github.com/godot-rust/godot-rust/issues/758 among others).

The more we looked into safety models, the more it became apparent that there's quite a bit of work left to do to come up with something consistent and uniformly applicable. Most notable, the following questions remain unanswered:

What assertion does the user precisely make with Ref::assume_safe()? What about the safe ways to obtain TRef (owner parameter)? Object lifeness is not enough.
Since calls to GDScript can execute arbitrary code, should every such invocation be marked unsafe? Or should we draw the boundary at Rust code, leaving the responsibility for correct GDScript implementation to the user? For the record, these are possible causes for UB when calling GDScript:
- Destruction of objects in use, e.g. through Object::free() or Reference::unreference().
- Race conditions -- GDScript has full access to its own Threadclass
- Calls back to Rust via GDNative, i.e. execution of methods marked unsafe from safe code
- Bugs in the C++ code of the engine (not worth protecting against)
How can we ensure that unsafe doesn't turn into an inflationary keyword that's used mindlessly whenever any Godot APIs are used? There is no point in being on the "academically correct side" if people start using unsafe like it means nothing. Ideally, unsafe has the following purposes:
- to limit the UB to few parts of Rust code, which can be the focus of auditing
- to make developers re-think if they really need unsafe whenever they write it
- to make developers alert when they see it in existing code, and treat the surrounding logic with caution

A new model would ideally be consistent in the way that it's clear which methods are unsafe and which aren't, i.e. there should be a ruleset for inclusion/exclusion (with only few exceptions). It should also be pragmatic for above-mentioned reasons, a dogmatic "everything is unsafe" is not helping anyone. We have to embrace the fact that godot-rust's entire point is to deal with non-Rust code, whose safety is outside the library's control.

This issue hosts general discussion and this post serves as tracker for implementation tasks.

Large issues needing considerable design:

[ ] #635
[ ] #714
[ ] #758

More isolated, specific API shortcomings:

836
- 844
790
- 795
748
641

jacobsky commented 2 years ago

Thanks for creating this issue for discussion on the topic.

* What assertion does the user precisely make with `Ref::assume_safe()`? What about the safe ways to obtain `TRef` (owner parameter)? Object lifeness is not enough.

I think that Ref::assume_safe() also requiring unsafe is kind of redundant as the the point of the typestate is already there to force us to be sure that we trust the reference. While TRef stands for Temporary Ref, I often think of it as a Trusted Ref where I'm sure that I can use it in the way I need it.

* Since calls to GDScript can execute arbitrary code, should every such invocation be marked `unsafe`? Or should we draw the boundary at Rust code, leaving the responsibility for correct GDScript implementation to the user?

I think it's probably fine to leave it to the user. One note on the engine design is that the only GDScript executed is the GDScript that the user writes. If you want to mix and match, I think it's fine to make the various call() functions safe as the assumption is that the user has audited the function accordingly. For myself, it doesn't make a big difference as I generally don't use much GDScript, so the extra unsafe feels correct; on the otherhand, if you mix a and match a lot of GDScript, unsafe will start to feel more like noise.

* How can we ensure that `unsafe` doesn't turn into an inflationary keyword that's used mindlessly whenever _any_ Godot APIs are used? There is no point in being on the "academically correct side" if people start using `unsafe` like it means nothing. Ideally, `unsafe` has the following purposes:

  * to limit the UB to few parts of _Rust_ code, which can be the focus of auditing
  * to make developers re-think if they really need `unsafe` whenever they write it
  * to make developers alert when they see it in existing code, and treat the surrounding logic with caution

In addition to your thoughts above, I think that we can also consider unsafe to be any function that is "untrusted", for example: I don't think "Object::free" should ever be safe due to how it immediately releases the underlying object from memory, but there are a lot of functions that we can consider to be trusted.

A new model would ideally be consistent in the way that it's clear which methods are unsafe and which aren't, i.e. there should be a ruleset for inclusion/exclusion (with only few exceptions). It should also be pragmatic for above-mentioned reasons, a dogmatic "everything is unsafe" is not helping anyone. We have to embrace the fact that godot-rust's entire point is to deal with non-Rust code, whose safety is outside the library's control.

I agree 100%

All in all, I think that typestates are a great way to opt-in to safety from Rust. Even if we have the same functionality available in Ref and TRef being able to assume_safe allows the programmer to still get the benefit of knowing that "hey, I'm going to start using the FFI here and it's my responsibility as long as I use this TRef that the FFI isn't going to cause UB".

I'm not quite sure where to draw the line, but I do really like the current type-state approach. Perhaps we can leverage it and make it a bit easier to approach without needing to pepper unsafe everywhere.

Bromeon commented 2 years ago

For reference, here is a (likely incomplete) list of APIs that allow custom code execution. GDScript, being a dynamic language, offers a lot of reflection mechanisms.

Besides soundness implications (elaborated in first post), such methods can also be a security risk if they are passed unchecked user input -- especially those towards the end of the list.

Object
- set(), get()
- call(), callv(), call_deferred()
- emit_signal()
Variant
- call()
- evaluate() (?)
PackedScene
- instance()
SceneTree
- call_group(), call_group_flags()
- change_scene(), change_scene_to()
TreeItem
- call_recursive()
Node
- propagate_call()
Tween
- follow_*(), interpolate_*(), targeting_*()
AnimationPlayer
- play(), play_backwards() (?)
FuncRef
- call_func(), call_funcv()
Script
- set_source_code()
Thread
- start()
Expression (arbitrary code execution)
- execute()
OS (arbitrary process execution)
- execute()
GDNative (low-level GDNative access)
- call_native() -- "raw function calls"

And this list doesn't even mention networking (rset() and all that), node notifications (e.g. make _ready() execute again), or custom signal triggers (like timers, "changed" events, etc).

parasyte commented 2 years ago

Not sure if this belongs here or in one of the related tickets, but I have been trying to get a better grasp on the safety requirements for assume_safe(). It seems to me that the unbounded lifetimes between &'r self and TRef<'a, T> is a soundness hole. For instance, it is trivial to extend the lifetime of the returned TRef to 'static.

I did some digital archeology looking for answers and the lifetime split was done to fix #455 in #456. Constraining the lifetime in https://github.com/godot-rust/godot-rust/blob/5f388837a6dc91c7ae9f4abfe3de2a2880e39243/gdnative-core/src/object/bounds.rs#L212 with impl<'a, 'r: 'a> introduces some compiler errors (which AFAICT are correctly identified as erroneous):

error[E0515]: cannot return value referencing temporary value
  --> gdnative-bindings\src\utils.rs:23:5
   |
23 |        Engine::godot_singleton()
   |   _____^
   |  |_____|
   | ||
24 | ||         .get_main_loop()?
25 | ||         .assume_safe()
26 | ||         .cast::<SceneTree>()?
27 | ||         .root()?
28 | ||         .assume_safe()
29 | ||         .get_node(name)?
   | ||________________________- temporary value created here
30 | |          .assume_safe()
31 | |          .cast::<T>()
   | |_____________________^ returns a value referencing data owned by the current function

error[E0515]: cannot return value referencing temporary value
  --> gdnative-bindings\src\utils.rs:75:9
   |
75 |         self.upcast().get_node(path)?.assume_safe().cast()
   |         -----------------------------^^^^^^^^^^^^^^^^^^^^^
   |         |
   |         returns a value referencing data owned by the current function
   |         temporary value created here

For more information about this error, try `rustc --explain E0515`.

This code was added in #669, about 6 months after the lifetime split.

I also found this particularly unusual: https://github.com/godot-rust/godot-rust/blob/5f388837a6dc91c7ae9f4abfe3de2a2880e39243/gdnative-core/src/object/bounds.rs#L213 I would expect these lifetimes to be unrelated, since the object is reference counted within the engine.

KennethWilke commented 2 years ago

I was learning more about the reference systems and their safety dynamics a bit with the help of folks on discord today. ❤️

Jacobsky's description of TRef as a trusted reference helped me start to better understand TRef vs Ref. I had a few layers of misconceptions in my mental model around these. I hadn't spent enough time before to understand these types and was more focused on having fun using Rust instead of GDScript.

Through my tinkering I felt like TRef was the primary useful structure and Ref was basically just a way to keep a pointer around longer. The trusted reference description helped me better understand how my code was working, but I also didn't understand the ownership aspects of Ref.

With only Ref<T, Shared> in mind, I started to think it was kind of a sidekick. Sure, it's safe according to the borrow checker but for me it's not useful for anything until I unsafe it into something to do Godot things with. This had me thinking they should be renamed, that maybe TRef should be GDRef and Ref should be GDUnsafeRef. Waridley found that to be backwards so I knew I was still off on something.

Now if I have things right, I can see how Ref can be used without unsafe in certain conditions, such as if it's Unique. The Obtaining a Safe View section of the docs helped here too

I share this hoping it helps provide insight on how others might be looking at things and struggling. I also wanted to share the idea of restructuring and renaming the references a little bit.

The idea is rough but I feel like Ref<T, Shared> and Ref<T, Unique> should not be the same structure. If I can directly deref one but need unsafe/assume_same the other first it feels like a mixed API. I look at the nuances in the conversions and thinking types like UniqueRef<T> or SharedRef<T> may work better, with From/Into traits for safe conversions and maybe macros for unsafe conversations.

I'm not sure how TRef and other aspects like reference counting fits with this idea, but I think this kind of expanded clarity in the smart pointer naming and APIs could help alleviate confusion

godot-rust / gdnative

Safety redesign #808

836

844

790

795

748

641