Open udoprog opened 9 months ago
I'd propose as a policy making these details private unless an error variant or field communicates something which is publicly actionable
It's quite hard to tell though when something is actionable and when not. In the "worst case" somebody might want to count errors of a kind and then suddenly all internals are actionable since it's hard to keep statistics otherwise.
I agree that the large API surface can be problematic, but this is also part of the surface of wgpu_core, not wgpu, so I'm a lot less concerned: most people are expected to interact with wgpu, whereas wgpu_core is more of an implementation detail.
As for the other point of scattered documentation I think we should come up with a shared namespace for all exposed wgpu_core error types, that would improve things a lot without needing to shuffle around what's exposed.
I might have overestimated how leaky the error types from wgpu-core are. If there's no way to access them publicly it should be fine!
One could still lock down the API across crates, because dead code detection can help prune unused error variants. But that's less of a concern.
I did a small experiment where I hid the error types in wgpu-core and republished them through public wrappers that hide them, as proposed above. This allows dead code analysis to kick in: variants that are never produced internally in wgpu-core are flagged as unused. This turned up about 15 error variants which are no longer used.
Note that I haven't hidden everything yet, but this might still be worthwhile doing to improve coherency between project components.
It's nice to be able to get warnings about the dead code.
Off the top of my head, among the things that we need to be able to handle outside of wgpu-core:
That's not barring the door on the general proposal (and I love how this found all that dead code now!), but over here we hit a use case for fairly deep inspection of the errors: https://github.com/rerun-io/rerun/blob/main/crates/re_renderer/src/error_handling/wgpu_core_error.rs#L131 This code implements a custom error de-duplication scheme in order to keep the flood of errors manageable whenever shader compilation fails (typically due to runtime-supplied shaders). I suppose there's a point to be made that shader errors are special anyway and this doesn't affect most of it!
So just pitching here one design I'd like to explore if there's interest, where error reporting and tracing is baked together into diagnostics. It's akin to something I call "abductive diagnostics" and I think wgpu can benefit from it as well.
The idea here would be that instead of propagating and wrapping hierarchies of Results, we'd pass along an error context with which we can perform basic tracing:
impl Device<A> {
    pub(crate) fn create_bind_group(
        self: &Arc<Self>,
        cx: &mut Context<'_>, // <- this is a new argument every fallible function receives.
        layout: &Arc<BindGroupLayout<A>>,
        desc: &binding_model::BindGroupDescriptor,
        hub: &Hub<A>,
    ) -> Result<binding_model::BindGroup<A>, Error> { // <- Error is an empty marker type which is constructed through the context.
        /* .. */
    }
}
Note that ErrorMarker is a type which can only be constructed through the context, to ensure that the context is consulted before returning from a method.
With this we can trace structures:
for entry in desc.entries.iter() {
    let vertex_buffers_enter = cx.enter("vertex.buffers");
    /* .. */
    // Do something fallible. Every error which is actually captured by the context includes the semantic trace.
    // Context::result is a convenience function which captures any error and transforms the `Result<T, E>` into a `Result<T, Error>` which includes the marker type.
    let value = cx.result(do_something_fallible())?;
    // More than one error can be reported at a time.
    if !is_valid1(entry) {
        cx.report(Error::IsNotValid1);
    }
    if !is_valid2(entry) {
        cx.report(Error::IsNotValid2);
    }
    cx.leave(vertex_buffers_enter);
}
Special conditions like OutOfMemory would then be handled directly by Context::report and tracked specifically so that they can be correctly propagated in handle_error. Or, if a party is interested in some specific error condition, they can iterate over the error causes which have been collected in the Context.
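To make the shape of such a context more concrete, here is a minimal sketch. It is not the prototype's code: Guard, Reported, causes and the validate function are hypothetical names, the path is stored as plain strings, and the lifetime parameter from the signature above is dropped for brevity. The only property it tries to demonstrate is that ErrorMarker has a private field, so it can only be obtained by reporting through the Context.

```rust
use std::fmt;

/// Marker returned by fallible functions. The private field means code
/// outside this module cannot construct it without going through `Context`.
#[derive(Debug)]
pub struct ErrorMarker(());

/// Hypothetical token returned by `enter` and consumed by `leave`.
#[must_use]
pub struct Guard(usize);

/// One recorded error plus the path that was active when it was reported.
#[derive(Debug)]
pub struct Reported {
    pub path: String,
    pub message: String,
}

#[derive(Default)]
pub struct Context {
    path: Vec<String>,
    reported: Vec<Reported>,
}

impl Context {
    /// Push a path segment and remember the depth it was pushed at.
    pub fn enter(&mut self, segment: &str) -> Guard {
        self.path.push(segment.to_owned());
        Guard(self.path.len())
    }

    /// Pop back to the depth that was active before the matching `enter`.
    pub fn leave(&mut self, guard: Guard) {
        self.path.truncate(guard.0 - 1);
    }

    /// Record an error under the current path and hand out the marker.
    pub fn report(&mut self, error: impl fmt::Display) -> ErrorMarker {
        self.reported.push(Reported {
            path: self.path.join("."),
            message: error.to_string(),
        });
        ErrorMarker(())
    }

    /// Convert any `Result<T, E>` into `Result<T, ErrorMarker>`, recording `E`.
    pub fn result<T, E: fmt::Display>(&mut self, result: Result<T, E>) -> Result<T, ErrorMarker> {
        result.map_err(|e| self.report(e))
    }

    /// Iterate over everything collected so far.
    pub fn causes(&self) -> impl Iterator<Item = &Reported> {
        self.reported.iter()
    }
}

/// Hypothetical validation routine that may report several problems at once.
pub fn validate(cx: &mut Context, stride: u32) -> Result<(), ErrorMarker> {
    let guard = cx.enter("vertex.buffers[0]");
    let mut failed = None;
    if stride % 4 != 0 {
        failed = Some(cx.report("array_stride must be a multiple of 4"));
    }
    if stride > 2048 {
        failed = Some(cx.report("array_stride exceeds the maximum of 2048"));
    }
    cx.leave(guard);
    match failed {
        Some(marker) => Err(marker),
        None => Ok(()),
    }
}
```

A caller that only cares about success checks the returned Result; a caller that wants details walks cx.causes() afterwards. The stride limits in validate are made up for the example.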
The benefits would be the following:
Pretty printing errors could then be done like this consistently regardless of the kind of error raised, without having to implement it specifically as a trait:
Bind group entry with label "foo" in entries[1] is invalid:
- Binding count declared with at most 4 items, but 2 items were provided
- Buffer binding size 10 is less than minimum 4
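As a rough illustration of why no per-error trait is needed for this, the sketch below renders a flat list of reported errors into that grouped form. The Reported struct and the render function are hypothetical names, and it assumes errors for the same location are reported consecutively.

```rust
use std::fmt::Write;

/// Hypothetical record of one reported error: where it occurred and what went wrong.
pub struct Reported {
    pub path: String,
    pub message: String,
}

/// Print one header per run of errors sharing a path, followed by one bullet per error.
pub fn render(reported: &[Reported]) -> String {
    let mut out = String::new();
    let mut last_path: Option<&str> = None;
    for r in reported {
        if last_path != Some(r.path.as_str()) {
            // Writing to a String cannot fail, so the result is ignored.
            let _ = writeln!(out, "{} is invalid:", r.path);
            last_path = Some(r.path.as_str());
        }
        let _ = writeln!(out, "- {}", r.message);
    }
    out
}
```

Since the location is carried next to the message rather than inside it, the same renderer works for any error type that implements Display.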
So I've built a working prototype of what I suggested above in a branch here. For a brief overview, it's similar to how mature compilers report errors, since they tend to be quite rich in context. I think it would mitigate quite a few of the issues which relate to unhelpful error messages.
If we adopt this pattern we can:
vertex.buffers[0].array_stride) for every error. This is tracked separately from the error hierarchy, so error variants don't have to include this.
The pattern is mostly backwards compatible and can be implemented incrementally: the prototype only changes three functions for now and doesn't modify existing error hierarchies.
@cwfitzgerald When you're available, can I get some quick feedback before I put any mileage into it, since it would be a pretty significant change to wgpu-core?
I haven't been able to digest this the way I think is necessary to get traction on a discussion here, but I think I'm going to have bandwidth to discuss more details in the middle of next week. 🫡
Is your feature request related to a problem? Please describe.
Thoughts while looking over issues like #5066
The internals of every error are currently part of the public API surface of this project. I'd propose as a policy making these details private unless an error variant or field communicates something which is publicly actionable (see the SurfaceError example below).
I believe this is better for at least two reasons:
#[non_exhaustive], this still does not permit variants or fields to be removed.
Note that all errors should continue to support std::fmt::Display and std::error::Error::source, since this allows a user to print decently rich diagnostics and error chains:
An example of an error which is publicly actionable is SurfaceError::Outdated, which signals that a surface needs to be reconfigured.
Describe the solution you'd like
The pattern I usually deploy is something like this:
For SurfaceError, for example, I'd probably convert the existing error variants to a simple:
Or, if there is public interest in more than one state:
Like with io::Error::kind, the publicly exported SurfaceErrorKind doesn't have to match the internal representation if a representation is available which is more efficient than the publicly exported type.
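As a minimal sketch of the general shape being described (an opaque error struct, a private representation, and a public kind accessor modelled on io::Error::kind), something like the following could work. The Repr enum, its variants, the Other kind, the messages and the chain helper are assumptions made for illustration and are not wgpu's actual definitions.

```rust
use std::error::Error;
use std::fmt;

/// Public, coarse classification of what a caller can act on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum SurfaceErrorKind {
    /// The surface needs to be reconfigured.
    Outdated,
    /// Anything that is not publicly actionable.
    Other,
}

/// Private detail (hypothetical variants). Because it is not exported, variants
/// can be added, renamed or removed without breaking the public API.
#[derive(Debug)]
enum Repr {
    Outdated,
    Timeout,
    Backend(std::io::Error), // stand-in for some lower-level cause
}

/// Opaque public error type.
#[derive(Debug)]
pub struct SurfaceError {
    repr: Repr,
}

impl SurfaceError {
    /// Map the private representation onto the public kind.
    pub fn kind(&self) -> SurfaceErrorKind {
        match self.repr {
            Repr::Outdated => SurfaceErrorKind::Outdated,
            _ => SurfaceErrorKind::Other,
        }
    }
}

impl fmt::Display for SurfaceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.repr {
            Repr::Outdated => write!(f, "surface is outdated and needs to be reconfigured"),
            Repr::Timeout => write!(f, "timed out acquiring the surface texture"),
            Repr::Backend(_) => write!(f, "surface backend failure"),
        }
    }
}

impl Error for SurfaceError {
    /// Expose the underlying cause so users can still print full error chains.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match &self.repr {
            Repr::Backend(e) => Some(e),
            _ => None,
        }
    }
}

/// Render an error and all of its sources as "outer: inner: ...".
pub fn chain(error: &dyn Error) -> String {
    let mut out = error.to_string();
    let mut source = error.source();
    while let Some(cause) = source {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        source = cause.source();
    }
    out
}
```

A caller would match on err.kind() to decide whether to reconfigure the surface, and fall back to printing the chain for everything else. The private Repr is where dead code warnings would show up if a variant is never constructed inside the crate.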