Error handling and panic in observablehq notebook

jmatsushita commented 7 months ago

Hi,

Sorry for opening the issue directly on github, but I cannot see this repo in the dropdown menu in https://antv-issue-helper.surge.sh

I created a reproducer in ObservableHQ for a panic when using a specific shader: https://observablehq.com/d/bece57b91ebd5a40

I'm using a compute-toys notebook fork which pulls in the latest version @antv/g-device-api@1.6.4

The setup works for the origami shader, however when using shader code that compiles and runs fine in compute.toys (see https://compute.toys/view/1079) I can get the following stack trace in the console:

glsl_wgsl_compiler_bg.wasm:0x23257e Uncaught (in promise) RuntimeError: unreachable
    at __rust_start_panic (glsl_wgsl_compiler_bg.wasm:0x23257e)
    at rust_panic (glsl_wgsl_compiler_bg.wasm:0x2318d1)
    at std::panicking::rust_panic_with_hook::hdb857edf4e61fe11 (glsl_wgsl_compiler_bg.wasm:0x216e3e)
    at std::panicking::begin_panic_handler::{{closure}}::hef75e0b11b6899ae (glsl_wgsl_compiler_bg.wasm:0x21dd09)
    at std::sys_common::backtrace::__rust_end_short_backtrace::h2c4c1ce3acc2944e (glsl_wgsl_compiler_bg.wasm:0x2321dc)
    at rust_begin_unwind (glsl_wgsl_compiler_bg.wasm:0x228274)
    at core::panicking::panic_fmt::hf20f9922c43dc05c (glsl_wgsl_compiler_bg.wasm:0x22ca20)
    at core::result::unwrap_failed::h92d50af5668640d2 (glsl_wgsl_compiler_bg.wasm:0x21f486)
    at wgslcomposer_wgsl_compile (glsl_wgsl_compiler_bg.wasm:0x180ba4)
    at Rr.wgsl_compile (glsl_wgsl_compiler.js:211:12)
$__rust_start_panic    @    glsl_wgsl_compiler_bg.wasm:0x23257e
$rust_panic    @    glsl_wgsl_compiler_bg.wasm:0x2318d1
$std::panicking::rust_panic_with_hook::hdb857edf4e61fe11    @    glsl_wgsl_compiler_bg.wasm:0x216e3e
$std::panicking::begin_panic_handler::{{closure}}::hef75e0b11b6899ae    @    glsl_wgsl_compiler_bg.wasm:0x21dd09
$std::sys_common::backtrace::__rust_end_short_backtrace::h2c4c1ce3acc2944e    @    glsl_wgsl_compiler_bg.wasm:0x2321dc
$rust_begin_unwind    @    glsl_wgsl_compiler_bg.wasm:0x228274
$core::panicking::panic_fmt::hf20f9922c43dc05c    @    glsl_wgsl_compiler_bg.wasm:0x22ca20
$core::result::unwrap_failed::h92d50af5668640d2    @    glsl_wgsl_compiler_bg.wasm:0x21f486
$wgslcomposer_wgsl_compile    @    glsl_wgsl_compiler_bg.wasm:0x180ba4
wgsl_compile    @    glsl_wgsl_compiler.js:211
(anonymous)    @    compute-toys.js?v=4&…7b91ebd5a40@314:485
(anonymous)    @    compute-toys.js?v=4&…7b91ebd5a40@314:478
eval    @    observablehq-17:43
await in eval (async)        
eval    @    observablehq-295:4
(anonymous)    @    worker-488e6463.js:2
Promise.then (async)        
ea    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
(anonymous)    @    worker-488e6463.js:2
Promise.then (async)        
value    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
Pr    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
define    @    compute-toys.js?v=4&…7b91ebd5a40@314:698
value    @    worker-488e6463.js:2
(anonymous)    @    worker-488e6463.js:2
Promise.then (async)        
Ea    @    worker-488e6463.js:2
define    @    worker-488e6463.js:2
rs    @    worker-488e6463.js:2
(anonymous)

Additional context

Chrome on MacOS Version 122.0.6261.94 (Official Build) (arm64)

I have noticed the following in regard to errors and error handling in general:

[ ] Since I've made the run function take in the shader string, there are cases where in particular reusing the prelude without binding for instance time.elapsed will result in these errors being constantly spammed in the console.

GPUValidationError {message: '[Invalid CommandBuffer] is invalid.\n - While calling [Queue].Submit([[Invalid CommandBuffer]])\n'}
GPUValidationError {message: '[Invalid BindGroup] is invalid.\n - While encoding …r].SetBindGroup(0, [Invalid BindGroup], 0, ...).\n'}

Using a trick such as:

var ignore = time.elapsed / time.elapsed; // to avoid rewriting run without the time bindings.
var uv = (fragCoord * 2. - resolution.xy) / resolution.y * ignore;

Works to bypass that problem and avoid having to reinitialise each cell with a different set of bindings.

[ ] Is there a better way to enable using a notebook to progressively build a compute shader? Maybe some variables can be initialised and made available to the page first, like device and only running createProgram() and device.createComputePipeline() on each cell? If you had a example notebook with this approach that would be great to be able to use observablehq to explain shaders step by step.
[ ] Some code that runs on compute.toys doesn't work with g-device-api. This seems to have to do with type checking.
```
let a = hash1(p+vec2(0,0)); 
```
compute.toys seems able to infer the types but this will spam this error message with g-device-api:
```
Device.ts:210 
GPUValidationError {message: '[Invalid CommandBuffer] is invalid.\n - While calling [Queue].Submit([[Invalid CommandBuffer]])\n'}
```

The fix is easy enough, but the error handling and lack of error messages makes it difficult to use.

let a = hash1(p+float2(0,0)); 
// or let a = hash1(p+vec2(0.,0.));

[ ] In general, when there is an error that spams the console, even when the code is fixed in the same session/notebook, the messages continue to appear in the console (I'm not sure if there's still a context behind the scenes that is not cleaned up, or if the messages have been buffered). It requires reloading the notebook for the messages to stop.

Thank you very much for the great library, I'm really looking forward to using it more in observablehq and I hope my remarks are helpful to improve the error handling experience!

Cheers,

Jun

xiaoiver commented 6 months ago

Just try to change const to let:

// before
const tmax = 2000.0;

// after
let tmax = 2000.0;

Or declare it in module scope:

const tmax = 2000.0;

fn main_image() {}

I will check if it's a bug relative to naga, since according to WGSL Spec, a const-declaration can be declared in function scope: https://www.w3.org/TR/WGSL/#const-decls

jmatsushita commented 6 months ago

Thank you. Indeed the change you suggested fixes the error:

// after
let tmax = 2000.0;

However, should this be a panic (sometimes just swallowed or displaying RuntimeError: unreachable) would it be possible to have a more granular error message to help troubleshoot the shader next time?

xiaoiver commented 6 months ago

You're right, I should throw the compiling error message from naga.

xiaoiver commented 6 months ago

I update both naga & naga-oil to the latest version and throw error message from Rust side.

// before
naga = { version = "0.14.1", features = ["glsl-in", "wgsl-in", "wgsl-out"] }
naga_oil = "0.11.0"

// after
naga = { version = "0.19.2", features = ["glsl-in", "wgsl-in", "wgsl-out"] }
naga_oil = "0.13.0"

Now the error message can be shown correctly. For example, in the tmax case, naga-oil will complain the following message:

make_naga_module Composer error: expected assignment or increment/decrement, found 'tmax'

But we cannot use alias for now: https://github.com/bevyengine/naga_oil/issues/79 If we try to prepend alias int = i32; to our shader chunk:

Composable module identifiers must not require substitution according to naga writeback rules: `int`

So we have to do some alias work when borrowing shaders from compute-toys. Here is the complete alias map used in compute-toys:

alias int = i32;
alias uint = u32;
alias float = f32;
alias int2 = vec2<i32>;
alias int3 = vec3<i32>;
alias int4 = vec4<i32>;
alias uint2 = vec2<u32>;
alias uint3 = vec3<u32>;
alias uint4 = vec4<u32>;
alias float2 = vec2<f32>;
alias float3 = vec3<f32>;
alias float4 = vec4<f32>;
alias bool2 = vec2<bool>;
alias bool3 = vec3<bool>;
alias bool4 = vec4<bool>;
alias float2x2 = mat2x2<f32>;
alias float2x3 = mat2x3<f32>;
alias float2x4 = mat2x4<f32>;
alias float3x2 = mat3x2<f32>;
alias float3x3 = mat3x3<f32>;
alias float3x4 = mat3x4<f32>;
alias float4x2 = mat4x2<f32>;
alias float4x3 = mat4x3<f32>;
alias float4x4 = mat4x4<f32>;

Or use predeclares instead, eg.

vec2f https://www.w3.org/TR/WGSL/#vec2f
mat3x3f https://www.w3.org/TR/WGSL/#mat3x3f

xiaoiver commented 6 months ago

Just use the latest @antv/g-device-api@1.6.8 & WASM:

import { WebGPUDeviceContribution } from '@antv/g-device-api';

const deviceContribution = new WebGPUDeviceContribution({
  shaderCompilerPath: '/glsl_wgsl_compiler_bg.wasm',
  // From CDN
  // shaderCompilerPath: 'https://unpkg.com/@antv/g-device-api@1.6.8/rust/pkg/glsl_wgsl_compiler_bg.wasm',
});

Calls begin/endFrame() at the beginning and end of computePass：

device.beginFrame();

const computePass = device.createComputePass();
computePass.setPipeline(computePipeline);
computePass.setBindings(bindings);
computePass.dispatchWorkgroups(1);
device.submitPass(computePass);

device.endFrame();

antvis / g-device-api

Error handling and panic in observablehq notebook #165

Additional context