Scthe / frostbitten-hair-webgpu

Software rasterizing hair strands with analytical AA and OIT. Inspired by Frostbite's hair system: "Every Strand Counts: Physics and Rendering Behind Frostbite's Hair".
MIT License

Frostbitten hair WebGPU (Demo)

Software rasterizing hair strands with analytical anti-aliasing and order-independent transparency. Inspired by Frostbite's hair system: "Every Strand Counts: Physics and Rendering Behind Frostbite's Hair" as presented by Robin Taillandier and Jon Valdes.

Interactive Demo. WebGPU is only available in Chrome! Use the [W, S, A, D] keys to move and [Z, SPACEBAR] to fly up or down. [Shift] to move faster. Check src/constants.ts for full documentation.

If you are curious about the basics for rendering hair using compute shaders, I've written an article: "Software rasterizing hair". It's a good introduction if you've never worked in this domain.

https://github.com/user-attachments/assets/02859b92-a940-42b6-8381-dcac4b81b4d4

Nice hair rendering? Yes! Interactive physics? Yes! But is it fun? YES!


A static image of software-rasterized hair. You can zoom in and inspect the anti-aliasing and order-independent transparency techniques.

Features

Features: Rendering

Features: Physics simulation

Check src/constants.ts for full documentation.

Features: Other

Compared to Frostbite's implementation

I'm using Robin Taillandier and Jon Valdes's presentation "Every Strand Counts: Physics and Rendering Behind Frostbite’s Hair" as a reference point.

Some things were not explained in the presentation, so I gave my best guess. E.g. the aero grid update step takes wind and colliders as input. But does it do fluid simulation for nice turbulence and vortexes? Possible, but not likely. I just mark 3 regions: lull (inside the mesh), half-strength (grid point is behind a collider), and full strength.
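The three-region marking can be sketched in TypeScript as below. This is only an illustration of the idea, not code from the repository: the names (GridPointInfo, classifyWindRegion, windStrength) are hypothetical, and the scale factors (0 for lull, 0.5 for half-strength, 1 for full strength) are my assumption about what the region names mean.

```typescript
// Hypothetical names and values; a sketch of the three-region idea only.
type WindRegion = "lull" | "half" | "full";

interface GridPointInfo {
  /** Signed distance to the collider mesh; negative means inside the mesh. */
  sdfDistance: number;
  /** True if a collider sits between the wind source and this grid point. */
  behindCollider: boolean;
}

// Assumed scale per region: no wind inside the mesh, half behind a collider.
const REGION_SCALE: Record<WindRegion, number> = {
  lull: 0,
  half: 0.5,
  full: 1,
};

/** Picks one of the three regions; being inside the mesh takes priority. */
function classifyWindRegion(p: GridPointInfo): WindRegion {
  if (p.sdfDistance < 0) return "lull";
  if (p.behindCollider) return "half";
  return "full";
}

/** Scales the base wind strength by the region's factor. */
function windStrength(baseStrength: number, p: GridPointInfo): number {
  return baseStrength * REGION_SCALE[classifyWindRegion(p)];
}
```

Checking the "inside the mesh" case first means a point that is both inside the mesh and behind a collider still counts as lull.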

Of course, I cannot rival Frostbite's performance. I am a single person and I have much better things to do than pore over a side project. I get a stable <10ms (total frame time) on an RTX 3060 in the default view. It's enough so that swinging the ball through the hair is FUN! There is a button to hide the ball, which is EVEN MORE FUN!!!

Usage

Running the app locally

WebGPU does not work on Firefox. On Chrome, it requires HTTPS even during development.

  1. openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout key.pem -out cert.pem. Move both files to ./static.
  2. yarn install.
  3. yarn dev. Starts a dev server that watches and compiles the code into ./build. It also copies the files from ./static.
  4. yarn serve in a separate terminal. Starts http-server. It's not included in package.json, but let's be honest - you already have it installed globally.

Or run yarn build for a production build.
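If the page loads but nothing renders, it can help to check whether WebGPU is exposed at all. Below is a minimal sketch, not code from this repository. It relies on the standard navigator.gpu.requestAdapter() call; as far as I know, navigator.gpu is simply undefined when the browser lacks WebGPU or the page is not in a secure context. The GpuLike and NavigatorLike types and both function names are hypothetical, added so the snippet compiles without the @webgpu/types package.

```typescript
// Minimal structural types (hypothetical) so this compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

/** Cheap synchronous check: is the WebGPU entry point present at all? */
function hasWebGpuApi(nav: NavigatorLike | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined;
}

/** Slower check: the API exists AND the system can provide an adapter. */
async function hasWebGpuAdapter(
  nav: NavigatorLike | undefined,
): Promise<boolean> {
  if (nav === undefined || nav.gpu === undefined) return false;
  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null && adapter !== undefined;
}
```

In a browser you would pass the global navigator object to either function. The two checks are split because the API can be present while no adapter is available, for example on a machine without a supported GPU.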

Running the app in Deno

Node.js does not support WebGPU. Deno does (since version 1.39, December 14, 2023 - 8 months ago). Internally, it uses wgpu, the same WebGPU implementation that Firefox uses.

  1. Download the .zip file from deno/releases.
  2. "<path-to-unzipped-deno>/deno.exe" cache "src/index.deno.ts". Download the dependencies.
  3. Run unit tests:
    1. "<path-to-unzipped-deno>/deno.exe" task test.
  4. Render to ./output.png:
    1. "<path-to-unzipped-deno>/deno.exe" task start.

Personally, I just use the makefile. Update paths there and you should be good to go.

For SDF generation and the Blender hair export, see the makefile. Both call separate scripts. The SDF script loads the OBJ file that I've included in this repo. The Blender file is not included (file size!), but it's similar to the one in my Unity hair repo.

FAQ

How does this work?

See documentation for each pass in src/passes/README.md.

How many hacks are there in the codebase?

Not really? There are always bugs. I'm a solo dev, so there probably is some incorrectness. The parameters have to be fine-tuned (as one could expect). Performance can be improved, but it's a matter of raw man-hours and experiments. Surely there has to be something obvious missing, but...

Instead, let's list some more interesting aspects:

How does the software rasterization differ from Nanite's?

In the last few years, we have seen more rendering systems that lean on software rasterization. Two of the best-known examples are UE5's Nanite and Frostbite's hair system. My previous project was Nanite WebGPU, which is a Nanite implementation that works in a browser. It has a meshlet hierarchy, software rasterizer, impostor billboards, etc. We can compare how software rasterization is used in both projects.

Anything about performance optimization?

WebGPU does not offer access to profilers or debuggers. With a custom shader language (WGSL), writing isolated tests is painful. It's designed with no visibility in mind. The performance tests were done manually: by commenting out code, testing alternatives, etc. Here are a few things I've seen.

Within two days of merging the physics branch into master, I was able to bring the HairFinePass from 30ms down to 10ms. Most of that came from fixing a bug, which allowed for a much more aggressive config, e.g. halving the tile size to 8px, smaller memory allocations, etc. I suspect there is still some low-hanging fruit left. I just wish I had a profiler to know what is actually going on.

(EDIT 27.09) I've rewritten HairFinePass to parallelize over pixels in a tile. Now it takes ~3.3ms instead of 10ms. Still above the 144Hz VSync breakpoint on RTX3060, but it's a solid improvement.
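To illustrate what "parallelize over pixels in a tile" means in terms of indexing, here is a CPU-side TypeScript sketch. It is not the actual WGSL from HairFinePass. It assumes square 8px tiles laid out row by row and one invocation per pixel, and all names (TILE_SIZE_PX, tileCount, pixelForInvocation) are hypothetical.

```typescript
// Hypothetical sketch; the real pass is a WGSL compute shader.
const TILE_SIZE_PX = 8; // tile size mentioned in the text above

interface Pixel {
  x: number;
  y: number;
}

/** Tiles needed to cover the viewport; partial tiles at the edges round up. */
function tileCount(viewportW: number, viewportH: number): Pixel {
  return {
    x: Math.ceil(viewportW / TILE_SIZE_PX),
    y: Math.ceil(viewportH / TILE_SIZE_PX),
  };
}

/**
 * Maps (tile index, invocation index inside the tile) to a viewport pixel.
 * Returns null for invocations that land outside the viewport, which
 * happens in partial tiles along the right and bottom edges.
 */
function pixelForInvocation(
  tileIdx: number,
  localIdx: number, // 0 .. TILE_SIZE_PX * TILE_SIZE_PX - 1
  viewportW: number,
  viewportH: number,
): Pixel | null {
  const tiles = tileCount(viewportW, viewportH);
  const tileX = tileIdx % tiles.x;
  const tileY = Math.floor(tileIdx / tiles.x);
  const x = tileX * TILE_SIZE_PX + (localIdx % TILE_SIZE_PX);
  const y = tileY * TILE_SIZE_PX + Math.floor(localIdx / TILE_SIZE_PX);
  return x < viewportW && y < viewportH ? { x, y } : null;
}
```

With this layout, the 64 invocations of a tile each own a single pixel, so they can work through the same list of hair segments side by side instead of one invocation looping over every pixel in the tile.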

Honourable mentions