PTex faces vs Patches - Githubissues

fgiordana commented 10 years ago

Hi guys,

I'm finally trying to integrate OSD into our fur system, so I'm trying to understand what would be the best approach to solve some issues and I could use some help.

1) I need to grow follicles on the surface: each follicle will have a face ID and a uv parameter associated to it. I have to decide what faceID means, my first instinct would be to use the ID of a patch in the FAR mesh, but looking at the eval API perhaps it would be better to use PTex face index instead

2) I need to compute the area of such faces in order to compute the density. I couldn't find an easy way to access the vertices of the ptex faces (only patches and coarse faces), is there a trick you can suggest?

3) Projection/Intersection: similar problem as before, if I can access the PTex face vertices I already have my own acceleration structures for the ray casting.

4) Eval: If I choose to index directly the patches I'm assuming I will have to write my own eval controller, is that right? Or are there plans to extend the existing one to evaluate directly points on patches instead of going through the ptex faces?

Well there is a lot more, but these are my main blockers for now, I hope this makes sense.

As you can see I just need to understand what level to target (coarse faces / ptex faces / patches) and how to access the relative information then I should be good to go. Coarse faces are a no-go, since I need to make sure they are quads and parametrized.

Many thanks! Francesco

manuelk commented 10 years ago

Hi Francisco

perhaps it would be better to use PTex face index instead

That would be my recommendation: it is the simplest un-ambiguous way of identifying a given location on the limit surface. The added benefit is that you can also leverage all the Ptex coordinates machinery in your GPU shaders.

I need to compute the area of such faces in order to compute the density

This is indeed something that we have not done yet. The vertices for the ptex faces are very easy though - all you would need to do is "quadrangulate" the coarse mesh:

skip quads
non-quads add a vertex at the center and create the quad sub-faces in CCW order If you still have your Hbr mesh lying around, you can query all this from there, including ptex id and verts (all non-quads will have been refined)

are there plans to extend the existing one to evaluate directly points on patches instead of going through the ptex faces?

I didn't think anyone would want to access the adaptive patches directly as they are dependent on topology which is almost never representative of what the surface is doing. Some of the extensions i had in mind was to batch eval calls where you can pass grids of pre-set points and leverage parallel architectures (sorry - it's very far on my back burner...). Assuming you could get some kind of area metric on a ptex face, would you still need to access the adaptive patches directly ?

Coarse faces are a problem, because as you point out, non-quads have ambiguous parametrizations.
Ptex faces are un-ambiguous, but you usually need to maintain some kind of table to map coarse to ptex faces (or vice-versa).
Patches are a lot more complex as a ptex-face could be covered by dozens of sub-patches with unpredictable layouts (look at the FarPatchMap quadtree class for an idea of how Eval connects ptex faces to patches...).

fgiordana commented 10 years ago

Hi Manuel, Thanks a lot for the clarifications!

I will start playing around with PTex faces, but it is performance that worries me a bit. We will need to re-evaluate each frame millions of points on the surface (~10M for a hero creature) and having to remap each time the ptex face id with the appropriate patch for each point can be quite costly. I would gladly do that once at the start (when we bind the follicles to the surface) and then directly access the underlying patches. That will limit us to non-adaptive Catmark surfaces of course, but that's all we really need.

Anyhow, once I have it working with PTex faces I'll be able to get some profiling and see how much that patch finding and uv normalization/rotation really impacts on our performance.

Thanks again, I will keep you posted! Francesco

manuelk commented 10 years ago

I have run a few traces a while back and IIRC, the cost of traversing the FarPatchMap to locate patches was mostly negligible. This can be verified by running the limitEval example and scaling feature isolation from 1 to 9 with no significant drop in performance.

One question though: when you say "10M hair per frame" - are you thinking interactive hairs on GPU or simply posing them for off-line render ?

What i should have mentioned earlier though, is that if your follicle parametric loci are constant, you can amortize a lot of the interpolation costs up-front. I would suggest looking at the glStencilViewer example for implementation details. Table generation time is a factor of the number of samples, unlike EvalLimit which depends on mesh topology and isolation factor. However, execution time is substantially reduced because the limit evaluation is reduced to a an average of 16 multiply-adds of vertex data, with no conditional branching (this makes them very amenable to GPU implementations, and there is a good chance that we will be replacing our clunky subdivision tables with stencils in a not too distant future). Stencils were implemented specifically for problems like hair.

On my workstation, with 16 Xeon cores:

250k samples with EvalLimit: 60 ms
250k samples with EvalStencil: 8 ms

Food for thought...

fgiordana commented 10 years ago

That would be "interactive" hairs on GPU, which means as fast as we can get them to be. A huge bottleneck for us right now is the subdiv surface evaluation, so OSD should really give us a big boost. It sounds like the Stencil Eval is exactly what we need, thanks for the tip! Our parametric loci are indeed constant, since that is a prerogative of follicles, so we should be able to exploit that quite well

manuelk commented 10 years ago

For grooming, you might want to stick with LimitEval and bite the cost of the quadtree traversal.

Once the follicle loci are established, generating and caching stencils should be your best bet.

Couple of things you probably want to know:

The current code in the stencil factory that generates the stencil tables is far from optimal, but it does try to cache intermediate computations. Ideally you will want to pass the hairs grouped by face, and if you have many hairs, it may help to pre-sort them into (u,v) domain buckets.
We currently don't have a GPU implementation of stencils, we only have CPU, OMP and TBB contexts & controllers. A CL / CUDA implementation should be fairly trivial though... and it would be totally awesome if a talented developer were to contribute those to the project ;)

fgiordana commented 10 years ago

Ok that's good, since we do group follicles per-face.

I will familiarize with the TBB implementation first and then hopefully I'll be able to give the CUDA one a shot at some point :)

manuelk commented 10 years ago

Looks like you are good for now, so i will close the issue. If you run into more questions, please feel free to re-open or create a new one.

PixarAnimationStudios / OpenSubdiv

PTex faces vs Patches #277