curv3d / curv

a language for making art using mathematics
Apache License 2.0
1.14k stars 73 forks source link

Rendering in preview window does not match distance function #61

Closed p-e-w closed 5 years ago

p-e-w commented 5 years ago

As mentioned in #48, I am working on a polyhedron library for Curv. The current state of that library is now available at https://github.com/p-e-w/curv-solids.

The main feature is a distance function for arbitrary polyhedra, including nonconvex ones, defined by a list of vertices and a list of faces, each of which is a list of vertex indices.

I have reason to believe that this distance function is correct at least in the case of the cube. Indeed, the following code shows that it gives the same value as the standard library cube when evaluated at a large number of random points:

let
solids = file "solids.curv";
// From the "smoke.curv" example
random xy = frac(sin(dot(xy, (12.9898,78.233)))*43758.5453123);
random_point i = map (j -> 10 * (random(i,j) - 0.5)) [X,Y,Z];
in
map (
i ->
  let
  [x,y,z] = random_point i;
  in
  solids.cube.dist(x,y,z,0) - cube.dist(x,y,z,0)
) (1..1000)

However, the preview window shows not a cube but this:

bug

Even stranger, the shape actually changes when rotating it in the preview window.

The arcs radiating outwards appear to be artefacts from sphere tracing, and the fact that they converge at the vertices might indicate a problem with precision or numerical stability. But the crucial parts of the algorithm (point-in-polygon and computation of normals) are numerically stable.

At this point I'm afraid I am out of my depth. The above appears to imply that the shaders generated by the Geometry Compiler somehow behave differently than the function as evaluated by the interpreter. I have no idea how to debug that.

See #51 for system information.

doug-moen commented 5 years ago

Thanks for the bug report. There is definitely a problem, and I will try to track it down.

The fact that the shape changes when rotating it means that the distance field is invalid. The sphere tracing algorithm used for rendering shapes places constraints on the distance function. I won't go into technical details here; suffice to say that this doesn't tell us anything new; we already knew that something is wrong just by looking at the static render that you included.

doug-moen commented 5 years ago

I don't know what is going on yet. However, here's an update on my investigation.

solids.cube does not render at all on my nvidia GTX 1050 running on Ubuntu Linux. The viewer window just becomes non-responsive. The behaviour is consistent with the GPU driver going into an infinite loop while attempting to execute the generated GLSL code. Since it renders something with your Intel open source GPU driver, at 8 FPS, I guess I am seeing a bug or limitation in my GPU driver. (I'm using the nvidia proprietary driver, not the open source Nouveau driver.)

So my strategy will be to change the generated GLSL code until my GPU driver no longer goes into an infinite loop. That will be a good first step.

The last time I saw this behaviour, it was an old technology 2010 GPU that could not execute a GLSL while loop (it went into an infinite loop). This appears to be a known problem with older GPU technology, given that Google Chrome gives you an error if you try to use a while loop in WebGL 1.

The generated GLSL code (that runs on the GPU) contains a while loop. Even though GLSL supports a while loop, older GPU drivers get confused. They prefer loops with a bounded number of iterations, of the form for (i = lo; i < hi; ++i) where ideally lo and hi are compile time constants. The curv for statement generates this preferred style of loop.

The solids library contains a number of while loops, and they mostly have the following form:

var i := 0;
while (i < total) (
   ...;
   i := i + 1;
);

I left the while loops in triangulate_polygon alone, but I converted the rest to have the form:

for (i in 0 ..< total) (
    ...;
);

This makes the Curv source code shorter, and the generated GLSL code now contains a for loop instead of a while loop.

Unfortunately, that doesn't fix my problem, but I eliminated one of the potential triggers for the driver bug.

p-e-w commented 5 years ago

I have added a wireframe operation whose purpose is to render an edge-skeleton of a polyhedron. Its distance function is much simpler than that of polyhedron, but there are still massive rendering problems:

let
solids = file "solids.curv";
in
solids.cube >> solids.wireframe 0.2

bug

Interestingly, there appear to be not only obvious geometry issues, but shading issues as well.

Perhaps this can serve you as a better start for debugging the GLSL program, since I expect it would be much shorter than that of solids.cube.

doug-moen commented 5 years ago

I'm testing solids.cube >> solids.wireframe 0.2 today. On my system (GTX 1050), there are no rendering problems. However, the geometry is wrong. I'm getting two capsules (ie, cylinders with rounded end caps) that look like cube edges, just like in your image, plus a 3rd capsule that is infinitely long and obviously wrong (which is different from your image). (I am also testing with my optimizing GPU compiler, but that is not supposed to change what the rendered output looks like.)

I've implemented a framework to help debug the GPU compiler. I can now live edit the GLSL generated by the compiler from within Curv and view the results. So I should be able to make more progress now.

doug-moen commented 5 years ago

I figure the difference in rendering behaviour between your system and mine indicates that the Geometry Compiler is generating bad code that produces undefined behaviour. It could just be indexing off the end of an array. But I don't have a way of running a debugger on a GPU, or trapping out-of-bounds array indexes on a GPU.

What I do have is a new unit testing framework that lets me write Curv programs that test the geometry compiler. I can now write a comprehensive set of tests against the new array feature, compile the tests using the Geometry Compiler, execute the tests and fix any test failures.

thehans commented 5 years ago

@doug-moen there are a few GPU debuggers out there, have you tried them? https://gpuopen.com/compute-product/codexl/ https://developer.nvidia.com/linux-graphics-debugger

I've briefly played with nvidia's linux debugger a few years ago and seemed pretty useful.

doug-moen commented 5 years ago

@thehans I will check out those GPU debuggers. They should be very useful for profiling and performance tuning, which I plan after the optimizing compiler is further along.

The new unit test framework will emulate GLSL, more or less, on a CPU, so I can run these unit tests on a CI server that lacks a GPU. And that's also a good thing.

doug-moen commented 5 years ago

The wireframe now works. (Fixed a bug in array indexing.)

solids.cube still doesn't work (causes GPU infinite loop on my system)

p-e-w commented 5 years ago

I can confirm that the wireframe example now renders exactly as expected, and with high performance :+1:

On my system, solids.cube is also much improved, and now actually displays a cube for the first time:

bug

Note, however, that there are single-pixel artefacts everywhere, which near the perspective horizon seem to accumulate along field lines. I find it curious that anything can be rendered in those positions; after all, the bounding box (which is identical to the shape in this case) should already tell the renderer that those pixels, which lie completely outside the projection of the bounding box, can only be background, without having to do any ray tracing to determine that.

Other than that, the elephant in the room is that I'm getting only a lousy 6 FPS on a rendering task that even my weak GPU should be eating for breakfast. But rendering correctness comes first, of course.

doug-moen commented 5 years ago

The ray marcher doesn't pay any attention to the bounding box. The bounding box is currently used: to compute the initial camera position in the Viewer window, and to place bounds on the object during mesh export. At some point, I'm going to consider ways to speed up the ray marcher, and if using the bounding box makes it run faster, then I'll do that, but it will be a performance based decision.

Your distance field is bad. It's hard for me to investigate since I can't render your model due to what looks like a GPU driver bug, but I will sort that out.

Your distance function is slow because it contains nested for loops. The distance function is evaluated multiple times processing a single ray during ray marching. At every step, it is testing the ray against every face in the polyhedron.

To make this faster, you need a more efficient representation for polyhedra, something that results in a fast distance function. I have several ideas about this.

My earliest thoughts on the problem were to use a Nef polyhedron representation. A convex polyhedron is the intersection of a set of half-spaces. So you compute the distance of (x,y,z) to each half-space, which is a fast computation, and take the maximum of this set of distance values. If you add an optimizer that examines the set of half-spaces looking for symmetries, then you could end up with code that runs as fast as the native implementations of cube, dodecahedron, etc. Non-convex polyhedra are more complicated. You still have a set of half-spaces, but they must be combined using 2 different set operations, such as intersection and difference.

There is a project on github that efficiently renders scenes containing many polyhedra (defined by mesh files), using signed distance fields (https://github.com/xx3000/mTec). The mesh files are converted offline to voxel files containing distance values, and then these voxel files are loaded into GPU memory and rendered. It seems to be a popular solution to this problem, since GPUs contain specialized hardware for dealing with voxel grids. One problem is that you don't get sharp edges or corners: you can increase the resolution of the voxel grid to get sharper edges, but this blows up your memory requirements. I'm planning to add support for this technique later in the year.

There is a masters thesis, can't find the reference right now, that constructs a fast distance function for an STL file by putting all of the polyhedron faces into a bounding volume hierarchy (a tree). The time required to intersect a ray with a face depends on the depth of the tree, you are no longer linearly examining every face. For best efficiency, the ray marching algorithm is modified to work directly on the bounding volume hierarchy. This is something I want to investigate to speed up rendering in Curv.

You would use different polyhedron representations for different purposes. If you have a mesh containing millions of triangles, representing something like Michelangelo's sculpure of David, then converting the mesh to a voxel grid is probably a good idea.

If you are working with highly symmetrical polyhedra containing a small number of faces, like the polyhedra generated by Conway's polyhedron notation, then you can get performance by encoding all of the symmetries in the polyhedron representation. For example, the Curv prism distance function runs in constant time, no matter how many sides the prism has, because it encodes the rotational symmetry of the prism in its representation. The trick is to generalize this idea. Consider this: https://syntopia.github.io/Polytopia/polytopes.html

p-e-w commented 5 years ago

I did look into Nef polyhedra a while ago; however, not every polyhedron is a Nef polyhedron, and I wanted a distance function for arbitrary polyhedra first before specializing. Also, I believe that the vertices/faces representation the library currently uses is the most intuitive and "mathematical" one and thus should be the one offered by the API, and transforming this representation into a boolean operation tree seems like a highly nontrivial task.

My current plan is to do something much less ambitious, but which should still give a large speedup in practice: Detect, in the polyhedron constructor, whether the polyhedron is convex, and if it is, create it as an intersection of the half-spaces formed by its faces – the simplest type of Nef operation. The advantage is that convexity is easy to detect from the generic representation. If the polyhedron is nonconvex, the constructor just falls back to the current distance function.

Using the bounding box to speed up ray marching sounds like a worthwhile idea indeed. Detecting whether a given ray intersects the cuboid bounding box is trivial, and should be much faster than marching over even the simplest distance field. In practice, this would mean that all pixels that lie outside of the projection of the bounding box would be rendered essentially for free. In my above screenshot, that is roughly half of the window area, so implementing this might double the frame rate.

On a side note, why does the bounding box need to be explicitly passed to make_shape in the first place? Isn't it easy to compute at least an approximate bounding box from the distance function?

doug-moen commented 5 years ago

@p-e-w "Why does the bounding box need to be explicitly passed to make_shape in the first place? Isn't it easy to compute at least an approximate bounding box from the distance function?"

It's not easy, AFAIK. The only method I know involves interval arithmetic, which is not easy, and I suspect it won't work for every distance function. Infinite shapes may be a problem, and for some distance functions, I suspect that an automatically derived approximate bounding box will be so large compared to the exact bounding box as to be useless. Which means we will still need a way for developers to override the automatically generated bounding box.

If it does turn out to be easy, then derive_bbox(shape) could be written in Curv, and invoked from make_shape to define bbox if it isn't provided. Contributions are welcome.

doug-moen commented 5 years ago

@p-e-w said "I did look into Nef polyhedra a while ago; however, not every polyhedron is a Nef polyhedron, and I wanted a distance function for arbitrary polyhedra first before specializing."

A Nef polyhedron can represent any polyhedron with a finite number of faces, whether finite or infinite. The restriction is that these polyhedra are solid objects, with an inside and an outside. You can apply boolean operations like union and intersection to Nef polyhedra, and the result is always another Nef polyhedron.

If a set of arguments to your polyhedron function can't be converted to a Nef polyhedron, then it doesn't have a well defined inside and outside, and it doesn't have a distance function.

So in what sense is "not every polyhedron is a Nef polyhedron"? Many geometers define a polyhedron as a mesh of polygonal faces embedded in 3D, and they allow strange things like colinear faces, intersecting faces, and overlapping faces. A polyhedron is defined by its surface, and not by the volume it encloses. Some polyhedra are "non-orientable" (one sided), and don't enclose a volume.

For example, the Tetrahemihexahedron has intersecting faces, and is "non-orientable". It is not a solid object, and it lacks the property that each face has a side facing the interior, and a side facing the exterior. You could convert this to a solid object by removing the self intersections, by subdividing the square faces and adding additional vertexes and edges. The result is no longer a Tetrahemihexahedron, but now you can define a distance function, and now you can convert it to Nef representation. Removing self intersections is complicated, so an alternative is to restrict the polyhedron function to "well formed" polyhedra that have a distance function: they are 2-manifold, watertight, and non-self-intersecting. Which means no Tetrahemihexahedra.

p-e-w commented 5 years ago

@doug-moen FWIW, I believe that for a convex, bounded shape with an exact distance function d, the bounding box can be derived from d using something like the following algorithm:

  1. a = 1
  2. a = 2a
  3. E = axis-aligned plane of distance a from the origin
  4. Minimize d on E using a gradient descent algorithm, with the special property that if a point p with d(p) < 0 is encountered, we go to step 2. This descent should work in almost every case, because:
    • The shape is convex, so each point outside of it is minimally distant from exactly one surface point. Therefore, differentiability is not a concern outside of the shape.
    • If any iteration reaches the inside of the shape (d(p) < 0) the process stops. Therefore, differentiability is not a concern inside of the shape either.
    • Furthermore, I believe that the gradient of the distance function of a convex shape should be Lipschitz continuous except in a small region surrounding the nondifferentiable regions of the surface manifold. If true, this, together with the convexity of the distance function itself, would guarantee convergence to the minimum from almost all starting conditions.
  5. We now have a plane E that does not intersect the shape and is exactly d_min away from its surface. Determine which side of E the shape lies on, then translate E by d_min into that direction. This plane is one side of the bounding box.
  6. Apply the above steps mutatis mutandis to obtain the remaining sides of the bounding box.

This would seem to work for more than half of the shapes in the standard library at least.