RayTracing / raytracing.github.io

Main Web Site (Online Books)
https://raytracing.github.io/

InOneWeekend: clarify Defocus Blur #464

Closed shaunplee closed 1 year ago

shaunplee commented 4 years ago

I'm having trouble developing an intuition on how defocus blur works.

In trying to understand the chapter, I drew, for myself, a slightly modified version of Figure 17 to add an "out-of-focus plane":

[Drawing: Figure 17 modified to add an "out-of-focus plane" (Limnu, 2020-04-11)]

    ray get_ray(double s, double t) {
        // Pick a random point on the lens disk and express it in camera coordinates.
        vec3 rd = lens_radius * random_in_unit_disk();
        vec3 offset = u * rd.x() + v * rd.y();

        // The ray starts at the offset lens point but still aims at the same
        // point on the focus plane, so only the focus plane stays sharp.
        return ray(
            origin + offset,
            lower_left_corner + s*horizontal + t*vertical - origin - offset
        );
    }

My understanding is that the modification of get_ray from Listing 63, so that the ray points toward lower_left_corner + s*horizontal + t*vertical - origin - offset, causes the rays from origin + offset to converge at the same point: the - offset shifts the ray's endpoint to compensate for the random offset at the lens, but this compensation is only exact at the focus plane. Based on the shape of the rays, at other distances, color samples are taken from different parts of the scene, and the rays are farther apart the farther you are from the focus plane. A smaller aperture results in a tighter set of rays and therefore less blur.

However, I'm not sure if my understanding is correct.

My suggestion is to add some additional explanation of what is happening at the focus plane.
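
For context, here's roughly the camera setup that get_ray relies on (my paraphrase from memory of the constructor around Listing 64, so details may differ slightly from the book). The point I had to convince myself of is that horizontal, vertical, and lower_left_corner are all scaled by focus_dist, so lower_left_corner + s*horizontal + t*vertical always lands on the focus plane:

    // Paraphrased, not copied verbatim from the book.
    // viewport_width/viewport_height come from vfov and the aspect ratio as usual.
    origin            = lookfrom;
    horizontal        = focus_dist * viewport_width  * u;   // spans the focus plane
    vertical          = focus_dist * viewport_height * v;
    lower_left_corner = origin - horizontal/2 - vertical/2 - focus_dist*w;
    lens_radius       = aperture / 2;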

I also note that the aperture is specified in the more natural units of a diameter, and not as an f-stop. I experimented with setting the aperture to 16.0, unthinkingly assuming it would make everything sharp and got an extremely blurry image. I probably made the assumption because Listing 64 suggests "Using a big aperture" and sets the aperture to "2.0" and, as it happens, f/2.0 is a pretty large aperture. I don't know if other readers will make the same assumption.

dafhi commented 4 years ago

A smaller aperture results in a tighter set of rays and therefore less blur.

That looks like it contradicts your last paragraph.

shaunplee commented 4 years ago

It's very likely I'm being self-contradictory, as I'm pretty confused about this whole thing. :)

A smaller aperture resulting in less blur is in line with my experience with using a camera, in that higher f-stops give me less blur (more depth of field).

The last paragraph from my original post seems to me to be consistent with that. 2.0 is a "large" aperture. Listing 65 for creating the "final image" uses a more reasonable aperture of 0.1, which still causes the defocus blur effect to be apparent in the image.

So I don't quite see the contradiction yet. Will you please help?

(The last paragraph was really a throwaway aside. I had thought about opening a separate issue to ask about changing the units of the aperture to an f-stop, but realized that doing so would also need to take into account the focal length (expressed as a field of view fov in the code, which might just make it even more confusing), which seemed like too much of a headache for relatively little gain.)

dafhi commented 4 years ago

Vertical field of view is just trig, given:

  1. the distance between the lens center and the screen
  2. the height of the screen
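
In code, roughly (my names, not the book's):

    #include <cmath>

    // Vertical field of view from (1) the lens-center-to-screen distance and
    // (2) the screen height: twice the arctangent of half the height over the distance.
    double vertical_fov_degrees(double lens_to_screen, double screen_height) {
        const double pi = 3.14159265358979323846;
        double half_angle = std::atan((screen_height / 2.0) / lens_to_screen);
        return 2.0 * half_angle * (180.0 / pi);
    }
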
shaunplee commented 4 years ago

Yes, I'm with you so far.

dafhi commented 4 years ago

focal length (expressed as a field of view

did you mean focal length (expressed as focus_dist

dafhi commented 4 years ago

I see something in Book 1 Figure 17 that looks incorrect. The virtual film plane, in my understanding, is the screen, not the focus plane.

shaunplee commented 4 years ago

focal length (expressed as a field of view

did you mean focal length (expressed as focus_dist

No, I'm thinking of focal length and focus distance as different things.

From the above-linked Wikipedia article:

In most photography and all telescopy, where the subject is essentially infinitely far away, longer focal length (lower optical power) leads to higher magnification and a narrower angle of view; conversely, shorter focal length or higher optical power is associated with lower magnification and a wider angle of view.

I agree that Figure 17 from the book is confusing. My reading of the sentence just before Figure 17:

Instead, I usually start rays from the surface of the lens, and send them toward a virtual film plane, by finding the projection of the film on the plane that is in focus (at the distance focus_dist).

is that we are projecting (translating and scaling) the virtual film to match up with the focus plane.

This confusion is my reason for opening this issue. How does Figure 17 match up with the code in Listing 63?

dafhi commented 4 years ago

So we're clear, focal length is not the same as vfov.

I think Peter Shirley signs off on new Figures. A new figure, imo, would do well to include the screen.

shaunplee commented 4 years ago

I agree that focal length is not the same as vfov.

Some subset of readers who are also camera geeks (like me) carry baggage from photography, where "focal length" is used as shorthand for "field of view," usually with reference to 35 mm film cameras. In the context of digital photography, a 35 mm-sized digital sensor is referred to as a "full frame" sensor. As such, a fixed or "prime" camera lens with a focal length of 50 mm has a wider field of view than a lens with a focal length of 200 mm. While these lenses have fixed focal lengths, they can still adjust their focus distance so that you can focus the lens on your subject.

Putting the same lens in front of a different sensor results in a different field of view. Many digital cameras have sensors smaller than 35 mm, resulting in a "crop" of the image presented by the lens: the same 50 mm lens in front of a smaller sensor has a narrower field of view than it does in front of a larger sensor.

Referring to the same Wikipedia article:

Focal length (f) and field of view (FOV) of a lens are inversely proportional. For a standard rectilinear lens, $FOV = 2 \arctan\left(\frac{x}{2f}\right)$, where x is the diagonal of the film.

In summary, I completely agree that focal length is not the same as vfov. Informally, among some narrow subset of readers, focal length (not focus distance) is indicative of fov.
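
As a quick sanity check on that formula (using the full-frame diagonal $x \approx 43.3$ mm, which is an assumption on my part): a 50 mm lens gives $2 \arctan\frac{43.3}{2 \cdot 50} \approx 46.8^\circ$, the familiar "normal" field of view, while a 200 mm lens gives only about $12.4^\circ$.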

dafhi commented 4 years ago

My opinion is that, overall, nothing here is rocket science. I would suggest following the code if it works and moving on.

shaunplee commented 4 years ago

Agreed. I'm just trying to point out parts where I've been confused in an attempt to improve the book for the next reader.

If you celebrate, Happy Easter!

hollasch commented 4 years ago

Happy Easter Shaun!

You're correct that focal distance and the "focal plane" are different constructs. A lot of the difference comes from the fact that camera terminology mixes absolute and relative measurements. f-stop is a relative measurement: it expresses the size of the lens aperture relative to the focal length, sort of like radians express the distance around a circle of any diameter.
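
If it helps to see that relationship as code (illustrative only; neither focal_length nor f_number appears anywhere in the book's camera):

    // f-number N = focal_length / aperture_diameter, so the absolute opening is:
    double aperture_diameter(double focal_length, double f_number) {
        return focal_length / f_number;   // e.g. a 50 mm lens at f/2 has a 25 mm opening
    }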

Additionally, the ray-tracing optical setup is different from a camera's in that the focal point in computer graphics is typically behind the image plane, while in a camera it's in front of the image plane. To add to the confusion, almost every graphics text I've seen indicates that the focal point is the "eye" position (or "camera" position), which is a horrible idea that spawns boatloads of misunderstanding.

It's easiest to first understand the function of the lens with respect to photography. Consider a pinhole camera setup. For a given point in the world, light hits it from many angles, and is reflected in many different directions, depending on the characteristics of the material. A pinhole camera essentially filters out almost all of those reflected light rays, allowing only the rays that go from the surface point through the pinhole, where they then hit the image plane (film/sensor). In the ideal case, with an infinitely small pinhole, you get an infinitely sharp mapping of the world onto the film plane. That's awesome, except that the smaller the pinhole, the less light actually makes it onto the film, and you thus need an exposure that can take up to weeks, depending on the film sensitivity.

The lens expands the aperture from a pinhole to a much larger opening. Rays that come from the surface and hit the optical plane (lens) off center are optically bent back to the same point on the film plane. In this way, a camera with a lens gathers much more light in the same time. A "fast" lens maximizes the area that captures and bends light coming from a given point. Also note that you could consider the result a specularly "blurred" sample, in that it integrates the light from a single point across the cone of possible angles. A mirrored surface will thus have a blurred reflection.

This is all great, but now consider the geometry of this setup. Light comes from some point S and widens into a cone with base A, where A can be modeled as an idealized flat circle on the lens plane. This is then coupled with a flipped cone with the same base and its apex at a point P on the image plane. The optics work out such that all points on the plane through S parallel to the lens (and image) plane will send out rays that are then bent to a single unique point on the image plane. Thus, all points on this plane are in perfect focus. This plane is just a geometric construct.

Now for a given theoretical point P on the focal plane, imagine that there's nothing there. Extend the rays backwards until they do hit something. Each one of these rays will hit a different point in the world, and the resulting point on the image plane will thus contain the integral of the light from all these different points. Thus, the image rays will blur the contributions from all these different points. If half of these points hit a black material and half hit a white material, the image at the projected point will be gray.
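
In code, that integration is just the usual average over many lens samples per pixel, something like this (using the book's helper names loosely, not verbatim):

    color pixel_color(0, 0, 0);
    for (int s = 0; s < samples_per_pixel; ++s) {
        // Each call picks a new random point on the lens, so each sample may
        // hit a different point in the world when the target is out of focus.
        ray r = cam.get_ray(u_coord, v_coord);
        pixel_color += ray_color(r, world, max_depth);
    }
    pixel_color /= samples_per_pixel;   // half black + half white samples -> gray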

One thing that's odd in the ray-tracing setup is that, because the focal point is behind the image plane, it really models a setup where every point on the image plane gets its own virtual "lens". In this way it captures "stray" rays from around each image point and then combines them for that point. You can't really do this in the physical world, but we can do it in ours.

Now, you could create a raytracing setup that better approximates a real camera. Focal point in front, with a lens at that point, yielding an inverted image. You could even model a proper mechanical aperture, and a glass lens with non-zero thickness. Go crazy, and you can include chromatic aberration (a good deal more difficult with a proper continuous spectrum).

Does this help?

shaunplee commented 4 years ago

Happy Easter, Steve!

Thank you very much for taking the time to respond. Yes, your comment was very helpful.

Yes, I started down the path of thinking through what it would take to rewrite my code to use an f-stop instead of a diameter for the aperture, and realized that doing so would have to introduce way too many confusing concepts and complications from camera terminology with little benefit.

Sorry for the distracting aside.

Regarding my main question, your paragraph:

Now for a given theoretical point P on the focal plane, imagine that there's nothing there. Extend the rays backwards until they do hit something. Each one of these rays will hit a different point in the world, and the resulting point on the image plane will thus contain the integral of the light from all these different points. Thus, the image rays will blur the contributions from all these different points. If half of these points hit a black material and half hit a white material, the image at the projected point will be gray.

seems to be consistent with what I was trying to express in the drawing in my original post, where rays that fail to hit something in the focal plane might hit very different points in the "out-of-focus plane." Likewise, if those rays hit something else on their way to point P, those would also be from different points in the world.

The - offset term in the expression lower_left_corner + s*horizontal + t*vertical - origin - offset for the endpoint of the ray seems to be the term that causes the rays cast from the lens plane A to converge at the point P on the focus plane, because it cancels out the + offset in the expression origin + offset for the start point of the ray.

(The above is my attempt at enthusiastic agreement with you.)
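
Writing it out (using the book's variable names) makes the cancellation explicit. The ray is

    r(k) = (origin + offset) + k * (lower_left_corner + s*horizontal + t*vertical - origin - offset)

and at $k = 1$ this is exactly lower_left_corner + s*horizontal + t*vertical, independent of offset. That point lies on the focus plane because horizontal, vertical, and lower_left_corner are all scaled by focus_dist; at any other $k$ the offset does not cancel, which is the blur.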

Part of the confusion might be that my mental model of the ray tracing camera is stuck in Figure 2, where the "film" is placed just in front of the pinhole "lens." In Figure 17, are we now transforming that "film" onto the focus plane? But how can that film "see" things between the film and the lens? Considering the plane defined in the camera code to be the focal plane (containing P) rather than the film plane (containing S) helps to clarify what's happening.

My opinion is that Chapter 12 would be improved with just a little bit more explanation. I wonder if some of the content from your message above could be incorporated into the book.

hollasch commented 4 years ago

Well, the reality is that until you deal with depth of field, the placement of the image plane doesn't really matter. The image is just a perspective projection of the three-dimensional space onto a two-dimensional rectangle. For an orthographic projection parallel to the Z axis, just lop off the Z coordinate. You can think of the image plane as being the plane Z = something, but the something doesn't really matter, because all such projections will yield the same image regardless of the image plane.

In the same way, with a perspective projection, all image planes perpendicular to the projection axis are identical. Geometrically they may be different sizes depending on the distance from the focal point, but as long as they're not at the focal point, they will differ only in their size in the 3D world. The resulting 2D images, however, will be identical regardless of where they were placed on the focal axis — they're just the result of collapsing (projecting) the 3D world onto an image plane.
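
Concretely (my notation, not the book's): with the focal point at the origin looking down $-z$, projecting a point $(x, y, z)$ onto the image plane $z = -d$ gives $(x', y') = \left(\frac{d\,x}{-z}, \frac{d\,y}{-z}\right)$. Changing $d$ multiplies every $(x', y')$ by the same constant, so you get the same picture at a different size.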

With respect to modeling depth of field, the model in the book does not emulate a standard camera + lens, because instead of sharing a common aperture, every pixel gets its own virtual lens. The figure you proposed lacks only the image plane and a representative pixel from which the model rays emanate. I can see how such a figure might make the model more clear.

With regard to your comment that you "started down the path of thinking through what it would take to rewrite my code to use an f-stop instead of a diameter for the aperture", I'd recommend that you revisit that. It shouldn't be any harder than the model presented in the book, and would involve a lot of concepts that you might already be more familiar with.

Such a model would include an image center point, an image half horizontal vector (pointing to the left), an image half vertical vector (pointing up) and the focal point. You should then define a mapping so that the image is sampled from [-1,-1] to [+1,+1]. [-1,-1] should give you the lower right corner of the image rectangle, and would map to [0,0] of the resulting raster image. [+1,+1] should give you the upper left corner of the image rectangle, and would map to [width,height] of the resulting raster image. Note that this mapping accounts for the fact that the image rectangle is upside-down and flipped left-for-right. For each pixel point on the image plane, the pinhole camera would be modeled by shooting a ray from the pixel to the focal point.

To model a lens with an aperture, model a sampling circle (I'd use two vectors orthogonal to each other and to the focal axis). For a given pixel, the ray from the pixel through the focal point will intersect the focal plane at some point P. Your sample rays will all be randomly distributed in the aperture circle, and all will intersect the point P. Sample and integrate these rays for your image. I don't know where the focal point will lie relative to the aperture, but if you want to model a particular lens body, you can work that out. Also note that in a real camera, the aperture and the lens are not coincident as I've described above; typically the aperture is between the image plane and the lens(es), but this position doesn't really matter, as it's just a way to block the light at some point (just like it doesn't matter how far away your hand is from your pupils when you shade your eyes). Also note that the "lens" above is infinitely thin, unlike a true lens.
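
Here's a rough sketch of what that could look like. It is not the book's code: every name here (physical_camera, image_center, half_h, half_v, focal_point, and so on) is made up for illustration, and it leans on the book's vec3, ray, dot, cross, unit_vector, and random_in_unit_disk helpers rather than being fully self-contained.

    class physical_camera {
      public:
        physical_camera(vec3 image_center, vec3 half_h, vec3 half_v,
                        vec3 focal_point, double focus_dist, double aperture_radius)
        {
            this->image_center = image_center;       // center of the image rectangle
            this->half_h = half_h;                   // half-horizontal vector, pointing left
            this->half_v = half_v;                   // half-vertical vector, pointing up
            this->focal_point = focal_point;         // in front of the image plane
            this->focus_dist = focus_dist;           // lens (focal point) to focal plane
            this->aperture_radius = aperture_radius;

            // Axis from the image plane through the focal point into the scene,
            // plus an orthogonal basis for the aperture disk.
            axis   = unit_vector(focal_point - image_center);
            lens_u = unit_vector(cross(half_v, axis));
            lens_v = cross(axis, lens_u);
        }

        // a,b range over [-1,+1] x [-1,+1]; (-1,-1) is the lower-right corner of
        // the image rectangle and (+1,+1) the upper-left, which undoes the
        // upside-down, mirrored image a real camera forms behind its lens.
        ray get_ray(double a, double b) const {
            vec3 pixel = image_center + a*half_h + b*half_v;

            // Pinhole ray: from the pixel through the focal point. Extend it past
            // the lens to the focal plane (focus_dist along the axis) to find P.
            vec3 dir = unit_vector(focal_point - pixel);
            vec3 P   = focal_point + (focus_dist / dot(dir, axis)) * dir;

            // Defocus: start from a random point on the aperture disk centered at
            // the focal point; every such ray still passes through P.
            vec3 rd = aperture_radius * random_in_unit_disk();
            vec3 lens_point = focal_point + rd.x()*lens_u + rd.y()*lens_v;
            return ray(lens_point, P - lens_point);
        }

      private:
        vec3 image_center, half_h, half_v, focal_point;
        vec3 axis, lens_u, lens_v;
        double focus_dist, aperture_radius;
    };

Setting aperture_radius to zero collapses this back to the pinhole model, and the focal plane at focus_dist is the only distance at which all the samples for a pixel agree.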

shaunplee commented 4 years ago

Thank you, Steve, for continuing to engage with this discussion.

I'm with you that all image planes perpendicular to the projection axis (the "camera axis"?) would be identical. My original mental model based on Figure 2 was something like: I'm standing at (0,0,0) in front of a window that extends from (-2,-1,-1) to (2,1,-1) and is divided into a grid, and I paint each cell of that grid with the color that I see through that cell.

In some ways, Figure 16 already shows the arrangement of a first cone of rays from point S on the image plane through the lens area A and a second cone of rays to a common point P on the focus plane, but merely lacks the labels. Part of where I got lost was how Figure 17 related to Figure 16, apart from just covering up the left side of Figure 16.

Looking at the text, the only two sentences that seem to explain what is happening are:

In order to accomplish defocus blur, generate random scene rays originating from inside a disk centered at the lookfrom point. The larger the radius, the greater the defocus blur.

I feel like it could just use a little bit more explanation.

I'll try your suggested exercise of putting the image plane behind the lens. Thanks!