[Book 2] BVH leaf optimisation (object_span == 2 slower)

johannmeyer commented 3 years ago

In the BVH code, the leaf nodes aren't placed in an AABB containing just one object when object_span==2. For the case of ray-sphere intersection, not placing the leaf nodes in AABBs improves performance slightly but, for more complicated ray-intersection tests, the AABB rejection can be very useful.

*Note: In the figures below, BVH nodes perform the hit test with AABB and objects perform arbitrary hit tests. Before:

After:

Proposal to change this section of code in BVHNode constructor:

if (object_span == 1) {
    left = objects[start];
    right = dummy;
    left->bounding_box(box);
    return;
}
// else if (object_span == 2) {
//     if (comparator(objects[start], objects[start+1])) {
//         left = objects[start];
//         right = objects[start+1];
//     } else {
//         left = objects[start+1];
//         right = objects[start];
//     }
// }

Add this class to Hittable.h (Null object pattern)

class NullHittable : public Hittable
{
    public:
        virtual bool hit(const Ray& /*ray*/, double /*min_ray_length*/, double /*max_ray_length*/, HitRecord& /*hit_rec*/) const override {return false;}

        virtual void bounding_box(AABB& output_box) const override {output_box = AABB();}
};

I make the dummy a static const member of the BVH Node class to make sure only one dummy exists and cannot be changed. static const std::shared_ptr<Hittable> dummy; and then in the BVHNode.cpp I add to initialise it:

const std::shared_ptr<Hittable> BVHNode::dummy = std::make_shared<NullHittable>();

This solution is cleaner and faster than performing the bounding_box collision test in, for example the Sphere::hit function, as each Hittable::hit child would be responsible to perform its own coarse collision first (error prone and code duplication).

Secondly, the left->hit right->hit code can be left in tact with no conditionals and the dummy just returns false instead of performing the collision test on the same object twice, which depending on the complexity of the object can be slow. I agree with @trevordblack (https://github.com/RayTracing/raytracing.github.io/issues/804) that branch code should be avoided but not that this is a non-issue, from my tests with optimisations turned on -O3 and a pseudo complex hit test. Return value caching would add more complexity than this approach and may lead to bugs and would be slower.

Additionally, is the sorting even necessary in the case (object_span == 2) since right->hit is always called after left->hit.

If you would like, I could create a pull request for this according to book's code style? My current implementation differs from that in the book and would need to be modified to fit the book's style.

hollasch commented 3 years ago

Been a while since I've been in the BVH code, but your comment about ordering the left and right is a good one, particularly since the axis is chosen randomly (int axis = random_int(0,2)). Certainly worth making the changes you suggest here and seeing if there are problems. Nothing's jumping out at me right now.

On the null hittable, I think it would be better to construct a single public static member in the empty_hittable class, and then every use should simply grab a reference to that single instance. Like empty_hittable::instance. The advantages are that (1) everything that uses it should just reference a single canonical instance, and (2) it's more an aspect of hittable objects than it is of BVH nodes.

Ironically, three days ago I would have suggested a non-abstract hittable class with default implementations to effectively be an empty hittable, and then let other classes inherit from that. However, I've recently read that the idea that classes should either be abstract or should be concrete and final. That is, they should either be something you only inherit from, or something that is complete in itself that nothing else inherits from. I have to confess that I'm still trying to get my head around that, but this is exactly the situation that many are warning of. Old dog, new tricks.

For now, I'd like to see some concrete timing results of such a change, and I keep a running log of performance as the code evolves. Please create an experimental branch off dev-major to get the latest version, and let us know when you're ready for us to play with hit (typo, but I have to leave it in).

hollasch commented 3 years ago

Contributor invite sent.

johannmeyer commented 3 years ago

I initially started with a non-abstract hittable class but decided to place it into a separate class as it clarifies the intent of the class and the design pattern being used. Also default implementations can be problematic when someone extends the abstract class but forgets to implement the required functions and the default implementation is not desirable in the general case.

My initial line of thought was (1) but upon writing this comment I thought about (2).

1) Note the empty_hittable class will provide benefits compared to the current implementation (which I can show) but removal of the (object_span == 2) clause may or may not provide benefits for spheres (marginally slower intersection test) and cubes (same intersection test so this would be duplication). If you implemented a more complex hit test, it may be beneficial. I am not so familiar with the different intersection tests that someone may want to implement and can't think of a hit test that would be more complicated since a triangle-based mesh of a complex object would itself be stored in a BVH and perform its own AABB test.

2) Looking at my figures again, instead of introducing the empty_hittable, you could remove the (object_span == 1) and leave the (object_span == 2) case and add the (object_span == 3) case. This would remove the AABB intersection test above object 3 in the before figure. This would actually be the preferred approach if a more complex hit test can't be formulated.

Edit: (object_span == 1) case might be necessary in case you create a BVH with a single object. Although this case doesn't make much sense from a practical stand point.

johannmeyer commented 3 years ago

If the hittable_list has 1 or 0 objects (aka object_span 1 or 0) it will fail. The current code goes into an infinite recursive loop if the hittable_list is empty.

A static factory constructor could solve it as it enables the check to only performed once instead of for each recursive creation of the bvh_node. If you want to enable the case of adding one object, it would also be possible to return scene.objects[0]

#include <stdexcept>

static std::shared_ptr<hittable> bvh_node::create(const hittable_list& scene)
{
    if (scene.objects.size() <= 1)
        throw std::invalid_argument("bvh_node::create: hittable_list has 1 or fewer objects.");

    return std::make_shared<bvh_node>(world);
}

It will now return an error at runtime instead of infinite recursion.

hittable_list scene;
hittable_list empty_list;
scene.add(bvh_node::create(empty_list)); // will throw error above

johannmeyer commented 3 years ago

The test branch By only applying the change below and including optimisations (-O3 -ffast-math) in the compiler settings, it took off 10 seconds from the execution time of theNextWeek (1m22s -> 1m12s). I tested it using time ./build/theNextWeek > output.ppm. Without any optimisations and the original code, it takes 7m51s.

-march=native improves it by a further two seconds but makes the code less portable. Note all these optimisation settings are for gcc. See MSVC optimisations as I am unfamiliar with them.

if (object_span == 3) {
    left = std::make_shared<bvh_node>(objects, start, start+2);
    right = objects[start+2];
} else if (object_span == 2) {
    left = objects[start];
    right = objects[start+1];
} else {
    std::sort(objects.begin() + start, objects.begin() + end, comparator);

    auto mid = start + object_span/2;
    left = make_shared<bvh_node>(objects, start, mid);
    right = make_shared<bvh_node>(objects, mid, end);
}

Let me know if you would like me to add the safe guard against the empty or single element hittable_list. These changes would require changing all the examples that use the BVH to use the proposed bvh_node::create function.

trevordblack commented 3 years ago

I need to read through this. Seems like you've discovered an excellent little improvement.

I'll circle back around once I get some time.

hollasch commented 2 years ago

Just realized (or more likely, forgot) that Johann's code is on the bvh-optimisation-890 branch. I rebased it (without conflict) to origin/bvh-optimisation-890-B.

trevordblack commented 2 years ago

Okay, quick thoughts,

We're not going to add optimization flags to the build. That is outside scope. We have compiler flags to disable warnings so that a user who is brand new to c++ won't be scared off by a mile long wall of warnings. But optimization flags is too far. I don't feel comfortable adding that to the code without explaining why it's there, explaining it's utility, having a solution for every compiler & OS (which, yes, would be a nightmare).
This doesn't work for the case where objects only contains 1 entry. As far as I can tell it will produce infinite recursion. Not sure how I feel about that.
I appreciate the removal of the NullHittable, I don't want to add an abstract class unless it's strictly necessary.
My interpretation of this text:

When the list coming in is two elements, I put one in each subtree and end the recursion. The traversal algorithm should be smooth and not have to check for null pointers, so if I just have one element I duplicate it in each subtree. Checking explicitly for three elements and just following one recursion would probably help a little, but I figure the whole method will get optimized later.

Is that the source is deliberately a bit dodgy, with the understanding that the user could optimize the code later (potentially using some sort of check for 3 elements). Which is what you've done.

This is one of those examples where I think a naive (dumb) solution is the right teaching tool, and attempts to optimize should not create more stuff to learn.

Lastly, splitting the elements based on their index in the array is an objectively mediocre means of splitting a bvh. The one-step up solution is to sort the bvhs based on the surface area, which is a reasonably good proxy for percentage change that a ray will intersect.

How much of a speed-up do you see without the O3 optimization?

hollasch commented 2 months ago

I just ran a test with and without the null hittable object for the book 2 final scene.

Without the null hittable: 1:19.083 With the null hittable: 1:19.008

So I like the idea, but it really doesn't seem to offer any significant speedup.

RayTracing / raytracing.github.io

[Book 2] BVH leaf optimisation (object_span == 2 slower) #890